The Business of Data is to Serve the People


At yesterday’s FTC hearing on the business of big data, I outlined some of the important uses of big data and analytics. SIIA companies are industry leaders in using analytics and big data to improve business methods and processes.  Among other innovative uses, they apply data to:  

  • produce fairer and more accurate credit scoring models
  • increase the effectiveness and speed of student learning and identify students in need of additional attention and resources
  • enable increasingly effective personalization of online ads for a large array of digital advertising players
  • improve business risk analysis, business due diligence and regulatory compliance.

Big data analytics, including machine learning and AI, evolved from older data analytics methodologies. They involve new processing techniques for analyzing data of increased variety, velocity, and volume.  

This is a crucial development that allows analysts to detect patterns in data without first having to develop and test hypotheses. 

While the results are sometimes startling in their effectiveness, there is nothing brand new about these techniques, and they raise few new regulatory concerns.

Improved Credit Scoring Models

Credit scoring models have been used for decades to increase the accuracy and efficiency of credit granting. They help as many people as possible receive offers of credit on terms they can afford, and they allow lenders to manage credit risk efficiently.

The traditional credit scores improved greatly on the subjective assessments by loan officers. They were built from information in credit bureau reports and typically use variables relating to credit history.

But these traditional credit scores have well-known limitations.  They cannot score the approximately 70 million individuals who lack credit reports or have “thin” credit reports without enough data to generate a credit score.

This adversely affects historically disadvantaged minorities.  A recent Lexis-Nexis study found that 41% of Hispanics and African-Americans could not be scored using traditional methods.

To remedy this limitation, Lexis-Nexis built an alternative credit score, called RiskView, that relies on data such as educational history, home ownership, and court records.

Fully 81% of the unscorable minorities received a RiskView score, a result that is more accurate and fairer.

This example shows that even just expanding the range of data in use can achieve measurable improvements in outcomes.  Machine learning credit models are in development and promise to be even more effective.

Personalized Learning

Research has shown that many students who eventually drop out of high school can be identified as early as the sixth grade by their attendance, behavior, and course performance. Even more can be identified by the middle of ninth grade.

Early warning indicators based on attendance records, behavior problems, and course performance can measure this dropout risk. 
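
As a rough illustration of how such indicators might be combined, the sketch below flags a student when any of the three signals crosses a cutoff. The cutoff values, field names, and sample records are hypothetical assumptions chosen for the example; they do not come from the research mentioned above, and real early warning systems would calibrate them against local data.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    name: str
    attendance_rate: float   # fraction of school days attended, 0.0 to 1.0
    behavior_incidents: int  # recorded behavior problems this term
    courses_failed: int      # courses failed this term


# Hypothetical cutoffs, assumed purely for illustration.
MIN_ATTENDANCE_RATE = 0.90
MAX_BEHAVIOR_INCIDENTS = 1
MAX_COURSES_FAILED = 0


def warning_flags(student: StudentRecord) -> list[str]:
    """Return the names of the indicators this student trips."""
    flags = []
    if student.attendance_rate < MIN_ATTENDANCE_RATE:
        flags.append("attendance")
    if student.behavior_incidents > MAX_BEHAVIOR_INCIDENTS:
        flags.append("behavior")
    if student.courses_failed > MAX_COURSES_FAILED:
        flags.append("course performance")
    return flags


def students_at_risk(students: list[StudentRecord]) -> dict[str, list[str]]:
    """Map each flagged student's name to the indicators that were tripped."""
    flagged = {}
    for student in students:
        flags = warning_flags(student)
        if flags:
            flagged[student.name] = flags
    return flagged


# Made-up sample records.
roster = [
    StudentRecord("Student A", attendance_rate=0.97, behavior_incidents=0, courses_failed=0),
    StudentRecord("Student B", attendance_rate=0.82, behavior_incidents=0, courses_failed=1),
    StudentRecord("Student C", attendance_rate=0.93, behavior_incidents=3, courses_failed=0),
]
at_risk = students_at_risk(roster)
```

With these made-up records, Student B is flagged on attendance and course performance, Student C on behavior, and Student A is not flagged.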

This knowledge allows schools to give these at-risk students the meaningful support and interventions they need as early as possible.  This increases the number of students who graduate ready for success in either college or career.  Using these data techniques, in one school in 2013, one-third of students flagged for missing school got back on track to graduation.

Personalized learning technology speeds the learning process. In developmental math courses used in community colleges, a program called Aleks from McGraw-Hill uses AI to analyze the progress of students and adapt learning to their needs.  The schools report that this new technology gets students through the remedial material much more quickly.

Improved Personalization for Online Ads

The use of big data and advanced analytics can improve the effectiveness of advertising through better website analytics and through improved analysis of large customer databases.

The movement of visitors on a website is usually recorded, including data such as which pages they browsed and how much time they spent on each page.  Machine learning programs can infer critical patterns of website interaction that human inspection of the data cannot detect.  Once these patterns are discovered, website visitors can be segmented into different groups based on their inferred preferences, and the website’s content can be personalized to the predicted tastes and preferences of increasingly narrow marketing segments.
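
One common way to segment visitors from this kind of data is clustering. The sketch below uses a plain k-means loop as a stand-in; it is not a description of any particular vendor's method. The two features (pages viewed and average seconds per page), the sample visits, and the starting centroids are all assumptions made up for the example.

```python
from math import dist
from statistics import mean


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, initial_centroids, max_iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # A centroid with no assigned points keeps its previous position.
        updated = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest_centroid(point, centroids) for point in points]
    return labels, centroids


# Made-up visits: (pages viewed, average seconds per page).
visits = [(2, 10), (3, 12), (2, 8), (15, 90), (14, 100), (16, 95)]

# Two segments, seeded with one quick visit and one long visit.
labels, centroids = kmeans(visits, initial_centroids=[(2, 10), (15, 90)])
```

Here the first three visits land in one segment (quick, shallow sessions) and the last three in the other (long, deep sessions). In practice the features would be scaled to comparable ranges before clustering, and the number of segments would be chosen from the data rather than fixed in advance.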

For marketers, this creates far more efficient targeting of advertising, and for website users, an experience more tailored to their own interests and needs.

Companies often have large amounts of their own customer data or data obtained from third-party providers and need an effective tool to detect patterns in the data that can be used as the basis of marketing campaigns.

Machine learning programs can dig through this data to find insights that can be used to devise smarter and more effective digital ads. The programs can also be used to advise marketers which type of campaign to use - email, social media engagement, online advertising, website recommendation.

In addition, the use of inferred psychological characteristics is often a good mechanism for improving the effectiveness of advertising.  For instance, the level of extraversion or openness-to-experience can be inferred from social media behavior.  Matching the content of advertising to this characteristic can improve responses including clicks and purchases, estimated in one study as an increase of “40% more clicks and up to 50% more purchases.”

The benefits of increasingly effective targeted ads include greater relevance for consumers and additional revenue to support free or subsidized online content.

Improved Business Risk Management Services

Information services companies can help their business and organizational customers manage their business risk exposure through the use of data and data analysis.  

These data sets rely on public records information and information about people in their business capacity.  And they usually include non-personal information such as financial and operational information about companies including payment histories.

The predictive analytics include estimating the likelihood that a business loan will be repaid and profitability analysis to assist in merger analysis.

These techniques help companies make better decisions and manage risks like identity theft, fraud, money laundering, and terrorism.  They help prevent financial crimes, and insurance and government benefit scams.  And they can help law enforcement solve crimes.

Regulators want financial institutions to detect terrorist financing and money laundering using whatever techniques seem to be most effective.  An emerging development in this area is that the same machine learning approaches that can spot a pattern of bad transactions in the credit card world can be used to assess the risk that a potential customer would engage in these suspicious transactions. 

These examples are just a few of the many enhancements to business and organizational processes made possible by the new techniques of data analysis. 

They build on previous efforts to use data and older analytic techniques and do not create any fundamentally new policy problems. For instance, credit scores must conform to the fair lending laws.  A new machine learning credit score that discriminates against a protected class does not escape liability by claiming it was using a brand-new AI technique.  Discrimination is still against the fair lending laws even if done with an AI or machine learning algorithm.

Some additional relevant policy points are these:

  • New privacy rules should provide for substantive consumer protection against the harms associated with data use and analysis, while encouraging the important and socially beneficial uses of data and analysis.
  • Assessing data monopolies must be done on a case-by-case basis.  The amount of data involved isn’t in itself a data monopoly problem.
  • A data monopoly problem might arise if a company merger creates a scarcity of data available to others to engage in a particular business, but valuable data is often available from other sources.
  • Data is not a barrier to companies with a new idea or better services. One firm’s use of data does not diminish its availability to others. If I give my name and email address to one firm, I can give it to another. 

Mark MacCarthy, Senior Vice President, Public Policy at SIIA, directs SIIA’s public policy initiatives in the areas of intellectual property enforcement, information privacy, cybersecurity, cloud computing and the promotion of educational technology. Follow Mark on Twitter at @Mark_MacCarthy.