Big Data is Not So Scary

Editor’s note: This post originally appeared here.

I’ve been reading Nate Silver’s The Signal and the Noise. It’s not the sort of book I normally would read, but since Nate kept me from jumping off a tall building during the last election I felt I owed him the $27.95. Given Nate’s record predicting election outcomes, you might think this is a book that reveals the hidden secrets of the black art of predicting things. But it’s not. It’s about how hard it is to make accurate predictions even when we have mountains of data from which to do it. And it’s causing me to look differently at the issue of Big Data and predictive analytics.

Nate spends a lot of pages on some of those things for which we have lots of data but still aren’t good at predicting—the weather, earthquakes, economic growth, etc. Consider economic growth. We all have a sense of just how much economic data there is and how far back the time series go. (Nate estimates around 4 million variables.) But forecasts of growth are all over the map, and even “consensus forecasts” routinely are just plain wrong.

Nate argues that predictions fail because we fall victim to two common errors. The first is to overfit: to build a prediction model that looks very sophisticated and plausible but either ignores important variables or fails to capture the underlying structure of the data. Machine learning is especially susceptible to overfitting. The second is the classic error of interpreting correlation as causation. A good example is the Super Bowl indicator, which says that the direction of the stock market can be predicted based on who wins the Super Bowl.
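For readers who want to see overfitting rather than just read about it, here is a minimal sketch (my own illustration, not from the book): fit polynomials of increasing degree to noisy samples of a simple linear trend. The high-degree model chases the noise—its error on the data it was trained on keeps shrinking, while its error on fresh data grows. The specific trend, noise level, and degrees are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def signal(x):
    # The real underlying trend: a simple straight line.
    return 2 * x + 1

# A small training sample and a larger held-out sample, both noisy.
x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 200)
y_train = signal(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = signal(x_test) + rng.normal(0, 0.3, x_test.size)

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

results = {}
for degree in (1, 15):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        rmse(y_train, np.polyval(coeffs, x_train)),  # in-sample error
        rmse(y_test, np.polyval(coeffs, x_test)),    # out-of-sample error
    )
    print(f"degree {degree:2d}: train RMSE {results[degree][0]:.3f}, "
          f"test RMSE {results[degree][1]:.3f}")
```

The degree-15 fit looks better than the straight line if you judge it only by the data it has already seen; on new data it is worse. That gap between in-sample and out-of-sample performance is the signature of the error Nate describes.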

Ultimately, we need to be able to make good decisions about which data are important. And we need to be able to look at what a model is saying, why it’s saying it, and judge whether it makes sense. Finally, we need to understand the uncertainty in the prediction and communicate it. That sounds a lot like MR, except for that last part about uncertainty.

Right now the possibility of a future world of petabytes, MPP architectures, neural networks and naïve Bayes is scaring the pants off a lot of people in the MR industry. It may well be very bad news for MR companies but maybe not so bad for the MR profession. There always will be demand for people who understand data, consumers and the competitive challenges that client companies face in the marketplace.

Or, as Nate writes, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise.”

This entry was posted in Research Trends by Reg Baker, Ph.D.

About Reg Baker, Ph.D.

Reg Baker is the former president and chief operating officer of Market Strategies International. He continues to serve the company as a consultant on research methods and technologies. Reg is active in numerous professional associations and industry bodies including AAPOR, CASRO, ESOMAR and the Technical Committee responsible for ISO 20252—Market Opinion and Social Research. He serves on the Executive Editorial Board of the International Journal of Market Research and is a member of the ESOMAR Professional Standards Committee. Throughout his career, Reg has focused on the methodological and operational implications of new survey technologies including CATI, CAPI, Web and now mobile. He writes and presents on these and related issues to diverse national and international audiences and blogs off and on as thesurveygeek. Prior to joining Market Strategies in 1995, he was vice president of research services for NORC at the University of Chicago.
