Annie Pettit, Ph.D., of Peanut Labs and Marc Alley, PRC, of 1st Global had a point/counterpoint debate about the role of Big Data in market research. Annie is an industry thought leader; Marc is a researcher who has transitioned into a data science role. [All verbatim quotes are approximations from my live-blogging.]
Marc Alley said Big Data is not Big Data until it is at least a tenth of a petabyte, the equivalent of 10 stacks of CD-ROMs, each reaching to the moon. This is not the type of data that researchers are used to working with. The data is fast moving and has to be analyzed in real time. To add to the complexity, the data is unstructured.
Annie countered that Big Data has been around a long time. Big Data is often simply Bigger Data than we are used to analyzing.
“Why did I purchase this Coffee Crisp?” Annie asked. “Big Data is not going to tell you. Did I buy it because I like it? Because it was the only chocolate bar available?”
Marc said that we think we can predict causality, but that causes a lot of problems. Statistics is about the richest analysis of the least amount of sample, because sample is expensive.
Google predicted the spread of the flu before the CDC did, running 450 million mathematical models to do it, Marc said. Google says it doesn’t have better algorithms than anyone else, just access to more data. Annie pointed out that the algorithm no longer works.
Marc argued that Big Data is comprehensive, but Annie countered that it is not: it offers limited competitive data and sometimes lacks access to all internal data. She talked about Big Data sets with missing records; for instance, mobile records where the smartphone had geolocation turned off or not shared for privacy.
Marc pointed out the typical receipt survey is rarely taken and offers an inaccurate view of what was purchased. Annie countered that you wouldn’t use the receipt survey to ask what people had purchased, since you already have that information.
Marc said that analyzing Big Data is about machine learning with systems like Hadoop. Annie countered you can analyze Big Data with SAS, R, and other tools that researchers are using.
Marc said that he is not advocating throwing out traditional methods of developing customer insights. But too often those methods take self-reported information on faith, whereas Big Data adds behavioral information and shows that consumers lie. Annie countered that consumers are human and inaccurate, not liars.
Wendy Cobrda of Gannett asked how the two disciplines can get along. Annie said that a data scientist is just playing with data – a waste of time. A market researcher, on the other hand, puts anthropology, psychology and traditional understanding of consumers to work on large datasets. Marc said that so much data is collected that the added investment in analyzing it is easily justifiable.
An audience member pointed out the problems with bots polluting social media streams and wondered how to differentiate human from robotic inputs. Marc said that robots were creating more stories, for small and large publishers; fallible humans often make mistakes themselves in writing stories.
Annie discussed an algorithm being used to determine if you would be accepted to a university or an algorithm used to approve your first mortgage. Marc pointed out that humans are in charge, even if they delegate to algorithms. Market researchers need to be involved with Big Data, he said; managers need researchers’ insights into the business. “We should be excited about this; not fearing it. There used to be a law in New York City that if you were driving and came upon a horse, you had to pull over and cover the car with a tarp so as not to frighten the horse. Annie wants to cover up Big Data.”
Another audience member took issue with using the size of Big Data to define it, arguing that megabytes of data were still Big Data and that large datasets could produce spurious correlations. Annie pointed out the need to bring to the interpretation of Big Data the traditional understanding of statistics, random error, and non-random error. Marc argued that more data doesn’t mean more error and reiterated the need for large datasets.
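The spurious-correlation point is easy to see in a quick simulation. The sketch below is my own illustration, not something either speaker showed: it correlates one column of pure random noise against many other columns of pure random noise and counts how many clear a cutoff by chance. The function names, the 20-row sample, and the 0.44 cutoff (roughly the 5% two-sided significance level for 20 rows) are all assumptions chosen for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_rows, n_columns, threshold, seed=0):
    """Count noise columns whose |r| with a noise target exceeds threshold.

    Every column is independent random noise, so any correlation that
    clears the threshold is spurious by construction.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same counts
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(target, column)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sizes: the wider the dataset, the more chance "findings".
    for n_columns in (10, 100, 2000):
        hits = count_spurious(n_rows=20, n_columns=n_columns, threshold=0.44)
        print(n_columns, "noise columns ->", hits, "exceed |r| > 0.44")
```

The more columns you test, the more chance "relationships" turn up, which is why Annie's call for a traditional understanding of random error still applies to wide datasets.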
Marc said researchers should look for untapped information, available from external partners or from internal systems such as “customer data trapped underwater in an iceberg.” Annie argued that merging customer data with credit information and healthcare information to improve targeting raises ethical and privacy concerns.
Marc said it is not nefarious; “it is capitalism, selling baubles and bits.” Annie argued that just because you can use some of these datasets doesn’t mean that you should.
Are there tools to combine survey data, focus group data, and internal data to discover insights? Annie said there is no way to take all kinds of data, consolidate it, and deliver value. Human analysis is essential: “you can’t just push a button.”
How do you corral all this data to discover things in aggregate? Make sure you have a research objective – don’t assume the data is going to tell you things, Annie said. Don’t get lost in the data. “If you have a big pile of data, leave it there until you have a problem you think it can solve.”
Kathryn Korostoff of Research Rockstar said real world practical case studies “don’t use 10 stacks of data to the moon but answer simple questions, such as to understand purchase behavior or clickstream behavior and its impact. Are those not Big Data?”
Marc said it was Big Data if the dataset was large enough, a tenth of a petabyte, or else it wasn’t Big Data. “When you say unstructured, messy, huge amounts, clickstream data is one of them.”
Annie argued that we were concentrating too much on the Big in Big Data. “Forget the fear of Big Data. We know what we need to know. We’ve been doing it all along, with panel records in the millions. It is still market research. It is melded together. I don’t understand the fear. Push the fear aside and analyze.”
Annie won everything all the time. [She made me type that.]