Pew Research discusses Google Consumer Surveys' fit for purpose

Scott Keeter of Pew Research Center discussed Pew’s experiments with Google Consumer Surveys at the AAPOR annual conference in Boston today. While Pew Research remains committed to rigorous, probability-based sampling for all major work, Scott took pains to point out, the organization wants to look at non-probability sampling for particular purposes. Working with Google was a partnership rather than a blind test because Pew wanted to understand the methodology “under the hood”.

Google Consumer Surveys (GCS) samples respondents from online publisher websites that have a “survey wall” for access to content instead of a paywall. GCS uses quota sampling and then weights on inferred demographics.
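To make "weighting on inferred demographics" concrete, here is a minimal Python sketch of simple post-stratification weighting. It is not GCS's actual procedure, which the talk did not detail. The age groups, answers, population targets and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent so weighted group shares match the targets."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # weight = target share of the group / observed share of the group in the sample
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_share(answers, weights):
    """Weighted proportion of respondents whose answer is True."""
    return sum(w for a, w in zip(answers, weights) if a) / sum(weights)

# Hypothetical respondents: an inferred age group and a yes/no answer for each.
groups = ["18-34", "18-34", "35-54", "35-54", "35-54",
          "55+", "55+", "55+", "55+", "55+"]
answers = [True, False, True, True, False,
           False, False, True, False, False]

# Hypothetical population targets (illustrative only, not real census figures).
targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = poststratification_weights(groups, targets)
unweighted = sum(answers) / len(answers)
weighted = weighted_share(answers, weights)

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this made-up sample the oldest group is overrepresented, so its respondents are weighted down and the younger groups are weighted up, which moves the estimate by a few points.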

Pew is not ready to make a bottom-line judgement on GCS but discussed “fit for purpose”, with different purposes including: national point estimates, tracking change over time, quick reaction measurement, pretesting question wording, open-end testing, diverse question formats and associations between variables. If GCS is biased in a systematic way, can you use it for trending? GCS is weakest at associations between variables, given the limit of two-question surveys.

Across 52 comparisons with RDD samples, the differences averaged 6.5 points, with a median difference of 3.5 points. Some of these are mode differences, some are population differences. The GCS sample has an older, more highly educated population than the RDD sample. Correlations by age for inferred demographics line up with RDD estimates, even though there is slippage in the inferred demographics. Individuals might not be who Google thinks they are, but overall the demographic breakdowns worked for the questions tested.
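A mean difference well above the median, as reported here, is what you would expect when a few comparisons are far off while most are close. The short Python sketch below illustrates that pattern. The estimates are invented for illustration and are not Pew's data, and it assumes the differences are absolute percentage-point gaps, which the talk did not specify.

```python
from statistics import mean, median

def absolute_differences(gcs, rdd):
    """Absolute percentage-point gap between paired estimates."""
    return [abs(g - r) for g, r in zip(gcs, rdd)]

# Hypothetical paired estimates (percent), one pair per question; not Pew's figures.
gcs_estimates = [52, 40, 33, 61, 25, 48]
rdd_estimates = [50, 43, 32, 45, 27, 44]

diffs = absolute_differences(gcs_estimates, rdd_estimates)

# One large outlier pulls the mean up, while the median stays near the typical gap.
print("differences:", diffs)
print("mean:", round(mean(diffs), 2), "median:", median(diffs))
```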

Pew has asked people about volunteering. GCS respondents report considerably less volunteering than RDD respondents do, but the GCS estimate is closer to the Current Population Benchmark than the RDD estimate is, perhaps due to social desirability bias. The GCS sample is less religious, which is a demonstrated Internet mode effect.

Some observations:

In the past month, Pew ran three surveys, which showed reliability over time: the estimates were stable.
For quick-reaction surveys, GCS results for the first presidential debate didn’t line up with the RDD sample as well as results for the second debate did.
Pew tested an open-ended question on GCS before fielding it as an open-ended question on the phone, providing interviewers with coding categories identified from the most common responses on GCS.

Google Consumer Surveys produces results quickly and cheaply, making it timely for specific events. It allows for the use of multiple question types. Unfortunately, because of the reliance on nonprobability sampling, it is difficult to predict when it works well and when it doesn’t. Google Consumer Surveys is a work in progress.

Pew Research plans to continue to use GCS for quick reaction polls; for testing of survey questions including wording, order and format; as well as testing open ended questions to help inform development of closed-end questions.

Pew is interested in exploring how well it can measure media use at various times of day and hopes to explore types of nonprobability methods to see how they might supplement traditional probability-based surveys, even though Pew won’t be using GCS for national point estimates.

In the question and answer session, Pew was asked if national point estimates were off in any specific way. Scott said that some were off by a little and some off by a lot and those that did worse tended to suffer from mode effects or vague questions. For instance, a question about looking online for health information in the past several months showed an enormous disparity, but posing the question differently produced closer estimates. Pew was unable to come up with a good theory for why certain questions had big differences.

Another question asked, “Have you shown here that we can apply weighting and modeling to non-probability panel surveys and get somewhat similar estimates to probability surveys? Is that a sign that we are good modelers even though we don’t know anything about validity?”

Scott answered that the GCS modeling is very light and optional: in the interface you can turn the weights off, and if you do it makes a point or two difference. GCS does use quota-based sampling for building the sample and that can’t be changed.

A panelist said, “I came away from this experience with non-probability modeling realizing that if you are sampling from a relatively large and heterogeneous frame of people visiting websites then your need for modeling may be greatly reduced, but if you have an opt-in panel you have to do more modeling.”

Author Notes:

Jeffrey Henning

Jeffrey Henning, IPC, is a professionally certified researcher and has personally conducted over 1,400 survey research projects. Jeffrey is a member of the Insights Association and the American Association for Public Opinion Research. In 2012, he was the inaugural winner of the MRA’s Impact award, which “recognizes an industry professional, team or organization that has demonstrated tremendous vision, leadership, and innovation, within the past year, that has led to advances in the marketing research profession.” In 2022, the Insights Association named him an IPC Laureate. Before founding Researchscape in 2012, Jeffrey co-founded Perseus Development Corporation in 1993, which introduced the first web-survey software, and Vovici in 2006, which pioneered the enterprise-feedback management category. A 35-year veteran of the research industry, he began his career as an industry analyst for an Inc. 500 research firm.