Many of our clients have integrated generative AI into their workflows. They often create prompts to generate new materials that combine our survey findings with their marketing materials (public as well as private), their other proprietary research, and their licensed secondary research.

As a result, we’re now often asked to QA the resulting output. Last week we reviewed three of these documents. In every case, the generative AI had made significant mistakes interpreting our research.

This isn’t surprising. Generative AI doesn’t actually reason from rules or principles; it simply puts words together that are statistically likely to go together.

Here are the types of objective errors we’ve been finding:

  • Invented questions, especially later in the document. In one case, the system generated crosstabs for a less-common demographic question that wasn’t included in this particular questionnaire. In another case, it adopted a third party’s question as if it were part of our study.
  • Mangled crosstabs. Topline findings represented as subgroup findings, and vice versa. Subgroup findings like generational splits that don’t match anything in the data.
  • All-that-apply questions treated like single-choice questions. The findings might get re-presented in a pie or donut chart, which implies mutually exclusive categories, rather than as a bar or column chart. Sometimes the descriptive text would total the items: “50% used Y and Z” when in fact 30% used only Y, 6% used only Z, and 4% used both, so 40% used Y and/or Z. (See the quick tally after this list.)
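
To make that tallying error concrete, here is a minimal sketch of the arithmetic in Python. The counts are hypothetical, assuming a base of 100 respondents so the percentages in the example above map directly onto respondent counts.

```python
# A quick tally showing why select-all-that-apply percentages can't simply
# be added. Counts are hypothetical, assuming a base of 100 respondents.

base = 100      # assumed respondent base
only_y = 30     # used only Y
only_z = 6      # used only Z
both_yz = 4     # used both Y and Z

pct_y = (only_y + both_yz) / base                    # 34% selected Y
pct_z = (only_z + both_yz) / base                    # 10% selected Z

naive_sum = pct_y + pct_z                            # 44%: double-counts the overlap
either_or_both = (only_y + only_z + both_yz) / base  # 40%: used Y, Z, or both

print(f"Sum of option percentages: {naive_sum:.0%}")       # 44%
print(f"Used Y and/or Z:           {either_or_both:.0%}")  # 40%
```

Summing the option percentages (34% + 10% = 44%) double-counts the 4% of respondents who selected both; the defensible figure for “used Y and/or Z” is 40%.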

As for the subjective errors, these are matters of opinion:

  • Shallow summaries. These often fixated on the most typical findings, especially demographic differences, rather than highlighting the surprises or unusual findings.
  • Silently edited and embellished verbatim responses, generalizing them and copy-editing the respondents’ own language.

Generative AI probably made mistakes interpreting the other inputs as well, but we don’t have access to those sources. 

Of course humans make mistakes, too! Transposed numbers, all-that-apply choices tallied together, skip patterns left undocumented… But we find that we can QA a human-written document much more quickly than an AI-written one.

So take care when working with LLMs. They may be better for brainstorming news angles than for writing first drafts. One thing I’ve noticed is that I can now often detect AI output when reading LinkedIn posts, blog posts, articles on news sites, and so on. It has a distinctive style: Latinate words, overuse of clichés, and series after series of three items.

Do you want your marketing copy to sound like AI? Probably not.

But whether you generate your news release or white paper by hand or with AI, we’ll QA it for you as a standard part of our service.

Author Notes:

Jeffrey Henning

Jeffrey Henning, IPC is a professionally certified researcher and has personally conducted over 1,400 survey research projects. Jeffrey is a member of the Insights Association and the American Association for Public Opinion Research. In 2012, he was the inaugural winner of the MRA’s Impact Award, which “recognizes an industry professional, team or organization that has demonstrated tremendous vision, leadership, and innovation, within the past year, that has led to advances in the marketing research profession.” In 2022, the Insights Association named him an IPC Laureate. Before founding Researchscape in 2012, Jeffrey co-founded Perseus Development Corporation in 1993, which introduced the first web-survey software, and Vovici in 2006, which pioneered the enterprise-feedback management category. A 35-year veteran of the research industry, he began his career as an industry analyst for an Inc. 500 research firm.