Recently a client asked us to do a factor analysis prior to a cluster analysis.
A what before a what?!
Cluster Analysis
Let’s start with cluster analysis. Ray Poynter has written two great introductions over the years:
- Five tips about segmentation in marketing – Key takeaway: “Segmentation is useful when the number of products or services that can be offered is limited…[or] when people need help to organise a complex world.”
- Cluster Analysis, a qualitative technique in quant clothing – Key takeaway: “Cluster Analysis is different from many other marketing science techniques in two important ways: 1) What it is seeking to measure is almost never there; 2) It is not really a scientific, quantitative technique.”
Think of cluster analysis as a statistically-driven editorial process designed to simplify a complex set of respondents. A key editorial decision is how many clusters to specify: the software typically takes the number of desired groups as an input, with researchers commonly evaluating schemes with 3 to 8 clusters to determine which is most useful.
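To make the "number of clusters is an input" point concrete, here is a minimal sketch using plain k-means on made-up survey data. Everything in it is an assumption for illustration: the respondents are synthetic, the `kmeans` helper is a bare-bones stand-in for whatever your statistics package provides, and within-cluster sum of squares is only one of several ways to compare schemes.

```python
import random

def kmeans(rows, k, iterations=50, seed=0):
    """Plain k-means. Returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    # Start from k randomly chosen respondents as the initial centers.
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each respondent to the nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        # Move each center to the mean of its members; leave empty clusters in place.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(
        sum((x - m) ** 2 for x, m in zip(row, centers[lab]))
        for row, lab in zip(rows, labels)
    )
    return labels, wss

# Synthetic data (an assumption): 120 respondents rating 4 items on a 1-5 scale,
# generated around three hypothetical attitude profiles.
rng = random.Random(42)
profiles = [(1, 1, 5, 5), (5, 5, 1, 1), (3, 3, 3, 3)]
respondents = [
    [min(5, max(1, p + rng.choice([-1, 0, 1]))) for p in profile]
    for profile in profiles
    for _ in range(40)
]

# Try every scheme from 3 to 8 clusters and report how tight each one is.
results = {k: kmeans(respondents, k)[1] for k in range(3, 9)}
for k, wss in results.items():
    print(f"{k} clusters: within-cluster sum of squares = {wss:.1f}")
```

The numbers only tell you how compact each scheme is. Deciding which scheme is most useful is still the editorial call described above.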
While you can decide to conduct a cluster analysis after the data is in, it is better to design the questionnaire with cluster analysis in mind. As Ray writes:
- The thinking starts when the survey is being designed. With cluster analysis you would like your variables to have been collected on a similar scale (adjusting later is not as good), and you do not want missing data.
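If the variables were not collected on a similar scale, the usual fallback is to standardize them before clustering. The sketch below shows one common way to do that (z-scores). The two example questions and their answers are invented for illustration, and, as the quote says, adjusting after the fact is second best to consistent scales in the questionnaire.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw answers to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        # Everyone gave the same answer, so the item carries no information.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

# Hypothetical answers from seven respondents on two differently scaled questions.
satisfaction_1_to_5 = [1, 2, 3, 4, 5, 3, 4]
likelihood_0_to_10 = [0, 3, 5, 8, 10, 6, 7]

scaled = [standardize(satisfaction_1_to_5), standardize(likelihood_0_to_10)]
for column in scaled:
    print([round(z, 2) for z in column])
```

Without this step, the 0 to 10 question would dominate the distance calculations simply because its numbers are bigger.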
For a more technical discussion, see Joel Cadwell’s post “Warning: Clusters May Appear More Separated in Textbooks than in Practice.”
Factor Analysis
Like cluster analysis, factor analysis is designed to simplify complex data sets. Factor analysis is typically used to consolidate long lists of items. If you have 90 employee engagement questions, factor analysis can reduce this to a more manageable set. It works by grouping items that correlate highly with one another. See Maike Rahn’s article series, “Factor Analysis: A Short Introduction”, for much more.
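The idea of grouping items that correlate highly can be illustrated with a rough sketch. Note that this is not a real factor analysis: it is a simple correlation-based grouping, used here as a stand-in to show the intuition. The item names, the synthetic answers, the `group_items` helper, and the 0.6 threshold are all assumptions made for the example.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant item cannot correlate with anything
    return cov / (sx * sy)

def group_items(columns, threshold=0.6):
    """Greedy grouping: an item joins the first group whose first item
    it correlates with at or above the threshold, else starts a new group."""
    groups = []
    for name in columns:
        for group in groups:
            if pearson(columns[name], columns[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Synthetic data (an assumption): 200 employees, six items driven by two
# hypothetical underlying themes, pay and manager, plus random noise.
rng = random.Random(7)
n = 200
pay = [rng.gauss(0, 1) for _ in range(n)]
manager = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "pay_fair": [v + rng.gauss(0, 0.4) for v in pay],
    "manager_listens": [v + rng.gauss(0, 0.4) for v in manager],
    "pay_competitive": [v + rng.gauss(0, 0.4) for v in pay],
    "manager_supports": [v + rng.gauss(0, 0.4) for v in manager],
    "pay_raises": [v + rng.gauss(0, 0.4) for v in pay],
    "manager_feedback": [v + rng.gauss(0, 0.4) for v in manager],
}

groups = group_items(columns)
for group in groups:
    print(group)
```

With data built this way, the six items should collapse into two groups, one per theme. A proper factor analysis does more than this (it estimates loadings and handles items that relate to several factors), so treat the sketch as intuition only.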
Greg Timpany of Global Knowledge says, “Factor analysis and cluster analysis are a classic combination, like a chili hotdog! It depends on the statements you are putting in — with a hundred attitudinal questions, stuff will just wash out, so run factor analysis first then use the data reduction to find the meaningful differences. Since the propensity for multicollinearity is more than enough to whitewash a lot of meaningful differences, I almost always use the two together.”
Ray Poynter somewhat disagrees:
- Do not factor analyse the data and then cluster factor scores (this used to be a recommended process). For the cluster analysis to work it needs what marketing scientist Jon Pinnell describes as ‘lumpiness’ in the data. Factor scores smooth that lumpiness out. By all means run a factor analysis to choose a subset of attributes to use, but don’t use factor scores.
Sui Ho, who has been producing factor analyses for 30 years, agrees with Ray, saying, “Factor analysis increases the error rate.”
Julie Worwa takes it further, pointing out, “Factor analysis and cluster analysis are kind of independent of each other. Once you have the clusters of your segmentation (on unfactored data), you may do a factor analysis to see how those clusters’ ratings on the factors shake out, in order to flesh out the persona of each segment. Sometimes it’s helpful; sometimes it’s not.”
Next Steps
Want to learn more? Research Rockstar offers a great online class on these topics: Introduction to Factor & Cluster Analysis.
Want someone to do it for you? Check out our new Cluster Analysis service.