As teams assemble questionnaires for us to review, the draft survey instruments often end up with a mishmash of scales: different questions use scales with three, four, five, seven, eleven, or more points, as everyone incorporates their favorite scale. One of the first revisions we make is to replace all the different-sized unipolar scales with five-point scales.

In 2008, Jon Krosnick, professor of communication at Stanford, and then-doctoral student Alexander Tahk wrote “The Optimal Length of Rating Scales to Maximize Reliability and Validity”:

Survey research frequently uses multi-point scales to assess respondents’ views. These scales vary from two points (e.g., agree or disagree) to 101 points (e.g., the American National Election Study’s thermometer-style ratings). Scales can also vary in another regard: being bipolar (meaning the zero point is in the middle and the end points are opposites, such as extremely positive and extremely negative) or unipolar (meaning the zero point is at one end, as in “not at all important”). However, different scale lengths may differ in reliability, so it is important to understand how the length of the scales affects the reliability of the responses.

To explore the relation between scale length and reliability, we conducted a meta-analysis of the results of many past studies. Our data consist of results from 706 tests of reliability taken from thirty different between-subject studies. We combined various measures of reliability and various sample sizes, controlling for these and other factors in determining the relation of scale length to reliability.

In general, we found that five- or seven-point scales produced the most reliable results. Bipolar scales performed best with seven points, whereas unipolar scales performed best with five. We also found that offering a midpoint on a bipolar scale, indicating a neutral position, increased reliability.

Best practice for bipolar scales has evolved since this was published: the current recommendation is to break such questions into batteries of three or four questions. For more on bipolar scales, see “When and How to Use Bipolar Scales.”

The other best practice for five-point scales is to label every point (e.g., Not at all satisfied, Hardly satisfied, Somewhat satisfied, Very satisfied, Completely satisfied) and to hide any numbers that might be used for analysis behind the scenes. See “The Case for Fully Labeled Scales” for the research behind that.
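As a rough illustration of that practice, here is a minimal Python sketch of a fully labeled five-point scale whose numeric codes stay behind the scenes. The labels come from the example above; the 1-to-5 coding and the names `SATISFACTION_SCALE`, `CODES`, `render_options`, and `encode` are assumptions made for illustration, not part of any particular survey platform.

```python
# Fully labeled five-point unipolar scale (labels taken from the example above).
SATISFACTION_SCALE = [
    "Not at all satisfied",
    "Hardly satisfied",
    "Somewhat satisfied",
    "Very satisfied",
    "Completely satisfied",
]

# Assumed analysis convention: code the points 1 through 5 in order.
# These codes are used only for analysis and are never shown to respondents.
CODES = {label: code for code, label in enumerate(SATISFACTION_SCALE, start=1)}


def render_options(scale):
    """Return the answer choices as respondents see them: labels only, no numbers."""
    return list(scale)


def encode(responses):
    """Translate the labels respondents chose into numeric codes for analysis."""
    return [CODES[label] for label in responses]


if __name__ == "__main__":
    print(render_options(SATISFACTION_SCALE))
    print(encode(["Very satisfied", "Somewhat satisfied", "Completely satisfied"]))
```

Keeping the label-to-code mapping in one place means the questionnaire text can be reworded without touching the analysis, and respondents only ever see the words.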

Source: UserZoom

Originally published on December 5, 2017. Updated to use currently recommended scale wording.


Author Notes:

Jeffrey Henning

Jeffrey Henning, IPC, is a professionally certified researcher and has personally conducted over 1,400 survey research projects. Jeffrey is a member of the Insights Association and the American Association for Public Opinion Research. In 2012, he was the inaugural winner of the MRA’s Impact award, which “recognizes an industry professional, team or organization that has demonstrated tremendous vision, leadership, and innovation, within the past year, that has led to advances in the marketing research profession.” In 2022, the Insights Association named him an IPC Laureate. Before founding Researchscape in 2012, Jeffrey co-founded Perseus Development Corporation in 1993, which introduced the first web-survey software, and Vovici in 2006, which pioneered the enterprise-feedback management category. A 35-year veteran of the research industry, he began his career as an industry analyst for an Inc. 500 research firm.