Agreement scales, which measure how much respondents agree or disagree with a particular idea, are one of the most popular types of survey questions. Unfortunately, they are also one of the least reliable. Because respondents tend to exaggerate their actual agreement, other scales should be used instead.
In fact, over 100 separate studies have demonstrated acquiescence bias, the tendency for respondents to agree, regardless of the content of the statement. In their landmark paper, “Comparing Questions with Agree/Disagree Response Options to Questions with Construct-Specific Response Options”, Saris, Krosnick and Shaeffer share three reasons why respondents are so darned agreeable:
- Out of a sense of politeness.
- Out of respect for the survey author; in effect, deferring to “expert” judgment.
- From mental exhaustion with the survey (“satisficing”).
Agreement scales are also mentally taxing in several ways:
- Respondents must execute at least four cognitive steps: 1) reading and understanding the question; 2) determining which aspect of the question is of interest to the researcher; 3) identifying how they feel about the issue; 4) translating their feeling to the most appropriate item on the bipolar scale.
- Bipolar scales (scales with opposites) are more mentally taxing than unipolar scales (scales that range from none to much of a quantity or property).
- Negated items (statements with “not”) require an extra step to process. Back in 1918, Wembridge and Means found that subjects took an average of 3.4 seconds to answer a negatively phrased question, compared to 1.7 seconds for an affirmatively phrased one.
- It is often unclear which aspect of the agreement statement is the key assertion.
Saris et al. give a great example of this last point:
> However, “disagree” … responses can be offered for various different reasons, some of which violate the presumed monotonic relation between answers and respondent placement on the underlying dimension of interest.
>
> For example, consider a person who is asked whether he or she agrees or disagrees with the statement: “I am generally a happy person.” A person who disagrees may believe (1) he or she is generally an unhappy person, (2) he or she is generally neither happy nor unhappy, and instead is usually affectless, (3) he or she is happy 55% of the time and unhappy 45% of the time, and 55% of the time is not frequent enough to merit the adjective “generally”, or (4) he or she is always happy, and “generally” does not represent this universality adequately.
Negated items were presumed to increase the quality of responses to agreement scales, on the theory that balancing affirmative statements with reversed ones would cancel out acquiescence. However, as far back as 1961, Eifermann discovered that respondents made errors when processing “not”: for instance, 35% got statements such as “6 is not an even number” wrong, while only 2% got statements such as “6 is an odd number” wrong. So balanced batteries, with about as many negative statements as affirmative ones, do not improve quality. Also in 1961, Messick and Jackson demonstrated that reversed items suffered from acquiescence bias themselves: they correlated positively with their counterparts rather than negatively. Again, people preferred to agree, no matter the content.
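To make the Messick and Jackson point concrete, here is a minimal sketch in Python of the check a researcher might run, with made-up ratings (the data and variable names are illustrative, not from the paper). If a statement and its reversed counterpart measured the same construct cleanly, agreeing with one would mean disagreeing with the other, so their scores should correlate negatively; a positive correlation is the signature of acquiescence.

```python
import numpy as np

# Hypothetical 7-point agreement ratings (1 = strongly disagree, 7 = strongly agree)
# for a statement and its negated counterpart, from the same ten respondents.
original = np.array([6, 7, 5, 6, 7, 4, 6, 5, 7, 6])
reversed_item = np.array([5, 6, 4, 6, 5, 4, 6, 5, 6, 5])

# Clean measurement would produce a strongly negative correlation here.
r = np.corrcoef(original, reversed_item)[0, 1]
print(f"correlation between item and its reversal: {r:+.2f}")

# A positive correlation means respondents tended to agree with both versions,
# the pattern Messick and Jackson reported.
if r > 0:
    print("Warning: positive correlation -- likely acquiescence bias.")
```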
Here’s what is so disagreeable about agreement scales: they’ve been known to be bad practice for over 50 years, yet researchers persist in using them.
Why? Because they are so easy to use. While researching existing surveys for a new employee-engagement questionnaire template I was designing, I often found 50, 60, even 80 statements in a row, all using the agreement scale. You can measure anything with it, without spending the effort to determine the best way to ask each question.
The better approach, which is less mentally taxing for respondents and produces more accurate results, is to use what Saris et al. call “construct-specific responses”: measure the dimension of interest with an appropriate scale, which will most likely vary from question to question. For instance, here are three statements from an employee-satisfaction survey:
- I am satisfied with my overall compensation. Strongly agree, somewhat agree, slightly agree, neutral, slightly disagree, somewhat disagree, strongly disagree.
- I am often so involved in my work that the day goes by very quickly. Strongly agree, somewhat agree, slightly agree, neutral, slightly disagree, somewhat disagree, strongly disagree.
- Communication between senior leaders and employees is good in my organization. Strongly agree, somewhat agree, slightly agree, neutral, slightly disagree, somewhat disagree, strongly disagree.
Here are the same three questions rewritten to use appropriate scales:
- How satisfied are you with your overall compensation? Not at all satisfied, hardly satisfied, somewhat satisfied, very satisfied, completely satisfied.
- How frequently are you so involved in your work that the day goes by very quickly? Never, rarely, sometimes, often, always.
- How is communication between senior leaders and employees in your organization? Terrible, poor, acceptable, good, excellent.
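If you build questionnaires programmatically, the shift is easy to express: instead of one global agreement scale shared by every statement, each question carries its own construct-specific scale. Here is a minimal sketch in Python of that structure (the class and field names are my own illustration, not from any survey library):

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    scale: list[str]  # construct-specific response options, worst to best

SATISFACTION = ["not at all satisfied", "hardly satisfied",
                "somewhat satisfied", "very satisfied", "completely satisfied"]
FREQUENCY = ["never", "rarely", "sometimes", "often", "always"]
QUALITY = ["terrible", "poor", "acceptable", "good", "excellent"]

survey = [
    Question("How satisfied are you with your overall compensation?",
             SATISFACTION),
    Question("How frequently are you so involved in your work that "
             "the day goes by very quickly?", FREQUENCY),
    Question("How is communication between senior leaders and employees "
             "in your organization?", QUALITY),
]

for q in survey:
    print(q.text)
    for i, option in enumerate(q.scale, start=1):
        print(f"  {i}. {option}")
```

The point of the structure is that the scale travels with the question, so nothing nudges the author toward reusing one agreement scale for everything.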
Common rating scales to use instead of the agreement scale include satisfaction (not at all satisfied to completely satisfied), frequency (never to always), quality (terrible to excellent), importance (not at all important to extremely important), and likelihood (not at all likely to extremely likely).
Yes, this is more work on the part of the survey author. But shouldn’t the researcher rather than the respondent be the one doing the work?
And if you were the respondent, would you prefer to answer a battery of 80 agreement statements, or a series of questions each using the most appropriate scale? The variety may keep respondents from satisficing.
I understand if you, like many others, still want to use agreement scales in your research. But we will have to agree to disagree.
Originally published 2013-09-13. Updated to use a seven-point bipolar scale in the example.