At the 2013 Inbound Marketing conference, Nate Silver said, “The average is still the most useful mathematical tool ever invented.”
I agree that the average is powerful, but I would argue that it is the median that is the most useful mathematical tool ever invented. This lesson was brought home to me as I analyzed a few recent surveys.
- The average is the sum of all values divided by the number of values.
- The median is the middle item in a ranked list of all values (when there is an odd number of items) or the average of the two middle items (when there is an even number of items).
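The two definitions above translate almost line for line into code. Here is a minimal Python sketch; the function names `average` and `median` are my own, chosen for illustration, and the functions assume a non-empty list of numbers.

```python
def average(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("average of an empty list is undefined")
    return sum(values) / len(values)


def median(values):
    """Middle item of the ranked list (odd count), or the average
    of the two middle items (even count)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]  # odd count: the single middle item
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: average the two middle items
```

For example, `median([3, 1, 2])` returns `2`, and `median([4, 1, 3, 2])` returns `2.5`, the average of the two middle items 2 and 3.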
A simple example, from a question on the number of children per household (with an invented data set): the reported counts are 0, 0, 0, 0, 0, 0, 1, 2, 6. The average is 1 child per household, i.e., (0+0+0+0+0+0+1+2+6)/9 = 9/9, while the median (the fifth of the nine ranked values) is 0 children per household.
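Python's standard `statistics` module provides both measures, so the example can be checked directly. A quick sketch using the same invented data:

```python
from statistics import mean, median

# Invented data set: number of children reported per household
children = [0, 0, 0, 0, 0, 0, 1, 2, 6]

print(mean(children))    # 1
print(median(children))  # 0
```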
The magic of the median, compared to the average, is that it is largely unaffected by outliers.
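To see this concretely, here is a small sketch that reuses the invented children data and supposes, hypothetically, that the household with 6 children had been entered as 60 (a typo, say). The average jumps from 1 to 7, while the median stays at 0:

```python
from statistics import mean, median

children = [0, 0, 0, 0, 0, 0, 1, 2, 6]
# Hypothetical data-entry error: the 6 is recorded as 60
typo = children[:-1] + [60]

print(mean(children), median(children))  # 1 0
print(mean(typo), median(typo))          # 7 0
```

A single bad value shifts the average sevenfold, but the median does not move, because it depends only on which value sits in the middle of the ranked list.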
For instance, in a recent survey with 105 responses, household income ranged from $5,000 to $205,000. The average was $45,857. The median, however, was just $35,000.
Or, from a study on app purchasing behavior: the median amount spent on apps since the mobile device was purchased was $15.00, but the average was roughly twice that, at $30.53.
Why was the average so high? 1% of respondents (8 individuals) spent $300 or more, and two individuals spent $600 and $970, respectively. To make the average work, you would have to ask yourself whether these outliers are unreasonable; using the median spares you from having to make such judgment calls.
One more example: respondents to one recent survey took an average of 16 minutes to complete it, but the median was just 12 minutes. Why the disparity? The longest time was 101 minutes, probably from a respondent who was interrupted and later resumed the survey.
The average and the median are summary statistics, meant to describe a range of values with a single number. When data is skewed, the median is more representative of all the values than the average.
And when the data is distributed symmetrically, the median and the average roughly coincide, so the median is still accurate.
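A quick sketch with two hypothetical seven-value data sets shows both cases: in the symmetric set the average and the median agree, while stretching only the largest value skews the set and pulls the average away from the median.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5, 6, 7]  # hypothetical, symmetric around 4
skewed = [1, 2, 3, 4, 5, 6, 70]    # same values, but the largest is stretched out

print(mean(symmetric), median(symmetric))  # 4 4
print(mean(skewed), median(skewed))        # 13 4
```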
The one place the average may be superior: in census data, when you are including almost all of the population. (A pretty rare occurrence.)
Sam Wang, whose site Princeton Election Consortium consistently gives Nate Silver’s FiveThirtyEight a run for its money on Presidential elections, offers a spirited defense of the median.
All of this is why I argue that “The median is still the most useful mathematical tool ever invented.”