An Introduction to Data Analysis & Presentation

Prof. Timothy Shortell, Sociology, Brooklyn College

Measures of Variability

When we generated a measure of central tendency, it was to summarize the most typical case in a distribution. In addition to that information, we would like to know how much the other cases in the distribution vary from one another. In other words, we want to know the typical deviation.

The measures of variability tell us how much scores differ from one another. Like the three measures of central tendency, the measures of variability each tell us something different about the distribution.

First, we must note that none of the measures of variability make sense with nominal data. This is because the values are simply labels, and no mathematical operations can be meaningfully performed on them.

The simplest measure of dispersion is the range. It is calculated as the highest (largest) score minus the lowest (smallest) score. By marking the distance between the end points of a distribution, it conveys a rough sense of the variation among elements. Because it is a mathematically simple measure that uses only the two most extreme scores, it is of limited value. It can be calculated on ordinal data, but it is not very useful with Likert variables, because they have fixed end points. Thus, the range is likely to reflect the characteristics of the answer scale more than the dispersion of the data.
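A minimal Python sketch of the range, assuming the scores arrive as a plain list of numbers. The function name `score_range` and both data sets are hypothetical, chosen only for illustration; the second set shows how a 1-to-5 Likert answer scale caps the range.

```python
def score_range(scores):
    """Range: the highest score minus the lowest score."""
    if not scores:
        raise ValueError("the range is undefined for an empty list of scores")
    return max(scores) - min(scores)

# Hypothetical interval-level scores (ages in years).
ages = [19, 22, 23, 25, 31, 47]
print(score_range(ages))  # 28

# Hypothetical responses on a 1-to-5 Likert scale: whenever both end points
# are used, the range is 4, however the other answers are spread out.
likert = [1, 3, 3, 3, 3, 5]
print(score_range(likert))  # 4
```

The name avoids shadowing Python's built-in `range`.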

What is the range in these examples?

We want a measure of variability that is as mathematically sophisticated as the mean, one that takes into account as much information in the data as possible -- that is, a measure that considers both rank and magnitude. This measure would be appropriate only with interval data, just like the mean.

We can start with the notion of a deviation score, defined as the distance, or difference, between a score and the mean. We can calculate it for every case in the distribution. A positive sign indicates that the score is above the mean, and a negative sign indicates that the score is below the mean. The more dispersion there is in a set of scores, it would seem, the larger the deviation scores would be in total.
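A short Python sketch of deviation scores, using a hypothetical set of eight interval-level scores whose mean is 5. `deviation_scores` is an illustrative name, not part of any library.

```python
def deviation_scores(scores):
    """Return each score minus the mean: positive above the mean, negative below."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean = 40 / 8 = 5
print(deviation_scores(scores))
# [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
```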

We might then take the average of these deviation scores. This would give us a measure analogous to the mean. Let's calculate the average of the deviation scores for this example. First, calculate the mean. Next, calculate a deviation score for each case. Next, sum them up. Hmm, now we have a problem. Because the mean is the arithmetic average, deviation scores, by definition, sum to zero. The amount of deviation above the mean is always exactly balanced by the amount of deviation below the mean, so the average deviation score is zero no matter how spread out the scores are.
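Why the deviation scores must sum to zero follows in one line from the definition of the mean. Writing $X_i$ for the score of case $i$, $N$ for the number of cases, and $\bar{X} = \sum X_i / N$ for the mean (notation assumed here, not taken from the original):

```latex
\sum_{i=1}^{N} (X_i - \bar{X})
  = \sum_{i=1}^{N} X_i - N\bar{X}
  = \sum_{i=1}^{N} X_i - N \cdot \frac{\sum_{i=1}^{N} X_i}{N}
  = 0
```

Dividing this sum by $N$ gives an average deviation of $0$ for every distribution.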

We need to devise some way to avoid this problem. As it turns out, some clever mathematician-types discovered that if you first square the deviation scores and then sum them, you avoid the problem of the positive deviations cancelling out the negative ones.

We can then take the average of these squared deviation scores. This is called the variance.
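The squaring step and the averaging step can be sketched together in Python. The data and the `variance` function are hypothetical; dividing by N follows the "average" described above. The standard library's `statistics.pvariance` also divides by N and is used only as a cross-check (`statistics.variance`, by contrast, divides by N − 1).

```python
import math
import statistics

def variance(scores):
    """Average of the squared deviation scores (divides by N, the number of cases)."""
    n = len(scores)
    mean = sum(scores) / n
    # Squaring makes every term non-negative, so deviations above and below
    # the mean can no longer cancel each other out when summed.
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / n

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean = 5
print(variance(scores))  # 4.0  (sum of squares 32, divided by 8 cases)

# Cross-check against the standard library's population variance.
assert math.isclose(variance(scores), statistics.pvariance(scores))
```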

The variance is hard to interpret, though, because when we square the values we also square the units. To get back to our original units, we can take the square root of the variance. This is the standard deviation. Our definitional formula for the standard deviation, then, is the square root of the average squared deviation from the mean.
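A sketch of the standard deviation in Python, using hypothetical heights in centimetres so the units are visible: the variance of these heights is 4 cm², and its square root, 2 cm, is back in the original units. `standard_deviation` is an illustrative name; the standard library's `statistics.pstdev` serves only as a cross-check.

```python
import math
import statistics

def standard_deviation(scores):
    """Square root of the average squared deviation from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n  # in squared units (cm²)
    return math.sqrt(variance)                            # back in original units (cm)

# Hypothetical heights in centimetres; mean = 163.
heights_cm = [160, 162, 162, 162, 163, 163, 165, 167]
print(standard_deviation(heights_cm))  # 2.0

# Cross-check against the standard library's population standard deviation.
assert math.isclose(standard_deviation(heights_cm), statistics.pstdev(heights_cm))
```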

Let's look at some examples:

All materials on this site are copyright © 2001, by Professor Timothy Shortell, except those retained by their original owner. No infringement is intended or implied. All rights reserved. Please let me know if you link to this site or use these materials.