November 15, 2016

It is time to talk about variability, the counterpart of central tendency. You might remember from my previous posts that variability and central tendency go hand in hand. Almost all of statistics, no matter how complicated, revolves around these two concepts.

Trigger alert: We cannot talk about variability without doing math.

The math we are going to do here is not at all complicated: addition, subtraction, division, squaring, and square roots. But the notation can be confusing because it is not what you learned in high school algebra, so you will need to get used to it.

In this post, we will cover the mean (x̄), the difference from the mean (x − x̄), the squared difference (x − x̄)², and a couple of other things. If you can get comfortable with those three odd-looking things, you are halfway home. Promise. Here we go.

We are going to compute the mean in a formal way to get us used to some of this notation. You will see that we will add one thing to the basic formula for the mean in each portion of this post, but we are sticking to the very simple framework used for the mean. So let's get the mean formula down pat.

We will start with a small group of data. Table A lists all my brothers and sisters and our childhood dog, Rags. Let's pretend they all took a math test and got a score. As we know, the mean is simply the sum of all the scores divided by the number of test takers, which here comes to 13.

Here is the equation we just used (equation A):

$$\bar{x} = \frac{\sum x}{n}$$

Let's pull this apart. The sigma (Σ) indicates that you are taking the sum of something. So Σx just means the sum of all the numbers in the x column. *n* is just the number of test takers; it may be easier to think of *n* as the number of rows in the table.
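To make the notation concrete, here is a minimal Python sketch of equation A. The scores are hypothetical (an assumption for illustration; the actual Table A values are not reproduced here), picked so that the mean comes out to 13 as in the post. `sum(x)` plays the role of Σx and `len(x)` plays the role of *n*.

```python
# Hypothetical scores for illustration only; these are NOT the Table A values.
scores = [11, 13, 15, 9, 17]

def mean(x):
    """Equation A: the sum of the x column divided by n, the number of rows."""
    n = len(x)         # n: the number of test takers (rows in the table)
    return sum(x) / n  # Σx / n

print(mean(scores))  # 13.0
```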

This is the exact calculation you learned in third grade math. But the teacher did not tell you about the sigma or about the n because a teacher in Oklahoma tried that once and all her third graders got nose bleeds. So we keep it a secret. But you are old enough now to handle the truth. Let's move on.

Take a look at this graph. It displays the test scores for all the individuals (kids and dog) in Table A. The mean is drawn as a line running right through the bars. The typical test taker is Jack, whose score of 13 is exactly the mean, and everyone else has a score higher or lower than the mean.

Computing the difference between the actual score and the typical or mean score is the first step in calculating what at some point I decided just to call 'the great big bucket of variability.' (It somehow helped me.)

What we want to do is compute the difference between the score and the mean for every test taker and then add up the differences. Table B does this for us. Doug is two points below the mean, Jack is at the mean, Mary is two points above the mean, etc. And then the differences get summed. The total is zero.
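Here is a small Python sketch of what Table B does, again using hypothetical scores (an assumption; they are not the Table A values) whose mean is 13. Each difference is x − x̄, and adding them up gives zero.

```python
# Hypothetical scores for illustration only; these are NOT the Table A values.
scores = [11, 13, 15, 9, 17]
n = len(scores)
x_bar = sum(scores) / n  # the mean, 13.0

# x - x̄ for every test taker: negative below the mean, positive above it
differences = [x - x_bar for x in scores]

print(differences)       # [-2.0, 0.0, 2.0, -4.0, 4.0]
print(sum(differences))  # 0.0
```

With messier data (a mean that is not a whole number), floating-point rounding can leave a tiny leftover such as 1e-15 instead of an exact 0, but mathematically the sum is still zero.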

And here is the formula for the average difference from the mean:

$$\frac{\sum (x - \bar{x})}{n}$$

First, note that the only change in the formula from equation A is the '−x̄'. Other than that, it is exactly the same as the formula for the mean.

Second, note that the sum of the differences from the mean is zero. That is because . . . it always is. The sum of the differences from the mean is zero 100% of the time, no matter what you are measuring, no matter how many people are in your study. It is not helpful at all because it makes it look as though we have no variability around the mean. And we know that is not true.
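If you want the one-line reason it always is: x̄ is the same number in every row, so summing it over all *n* rows gives n·x̄, and by equation A, n·x̄ is just Σx. The two pieces cancel:

$$\sum (x - \bar{x}) = \sum x - \sum \bar{x} = \sum x - n\bar{x} = \sum x - n \cdot \frac{\sum x}{n} = \sum x - \sum x = 0$$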

Thus, we have to get a smidge more complicated.

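Some very wise mathemagician figured out that if you squared the differences from the mean, all the negative signs would go away, so the differences could no longer cancel each other out. (The sum of the squared differences is zero only when every score equals the mean, that is, when there truly is no variability.) You could then have a single number that represented how much difference there is around the mean. Presto.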

Table C includes this column, what has now famously come to be called the 'squared difference'. And the sum of those squared differences is an even more famous celebrity in statistics -- 'the sum of squares' (sometimes shown as 'SS').
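A minimal Python sketch of Table C's extra column and its total, the sum of squares. The scores are hypothetical (an assumption for illustration, not the Table A values).

```python
# Hypothetical scores for illustration only; these are NOT the Table A values.
scores = [11, 13, 15, 9, 17]
x_bar = sum(scores) / len(scores)  # the mean, 13.0

# (x - x̄)² for every test taker: squaring removes the negative signs
squared_differences = [(x - x_bar) ** 2 for x in scores]

# SS = Σ(x - x̄)²: the "great big bucket of variability"
sum_of_squares = sum(squared_differences)

print(squared_differences)  # [4.0, 0.0, 4.0, 16.0, 16.0]
print(sum_of_squares)       # 40.0
```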

Bugles sound! The sum of squares is the great big bucket of variability!

The sum of squares is a single number that represents all the noise in the data. If the mean is the signal, the sum of squares is the noise. And you need to sum up all the noise to understand how strong the signal is: a lot of noise means your signal is weak, a little noise means it is strong.

We have been saying from the start that central tendency and variability are the core concepts of statistics. Just as the mean is the most important way to compute central tendency, the sum of squares is the most important way to compute variability.

The sum of squares is important. We are going to use it a lot in posts to come. But on its own it is hard to interpret, because it keeps growing as you add more test takers, even if they are no more spread out than before. So the next step is to take its average.

Here is the equation:

$$\frac{\sum (x - \bar{x})^2}{n}$$

All we have done is add the superscript 2, which squares each difference. Other than that, it is the same as the equation for the average difference from the mean.

When you take the average of the squared differences (the sum of squares divided by the number of test takers), you end up with the variance (represented as σ²).
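Here is a small Python sketch of the variance as "sum of squares divided by *n*", using hypothetical scores (an assumption for illustration, not the Table A values). The helper names `sum_of_squares` and `variance` are made up for this example. It also shows why we average: listing every score twice doubles the sum of squares but leaves the variance unchanged. Python's standard `statistics.pvariance` also divides by *n*, so it should agree; note that `statistics.variance` divides by *n* − 1 instead (the "sample" variance), which is a different formula from the one in this post.

```python
import statistics

def sum_of_squares(x):
    """SS = Σ(x - x̄)², the great big bucket of variability."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def variance(x):
    """σ²: the sum of squares divided by n (the average squared difference)."""
    return sum_of_squares(x) / len(x)

# Hypothetical scores for illustration only; these are NOT the Table A values.
scores = [11, 13, 15, 9, 17]
doubled = scores * 2  # every score appears twice: same spread, twice the test takers

print(sum_of_squares(scores), variance(scores))    # 40.0 8.0
print(sum_of_squares(doubled), variance(doubled))  # 80.0 8.0

# pvariance divides by n, like the formula above
print(statistics.pvariance(scores) == variance(scores))  # True
```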

While we might use the word 'variance' colloquially to mean variability or difference, in statistics it is a specific number computed for a specific purpose. Variance, in plain English, is the average squared difference from the mean.

You will never have to read that sentence again. It is a true sentence, though, and knowing it might come in awful handy someday, but you are free to move on.

Statisticians use both the sum of squares and variance a lot. But you probably won't. What you will use a lot is next. And this is very exciting.

So far:

- We learned how statisticians write the formula for the mean, which we all know how to compute, but which we did not know how to write mathematically. All other formulas we have looked at are based on this simple formula we learned in third grade.
- We learned how to add up all the noise around the mean, which is called the sum of squares.
- We learned that we cannot really do anything with the sum of squares until we average it, which is called the variance.

Alas, one problem remains. Recall that to compute the variance, we squared the differences from the mean. As a result, the variance (σ² = 8) is artificially inflated, and it is measured in squared points rather than in test-score points. The mean of our distribution is 13. The variance is 8. Knowing our data (the graph above), common sense tells us the variance is too large to represent the noise around the mean sensibly.

Getting a number that is not inflated is very, very easy: we just take the square root of the variance.

And when we do that, we end up with the standard deviation. Ta da!
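In symbols, the standard deviation is just the square root of the variance formula. For our test scores, where σ² = 8, it works out to roughly 2.83:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad \sigma = \sqrt{8} \approx 2.83$$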

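The standard deviation is a measure of variability. Taking the square root undoes the squaring, which puts the variance back on the same scale as the original scores (test-score points rather than squared points). You can think of it, loosely, as a typical distance from the mean, although strictly it is the square root of the average squared difference, not the plain average difference. This removes the inflation from the variance and gives us a number that is usable and that common sense says, 'Yes. This helps me interpret these data.'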

Let's drive this point home. You use the standard deviation by adding it to and subtracting it from the mean. Below is the graph we started with. I have added both the variance (green) and standard deviation (purple) lines. You can see that the standard deviation gives you a good sense of the variability around the mean. The boundaries it defines capture a certain common sense about how different the test scores are from one another. Just eyeballing it, it seems about right. The variance cannot serve this purpose because it is inflated.
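Here is a minimal Python sketch of "adding and subtracting the standard deviation from the mean" to get a band, then checking which scores fall inside it. The scores are hypothetical (an assumption for illustration, not the Table A values), chosen so the mean is 13 and the standard deviation is √8, as in the post.

```python
import math

# Hypothetical scores for illustration only; these are NOT the Table A values.
scores = [11, 13, 15, 9, 17]
n = len(scores)
x_bar = sum(scores) / n

# standard deviation: square root of the average squared difference
sd = math.sqrt(sum((x - x_bar) ** 2 for x in scores) / n)

# the band from one standard deviation below the mean to one above it
low, high = x_bar - sd, x_bar + sd
inside = [x for x in scores if low <= x <= high]

print(f"{low:.2f} to {high:.2f}")  # 10.17 to 15.83
print(inside)                      # [11, 13, 15]
```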

Finally, the standard deviation has a special role when data are 'normally distributed' (a later post). When a distribution is normal, about 68% of cases fall within one standard deviation of the mean and about 95% fall within two standard deviations of the mean. Very handy to know.
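If you want to see where the 68% and 95% figures come from, Python's standard `statistics.NormalDist` can compute the share of a normal distribution that lies within one and two standard deviations of the mean. This is a sketch of the textbook rule for a perfect normal curve, not a claim about the small test-score example above.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

within_1 = z.cdf(1) - z.cdf(-1)  # share of cases within ±1 standard deviation
within_2 = z.cdf(2) - z.cdf(-2)  # share of cases within ±2 standard deviations

print(f"{within_1:.1%}")  # 68.3%
print(f"{within_2:.1%}")  # 95.4%
```

The familiar "95%" is a rounded version of about 95.4%.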

So there you have it: a basic approach to both central tendency and variability. The concepts you will use most frequently are probably the mean and the standard deviation. We know they are intertwined, and we now know that we almost always need to compute both. The questions we now face are these: How do we know whether one group of data is different from the next? How do we know whether our means are stable? For that, we will turn to confidence intervals in the next post. In the meantime, take a breather. Do a little math and clear out your head. Your life is about to change.

Subscribe now to have updates from The Why Axis delivered to your inbox.
