November 28, 2016

Now that you know the basics of stats – how statisticians think about central tendency and variability – it is time to change your life. This post will teach you how to compute confidence intervals around a mean and around a percentage. These are both back-of-the-envelope computations. Remember, I promised peace of mind and relief from all of the data being thrown at you. Here is how you get there.

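A confidence interval is a special range around a mean. If you were to compute this mean again on another sample of patients from the same population, the mean on the new group would fall into this range 95% of the time. (Statisticians state this more strictly: if you drew many samples and built an interval from each one, about 95% of those intervals would contain the true population mean. This post uses the looser, everyday reading.)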

A large confidence interval tells you there is a lot of variability in the distribution and that the mean is not a very precise indication of what might happen in future samples. A very small confidence interval tells you that the mean is a pretty precise estimate of the distributions of future samples.

Confidence intervals can also be computed for percentages and rates.

*Example with means:* Let's say you developed a course to teach nurses a certain new skill. The nurses took a test at the end of the course. The mean score from the test on the first group of nurses was 10. The confidence interval was 8-12. If you planned to repeat this course with similar groups of nurses, 95% of the groups would have a test score mean in the range of 8-12. You would need to ask yourself if that is a satisfactory amount of learning for this new skill.

*Example with percentages:* When you see public opinion polls, responsible pollsters will always offer both the percentage of respondents who hold a certain opinion AND a confidence interval around that percentage. A pollster will say, '54% of Americans think Beanie Babies are cute, plus or minus 4%.' This would indicate that the pollster has calculated that if he or she were to field the same questionnaire again to a similar group of Americans, it is 95% likely that the percentage of Americans who answer that Beanie Babies are cute will be between 50% and 58%.

First, confidence intervals tell you how stable the mean, percentage or rate that you are looking at is. This may be all you need. Lots of times we just want to know how likely it is that a mean will look the same with future groups. The confidence interval tells us enough to plan ahead.

Confidence intervals also help you compare two samples to each other. Let's go back to the nurse training example above (where the mean test score was 10 and the confidence interval was 8-12). Perhaps you want the average test score to be higher than 10, so you reconfigure the class and try it out on a second group of nurses who are similar to the first group in all the important ways. The second group of nurses had a mean score of 15 with a confidence interval of 13-17. Since the confidence intervals of the two groups do not overlap, you could conclude that the second way of teaching the course did, indeed, improve learning more than the first course did.
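If you would rather let a computer do the comparing, the overlap check can be sketched in a few lines of Python. The function name `intervals_overlap` is just an illustrative choice, not something from a statistics library, and the numbers are the two nurse groups from this example.

```python
def intervals_overlap(first, second):
    """Return True if two (lower, upper) confidence intervals share any values."""
    first_lower, first_upper = first
    second_lower, second_upper = second
    # Two ranges overlap when each one starts before the other one ends.
    return first_lower <= second_upper and second_lower <= first_upper


first_course = (8, 12)    # mean 10
second_course = (13, 17)  # mean 15

if intervals_overlap(first_course, second_course):
    print("The intervals overlap: the difference may just be noise.")
else:
    print("The intervals do not overlap: the second course looks better.")
```

Treat this as a quick screen. Intervals that do overlap do not prove the two groups are the same; they only mean this simple check cannot tell them apart.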

These are simple but very powerful uses of confidence intervals. Being able to compute confidence intervals on the back of an envelope could liberate you from a lot of needless pondering.

The math that supports statistics is pretty 'robust,' as statisticians like to say, meaning the math works well with a great number of situations when the data are not perfect. But there are two assumptions we need to make here. If they are not both met, you cannot use the formulas below.

First assumption: That the mean or percentage is based on at least 50 cases.

Second assumption: That the distribution around the mean is 'normal' (meaning it looks like a bell curve). There should be no long tails; the distribution should not be strongly skewed. In health care, this is a hard assumption to meet. A future post will deal with issues around distributions we often see in health care and how to deal with them.

To compute a confidence interval around a mean, we are just going to go through the steps quickly. You will need the mean, the standard deviation of your sample and the number of cases.

Step 1: Compute the standard error of the mean. Divide the standard deviation by the square root of the number of cases. It looks like this, where *se _{M}* is the standard error of the mean, σ is the standard deviation and *n* is the number of cases:

*se _{M} = σ/√n*

Step 2: Multiply the standard error by two (or 1.96 to be precise, but 2 is fine).

Step 3: Compute the confidence interval -- add the product of the standard error multiplied by two to determine the top of the range; subtract it to determine the bottom of the range.

Upper end of confidence interval = *mean* + 2(*se _{M} *)

Lower end of the confidence interval = *mean* - 2(*se _{M} *)

Step 4: Display properly.

*x̄* (lower end, higher end)
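Here is one way the four steps above might look in Python. This is a minimal sketch: the function name `mean_confidence_interval` and the `multiplier` parameter are my own labels, and the inputs (a standard deviation of 8 and 64 nurses) are hypothetical numbers picked so the result matches the 8-12 range from the nurse example.

```python
import math


def mean_confidence_interval(mean, standard_deviation, n, multiplier=2.0):
    """Return (lower, upper) for an approximate 95% confidence interval."""
    # Step 1: standard error of the mean.
    standard_error = standard_deviation / math.sqrt(n)
    # Step 2: multiply by two (pass multiplier=1.96 to be precise).
    margin = multiplier * standard_error
    # Step 3: subtract and add the margin.
    return mean - margin, mean + margin


# Hypothetical inputs: mean 10, standard deviation 8, 64 nurses.
lower, upper = mean_confidence_interval(10, 8, 64)
# Step 4: display as mean (lower end, higher end).
print(f"10 ({lower:.1f}, {upper:.1f})")  # prints: 10 (8.0, 12.0)
```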

To compute a confidence interval around a percentage, you will need the percentage and the number of cases.

Step 1: Convert the percentage into a proportion by dividing it by 100 (the proportion is represented as *p*).

Step 2: Calculate the complement of the proportion (*q*).

*q* = 1.0 – *p*

Step 3: Compute the standard error of the percentage (*se _{p}*), where *n* is the number of cases.

*se _{p}* = √((*p* × *q*)/*n*)

Step 4: Multiply the standard error by two (or 1.96 to be precise, but 2 is fine).

Step 5: Compute the confidence interval -- add the product of the standard error multiplied by two to determine the top of the range; subtract it to determine the bottom of the range.

Upper end of confidence interval = *p* + 2(*se _{p}*)

Lower end of the confidence interval = *p* - 2(*se _{p}*)

Step 6: Convert back into percentages and display properly.

*x*% (lower end, higher end)
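The six steps above can be sketched in Python like this. The function name `percentage_confidence_interval` and the `multiplier` parameter are illustrative, and the inputs (40% of 200 cases) are hypothetical numbers used only to show the call.

```python
import math


def percentage_confidence_interval(percentage, n, multiplier=2.0):
    """Return (lower, upper) in percent for an approximate 95% interval."""
    p = percentage / 100.0                 # Step 1: percentage to proportion
    q = 1.0 - p                            # Step 2: complement
    standard_error = math.sqrt(p * q / n)  # Step 3: standard error
    margin = multiplier * standard_error   # Step 4: multiply by two
    lower, upper = p - margin, p + margin  # Step 5: bottom and top of range
    return lower * 100.0, upper * 100.0    # Step 6: back to percentages


# Hypothetical inputs: 40% of 200 cases.
lower, upper = percentage_confidence_interval(40.0, 200)
print(f"40.0% ({lower:.1f}%, {upper:.1f}%)")  # prints: 40.0% (33.1%, 46.9%)
```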

*Example:*

Let's walk through computing a confidence interval for a percentage, since that is what you will likely do more of. Say 73.2% of the patients in your clinic are getting the flu vaccine, and you have 125 patients. You want to know what range of vaccination rates you can expect next year if everything stays the same.

Step 1: Convert the percentage into a proportion (the proportion is represented as *p*).

*p* = .732

Step 2: Calculate the complement of the proportion (*q*).

*q* = 1.0 – *p*

*q* = 1.0 - .732

*q* = .268

Step 3: Compute the standard error of the percentage (*se _{p}*).

*se _{p}* = √((*p* × *q*)/*n*)

*se _{p}* = √((.732 × .268)/125)

*se _{p}* = √(.196/125)

*se _{p}* = √(.00157)

*se _{p}* = .0396

Step 4: Multiply the standard error by two.

2(*se _{p}*) = .079

Step 5: Compute the confidence interval -- add the product of the standard error multiplied by two to determine the top of the range; subtract it to determine the bottom of the range.

Upper end of confidence interval = *p* + 2(*se _{p}*)

Lower end of the confidence interval = *p* - 2(*se _{p}*)

Upper end of confidence interval = .732 + .079 = .811

Lower end of the confidence interval = .732 - .079 = .653

Step 6: Convert into percentages and display properly.

73.2% (65.3%, 81.1%)

If all of the important things stay the same (availability of the flu vaccine, staffing, etc.), it is *95% likely* that the percentage of patients at your clinic who will receive the flu vaccine next year will fall between 65.3% and 81.1%.
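If you are wondering how much you give up by multiplying by 2 instead of 1.96, here is a small Python check on the flu vaccine numbers. It is only a sketch, and the helper name `flu_interval` is made up for this post.

```python
import math

P = 0.732  # proportion vaccinated, from the example
N = 125    # number of patients, from the example


def flu_interval(multiplier):
    """Return the (lower, upper) interval in percent, rounded to one decimal."""
    standard_error = math.sqrt(P * (1.0 - P) / N)
    margin = multiplier * standard_error
    return round((P - margin) * 100, 1), round((P + margin) * 100, 1)


for multiplier in (2.0, 1.96):
    lower, upper = flu_interval(multiplier)
    print(f"multiplier {multiplier}: 73.2% ({lower}%, {upper}%)")
```

With these inputs the range is 65.3% to 81.1% using 2, and 65.4% to 81.0% using 1.96: a tenth of a percentage point at each end.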

You will notice some things pretty quickly once you start using these formulas. First, the more cases you have (the higher your *n*), the slimmer your confidence intervals. This is one key reason why statisticians are obsessed with sample size.
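To see the effect of sample size for yourself, you could try something like the following Python sketch. It holds the percentage at 73.2% (the flu vaccine example) and changes only *n*. The helper name `margin_in_points` and the sample sizes are illustrative choices.

```python
import math


def margin_in_points(percentage, n, multiplier=2.0):
    """Half-width of the confidence interval, in percentage points."""
    p = percentage / 100.0
    return multiplier * math.sqrt(p * (1.0 - p) / n) * 100.0


# Same percentage, different numbers of cases.
for n in (50, 125, 500, 2000):
    print(f"n = {n:>4}: plus or minus {margin_in_points(73.2, n):.1f} points")
```

Because *n* sits under a square root, you need four times as many cases to cut the plus-or-minus in half.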

Second, you don't need a standard deviation (and so no sum of squares) when computing confidence intervals on proportions; the standard error comes from *p*, *q* and *n* alone. All things being equal, percentages have wider confidence intervals as they approach 50% than they do as they approach 99% or 1%. That is because it is easier to estimate events that occur almost all of the time or almost none of the time than events that occur some of the time. Interesting, no?
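A quick Python sketch makes the 50% effect visible. It keeps the number of cases fixed at 125 (borrowed from the flu vaccine example) and changes only the percentage. The helper name `margin_in_points` and the percentages tried are illustrative choices.

```python
import math


def margin_in_points(percentage, n, multiplier=2.0):
    """Half-width of the confidence interval, in percentage points."""
    p = percentage / 100.0
    return multiplier * math.sqrt(p * (1.0 - p) / n) * 100.0


# Same number of cases, different percentages.
for percentage in (1, 10, 50, 90, 99):
    margin = margin_in_points(percentage, 125)
    print(f"{percentage:>2}%: plus or minus {margin:.1f} points")
```

The plus-or-minus peaks at 50% because *p* × *q* is largest when *p* and *q* are both .5, and it is the same for 10% as for 90% because *p* and *q* simply trade places.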

A statistician has many more tools in their workbench and can make what may seem to be countless adjustments to these formulas. But these are the main ones and will do very well for a quick check on the quality of your data, and for some basic help in deciding what your data actually mean.
