I happened to find this image online and laughed out loud when I read the caption. It certainly represents the thinking of pretty much every person I have ever known who even dabbles in statistics.
Excel is not evil or inaccurate. The challenge is that in Excel you do not get much control over the statistical tests you run and you cannot document well what you have done to prepare the data for analysis (data prep is ninetenths of the law in statistics). It is also clunky – sometimes you have to do your own precalculations before running the statistical test. This is not true in software designed for statistics. You get a great deal of control and the program documents what you do, so you can always go back and replicate your work or find errors – really important things to achieving accuracy.
But the truth is, there are times that you just want to do something simple, and Excel can be a great tool. So I thought it would be useful to show how to run a few basic statistical procedures and when to use them.
I am going to assume you have read the post comparing the different kinds of variables and the table in the post that links those variables to the statistical procedures that are appropriate for them (see The thing of it all – on fields, variables and counting things) and that you know something about basic statistics (please see How Studying Statistics Guarantees a Happy Life and subsequent posts).
Excel uses ranges in its statistical functions, just like for all the rest of the computations it performs. What you need to be aware of is that if there is a missing row of data (a blank field), Excel usually reads that blank as a zero. That can throw off the calculations considerably. So before you run any statistics, make sure your Excel sheet looks like a data file as much as possible.
If you have some values you want to exclude from the analysis (like if you only want to compute the mean BMI on the teenagers in your sample), Excel does not give you a way to do that. You need to sort and copy the subset of data you want to a new sheet.
Excel cannot 'break' your data for you. If you are comparing the means of two groups, you need to sort your data into those two groups and then highlight each groups range separately. This can become very onerous if you have multiple comparisons to run.
Excel cannot run any multivariate statistics, so you will need to move to a different software package for regressions, MANOVAs and all of the other procedures that can be helpful to analysts.
There is always a little confusion when you are beginning your statistical career about which test to use in which situation. The table below gives you some basic guidelines for very common bivariate (two variable) tests. These are the tests we will cover in this post.
What you want: 
Continuous data 
Categorical or dichotomous data 
Example 
To compare different groups of subjects 
Independent groups Ttest 
ChiSquare test 
Comparing the scores of boys and girls who took the same test 
To compare two points in time, the same group of subjects 
Paired Ttest 
McNemar’s test (dichotomous only) 
Comparing the before and after scores of a group of children exposed to a certain treatment 
To compare to a benchmark or to the population 
Confidence intervals 
Confidence intervals 
Comparing Lurie Children’s scores to that of a group of other children’s hospitals 
To test the association or ‘relatedness’ of two different groups 
Pearson correlation 
ChiSquare test 
Associating the age and height of a group of children 
An Excel file has been set up that includes some sample data and the calculations to run each of these. Please take a look at it as you work through these formulas and examples. The formulas are live, so you can copy and paste them into your own workbooks.
Test 
What it is used for 
What you put in the command line 
What Excel gives you 
What it means 
Independent groups ttest 
Compares the means of two separate groups (in our example, boys and girls). 
In English: =TTEST(range for boys, range for girls, 2tailed independent, assume equal variances)
In Excelspeak: =TTEST(C2:C12,C13:C21, 2, 2)

A probability (in the example sheet, 0.80) 
The probability that the two groups are the same (if less than .05, they are different in the statistical sense). 
Paired ttest 
Compares two time points for a single group.

In English: = TTEST(range for height on first day, range for height on last day, 2tailed, paired)
In Excelspeak: = TTEST(E2:E21,F2:F21, 2,1) 
A probability (in the example sheet, 0.00) 
The probability that the measures taken at two time points are same (if less than .05, they are different in the statistical sense). 
Pearson Correlation 
Measures the association between two different variables. 
In English: =PEARSON(range of ages, range of heights)
In Excelspeak: =PEARSON(C2:C21,E2:E21) 
A Pearson correlation coefficient (in the example sheet, 0.95) 
Pearson correlation coefficient (also noted as 'r') is interpreted in the following way: The closer the coefficient is to 1.0, the more correlated the two variables are; the closer to 0, the less correlated they are. If the correlation has a negative sign, the correlation is inverse (or negative), meaning as one variable’s value goes up, the other one’s value goes down.

Confidence interval 
Compares a mean or a percentage of one group to another (or to a benchmark). 
In English: =CONFIDENCE.NORM(significance level, standard deviation of your sample, sample size)
In Excelspeak: =CONFIDENCE.NORM(J26,J27,J28) 
The confidence interval (in the example sheet, 5.08) 
Excel gives you the confidence interval that you must both add to and subtract from your group mean to create a range (in the examples, 50.69 to 61.31). If the benchmark is within this range, the difference between the sample mean and the benchmark is not statistically significant.

Chi Square 
Measures the association between two variables. It can be used on much larger tables than the one shown in this example. But the challenges of doing that in Excel are pretty high.

In English: =CHISQ.TEST(range of actual values, range of expected values)
In Excelspeak: =CHISQ.TEST(C3:D4,C8:D9) 
A probability (in the example sheet, 0.001) 
The probability that the two groups are the same (if under .05, they are different in the statistical sense).
Note: Alas, Excel does not compute the expected values in the cells. You either need to compute them yourself (as I did) or pull them from another source. If you are comparing your data to the population, you would just use the frequencies for the population. 
McNemar 
Computes the probability that a change has occurred by something other than chance. It is often used for clinical studies when you are comparing outcomes for a treatment and control group. 
In English: =BINOM.DIST(the lower of the two change values, the change values, expected probability of change if null hypothesis is true, use the cumulative distribution function)
In Excelspeak: =BINOM.DIST(MIN(K16,K17),K16+K17,.5,TRUE) 
A probability (in the example sheet, 0.004) 
The probability that the two groups are the same (if under .05, they are different in the statistical sense). 
Once you start tootling around in Excel's statistical world, you will quickly see why it is not the preferred software for people who work regularly with statistics. Even so, in a pinch, it can be helpful if you know how to use it and are aware of its limitations.
Subscribe now to have updates from The Why Axis delivered to your inbox.
Please leave a comment
Comments will be moderated.