⚠ COVID-19 INFORMATION: Vaccine Information, Other Resources 

Which test when: Basic stats you can use in Excel

I happened to find this image online and laughed out loud when I read the caption. It certainly represents the thinking of pretty much every person I have ever known who even dabbles in statistics.

Excel is not evil or inaccurate. The challenge is that in Excel you do not get much control over the statistical tests you run and you cannot document well what you have done to prepare the data for analysis (data prep is nine-tenths of the law in statistics). It is also clunky – sometimes you have to do your own pre-calculations before running the statistical test. This is not true in software designed for statistics. You get a great deal of control and the program documents what you do, so you can always go back and replicate your work or find errors – really important things to achieving accuracy.

But the truth is, there are times that you just want to do something simple, and Excel can be a great tool. So I thought it would be useful to show how to run a few basic statistical procedures and when to use them.

I am going to assume you have read the post comparing the different kinds of variables and the table in the post that links those variables to the statistical procedures that are appropriate for them (see The thing of it all – on fields, variables and counting things) and that you know something about basic statistics (please see How Studying Statistics Guarantees a Happy Life and subsequent posts).

Basic things Excel does

Excel uses ranges in its statistical functions, just like for all the rest of the computations it performs. What you need to be aware of is that if there is a missing row of data (a blank field), Excel usually reads that blank as a zero. That can throw off the calculations considerably. So before you run any statistics, make sure your Excel sheet looks like a data file as much as possible.

If you have some values you want to exclude from the analysis (like if you only want to compute the mean BMI on the teenagers in your sample), Excel does not give you a way to do that. You need to sort and copy the subset of data you want to a new sheet.

Excel cannot 'break' your data for you. If you are comparing the means of two groups, you need to sort your data into those two groups and then highlight each groups range separately. This can become very onerous if you have multiple comparisons to run.
Excel cannot run any multivariate statistics, so you will need to move to a different software package for regressions, MANOVAs and all of the other procedures that can be helpful to analysts.

What tests to run when

There is always a little confusion when you are beginning your statistical career about which test to use in which situation. The table below gives you some basic guidelines for very common bivariate (two variable) tests. These are the tests we will cover in this post.

 

What you want:

Continuous data

Categorical or dichotomous data

Example

To compare different groups of subjects

Independent groups T-test

ChiSquare test

Comparing the scores of boys and girls who took the same test

To compare two points in time, the same group of subjects

Paired T-test

McNemar’s test (dichotomous only)

Comparing the before and after scores of a group of children exposed to a certain treatment

To compare to a benchmark or to the population

Confidence intervals

Confidence intervals

Comparing Lurie Children’s scores to that of a group of other children’s hospitals

To test the association or ‘relatedness’ of two different groups

Pearson correlation

ChiSquare test

Associating the age and height of a group of children

How to run the test

An Excel file has been set up that includes some sample data and the calculations to run each of these. Please take a look at it as you work through these formulas and examples. The formulas are live, so you can copy and paste them into your own workbooks.

Test

What it is used for

What you put in the command line

What Excel gives you

What it means

Independ-ent groups t-test

Compares the means of two separate groups (in our example, boys and girls).

In English:

=TTEST(range for boys, range for girls, 2-tailed independent, assume equal variances)

 

In Excel-speak:

=TTEST(C2:C12,C13:C21, 2, 2)

 

A probability (in the example sheet, 0.80)

The probability that the two groups are the same (if less than .05, they are different in the statistical sense).

Paired t-test

Compares two time points for a single group.

 

In English:

= TTEST(range for height on first day, range for height on last day, 2-tailed, paired)

 

In Excel-speak:

= TTEST(E2:E21,F2:F21, 2,1)

A probability (in the example sheet, 0.00)

The probability that the measures taken at two time points are same (if less than .05, they are different in the statistical sense).

Pearson Correlation

Measures the association between two different variables. 

In English:

=PEARSON(range of ages, range of heights)

 

In Excel-speak:

=PEARSON(C2:C21,E2:E21)

A Pearson correlation coefficient (in the example sheet, 0.95)

Pearson correlation coefficient (also noted as 'r') is interpreted in the following way: The closer the coefficient is to 1.0, the more correlated the two variables are; the closer to 0, the less correlated they are.  If the correlation has a negative sign, the correlation is inverse (or negative), meaning as one variable’s value goes up, the other one’s value goes down. 

 

Confidence interval

Compares a mean or a percentage of one group to another (or to a benchmark).

In English:

=CONFIDENCE.NORM(significance level, standard deviation of your sample, sample size)

 

In Excel-speak:

=CONFIDENCE.NORM(J26,J27,J28)

The confidence interval (in the example sheet, 5.08)

Excel gives you the confidence interval that you must both add to and subtract from your group mean to create a range (in the examples, 50.69 to 61.31). If the benchmark is within this range, the difference between the sample mean and the benchmark is not statistically significant.

 

Chi Square

Measures the association between two variables.  It can be used on much larger tables than the one shown in this example.  But the challenges of doing that in Excel are pretty high.

 

In English:

=CHISQ.TEST(range of actual values, range of expected values)

 

In Excel-speak:

=CHISQ.TEST(C3:D4,C8:D9)

A probability (in the example sheet, 0.001)

The probability that the two groups are the same (if under .05, they are different in the statistical sense).

 

Note: Alas, Excel does not compute the expected values in the cells.  You either need to compute them yourself (as I did) or pull them from another source.  If you are comparing your data to the population, you would just use the frequencies for the population.

McNemar

Computes the probability that a change has occurred by something other than chance.  It is often used for clinical studies when you are comparing outcomes for a treatment and control group.

In English:

=BINOM.DIST(the lower of the two change values, the change values, expected probability of change if null hypothesis is true, use the cumulative distribution function)

 

In Excel-speak:

=BINOM.DIST(MIN(K16,K17),K16+K17,.5,TRUE)

A probability (in the example sheet, 0.004)

The probability that the two groups are the same (if under .05, they are different in the statistical sense).

Summing up

Once you start tootling around in Excel's statistical world, you will quickly see why it is not the preferred software for people who work regularly with statistics. Even so, in a pinch, it can be helpful if you know how to use it and are aware of its limitations.

Subscribe now to have updates from The Why Axis delivered to your inbox.