What to ask the analyst
With the rise of complex data and analytics, we all have to work harder to make sure we understand analyses presented to us. We need to be "discriminating customers." And to do that, we need to be ready to ask some basic questions whenever an analysis is presented.
Analysts who work with me can count on having to answer these five questions. After a while, they see them coming and build the answers into their presentation. These questions are a good place to start when interpreting an analysis presented to you.
- What is the source of the data?
This seems obvious perhaps. But every health care organization has several 'data aggregators' that normalize and integrate data in different ways, so it's not OK to just say 'the data are from the EMR.' If you want to know how many of a certain surgical procedure we performed, you will get one answer from the operational reporting database, a second from research data, and a third from billing data.
Each database is designed to meet a different need and therefore has what analysts call 'systematic error': the data are not wrong, they just cannot be interpreted analytically in a straightforward way. We never expect perfect analytic data from the EMR and our aggregators, but we do want to know what types of systematic error might show up in any particular analysis.
- How well do the cases in the analysis represent the population?
Sometimes, the patient groups we look at are so small that any analysis we do from our data system would represent the full population of cases in the region. This is the only circumstance where sampling is not a problem, so you should assume sampling bias might enter almost every analysis you are looking at.
These are the questions that go through my mind when I look over an analysis. The specific questions I ask the analyst depend on the purpose of the analysis and what kind of decisions we are hoping to inform.
- Is the sample you ended up with an artifact of the source of data? Who is missing and how different are they?
- Is the sample you ended up with an artifact of a particular workflow or process that perhaps cannot be generalized to the set of patients you want to generalize to (for example, have you unwittingly only collected data from patients with evening appointments)?
- Are there seasonal factors in play? For example, we see a lot more flu in the winter than in the summer, and a lot more injuries in the summer.
- If you are using the data to plan a program expansion, are the patients you are studying the same as those you might expand to?
Not every difference between a sample and the population matters, so it is very important to think through which differences matter and which do not.
- Does the data distribution include outliers or other unexpected features?
I begin every analysis by doing a simple frequency distribution of the fields I care about. A frequency distribution tells you right away if there are outliers, if the distribution is skewed or multimodal, or if there are errant values (like a 43-year-old patient in the neonatal intensive care unit). A bimodal distribution is a good example of the problems this can pose: when the cases fall into two separate clusters, an average can land in the gap between them and describe almost none of the actual cases.
There are lots of ways to approach a difficult distribution. Your analyst should be able to explain how they thought this through and how it affects the numbers you are looking at.
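To make the frequency distribution idea concrete, here is a minimal Python sketch using only the standard library. The lengths of stay are invented for illustration and do not come from any real data source; the function names `frequency_table` and `print_histogram` are likewise just placeholders. The point is only to show how a simple binned count exposes two clusters and an outlier that a single summary number hides.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical lengths of stay in days, invented for illustration only:
# a cluster of short stays, a cluster of longer stays, and one extreme value.
lengths_of_stay = [1, 1, 2, 2, 2, 3, 3, 14, 15, 15, 16, 16, 17, 150]


def frequency_table(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def print_histogram(table, bin_width):
    """Print one row of '#' marks per non-empty bin so peaks and gaps show up."""
    for start, count in table.items():
        print(f"{start:>4} to {start + bin_width - 1:<4} {'#' * count}")


table = frequency_table(lengths_of_stay, bin_width=5)
print_histogram(table, bin_width=5)

# Neither summary describes a typical patient in this made-up data set:
# the median sits in the empty gap between the two clusters, and the mean
# is pulled upward by the single 150-day stay.
print("mean:  ", round(mean(lengths_of_stay), 1))
print("median:", median(lengths_of_stay))
```

In this made-up example, the histogram shows seven short stays, six stays of around two weeks, and one extreme value, while the median of 8.5 days matches no patient at all. That is the kind of mismatch worth asking your analyst about.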
- What assumptions are behind the analysis?
This brings us to some pretty complex places, statistically and substantively. If you have any statistical background, you may wonder whether the analytic tools used actually fit the data structure. It can be as simple as using a pie chart when a bar chart would be the way to go. It can get really complicated fast.
Substantively, you may have an endless array of questions about the analyst's assumptions. Here are a few to warm you up:
- Did the analyst assume a certain annual growth rate (and is that assumption reasonable)?
- Did the analyst assume that everything else would remain the same (and is that assumption reasonable)?
- If you are looking at costs to the hospital, you may be interested in which costs the analyst included and which ones were left out. There are standards for measuring return on investment in health care, and they should be used whenever possible.
- If some patients are included and not others, you will need to know why the analyst left the others out.
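One way to test the growth-rate question is to ask how much the answer changes under different assumed rates. The sketch below is only an illustration: the starting volume of 1,000 procedures, the three rates, and the function name `project_volume` are all hypothetical, and it assumes simple compound growth at a fixed annual rate, which may not match the model your analyst used.

```python
def project_volume(current, annual_growth, years):
    """Compound a starting volume forward at a fixed annual growth rate."""
    return current * (1 + annual_growth) ** years


# Hypothetical starting point: 1,000 procedures this year.
current_volume = 1000

# Compare the five-year projection under three assumed growth rates.
for rate in (0.02, 0.05, 0.08):
    projected = project_volume(current_volume, rate, years=5)
    print(f"{rate:.0%} annual growth -> about {projected:,.0f} procedures in 5 years")
```

If a few percentage points in the assumed rate change the decision, the assumption deserves as much scrutiny as the data.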
- How likely is it that you are looking at cause and effect?
Cause and effect is very, very, VERY hard to prove. Yet, our minds are programmed to jump to it pretty quickly. We are nothing if not pattern-seeking creatures. So you need to use some elbow grease to fight against jumping to conclusions.
Here are a few quick things you can request that help rule out alternative explanations:
- Ask for statistical testing. This will help you figure out whether the changes in the data reflect natural variation or not. Lots of times we have ups and downs that really do not make a pattern.
- Ask what competing theories have been ruled out. For example, perhaps a big improvement in a certain patient safety measure is related to a shift in the types of patients we are seeing and not to new safety efforts. Before presenting the analysis, a good analyst will throw a bunch of alternative theories at the wall and rule out as many as they can.
- Ask for at least one full year of data to account for any seasonal patterns. As mentioned above, there are pretty clear seasonal trends in health care. If the analysis is based on summer admissions alone, it is not likely to be representative of all admissions.
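For readers who want to see what "statistical testing" can look like, here is a minimal sketch of one common approach, a permutation test, written in plain Python. It is one option among many and not necessarily the test your analyst would choose. The monthly counts are invented, and the function name `permutation_test` and its parameters are placeholders. The idea is to shuffle the "before" and "after" labels many times and see how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_test(before, after, n_shuffles=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when the period labels are assigned at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n_before = len(before)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_before:]) - mean(pooled[:n_before]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical monthly counts of a safety event, invented for illustration.
before = [12, 15, 11, 14, 13, 16]
after = [11, 13, 12, 15, 10, 14]

p_value = permutation_test(before, after)
print(f"Share of random shuffles with a gap this large: {p_value:.2f}")
```

A large share means the before-and-after gap looks like ordinary ups and downs; a very small share suggests something more than natural variation, though it still does not prove cause and effect.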
In my experience, analysts never mean to present something that is incomplete or not fully clear. But the analytic process is usually a very long and winding path, in which analysts need to make assumptions about the data at many different points to move forward. The full breadth of these assumptions and the complexities of making them may not be obvious when you review the final product. Being inquisitive before accepting results helps ensure that the decisions made from the analysis are sound, and can help the analyst rethink assumptions in a very productive way.