Breaking the analytic process down into 7 'easy' steps
I was recently asked to describe the analytic process, and found myself at a loss. So this post is an attempt to lay that process out somewhat concretely. The challenge is that analysis is basically a learning process: we ask a question, find an answer, and that answer leads to more questions. This iteration is built into every step of analysis. Even when the final analytic product is delivered, it usually leaves some questions open and raises new ones. This is the way people learn and the way their minds work.
The analytic process is rarely a solo road trip. Analysts rely on requesters to help with some of the steps. Analysts are never order-takers, no matter how much everyone might wish they were. That is because the process of analysis, at its heart, requires active learning by everyone involved.
More than any other step, defining the analytic question belongs to the requester. No other step in the process is as important or as definitive. The wrong question can waste many hours of analytic effort, and even a small change in how a question is phrased can significantly change the analysis that is actually done.
When people make a request of DAR, they can get frustrated by the number of questions we ask about it. These questions are intended to refine the request into something we are sure we understand, can deliver, and can actually answer. DAR staff members have become quite good at framing requests as SMART questions (Specific, Measurable, Achievable, Relevant, Time-bound). So we will always try to turn your question into a SMART question.
Once the question is clearly defined, the analyst will propose a path to get an answer. The requester will have a lot of input on the path, because they are what we call the 'content' expert – the one who best understands the meaning behind the question and how findings will be applied. But it is really up to the analyst to sketch out the path and suggest different routes.
This path is called an analysis plan. The analysis plan has many pieces. They are not all necessary all the time, but each analysis plan usually includes these things:
- Literature search (on content area, or methods applied to questions of this type)
- Operationalization of dependent variable and any independent variables (as well as any covariates)
- Identification of potential data sources (at a high level)
- Development of inclusion/exclusion rules for subjects
- Development of a comparison approach (groups v. trend)
- Assessment of the feasibility of the analysis plan as designed
It may take a couple of meetings to settle on an analysis plan. And once the work has begun, the plan may need to be adjusted. But to begin without a plan is to simply invite analytic chaos into your life.
Here is where the analyst really takes over. There are two fundamentally different routes to data collection and preparation. The first, called primary data collection, involves collecting data exclusively for the question at hand. Many, many research questions involve primary data collection. But some operational questions do, too. For example, if we want to know something new about what our parents think of our services, we need to devise new ways to collect data that can answer that question, because the data do not yet exist.
The second route is secondary data collection. Secondary data collection entails using data collected for a DIFFERENT purpose to answer a question. Much of the process improvement, quality improvement and clinical research work done in this organization uses our EHR data to answer questions it was not designed to answer.
The two routes follow quite similar steps, and both are often quite time-consuming. In addition to just getting data on a spreadsheet to analyze, the analyst needs to clean it and prep it for analysis. This requires testing both its validity and reliability for answering the question at hand. A good analyst will deliver a 'squeaky clean' data set for analysis – one where all the errors in the data have been identified (even if they cannot be fixed) and one where all of the assumptions about the data set are clear.
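To make 'squeaky clean' a bit more concrete, here is a minimal sketch of the kinds of checks involved. The field names, values, and rules are entirely hypothetical, not drawn from any real DAR data set:

```python
import pandas as pd

# Hypothetical raw extract; real column names and validation rules vary by project.
raw = pd.DataFrame({
    "family_id": [1, 2, 2, 3, 4],
    "visit_date": ["2023-01-05", "2023-02-10", "2023-02-10", "not recorded", "2023-03-01"],
    "satisfaction": [4, 5, 5, 11, 3],  # assume the valid scale is 1-5
})

clean = raw.drop_duplicates()  # remove exact duplicate rows

# Unparseable dates become NaT rather than silently wrong values.
clean["visit_date"] = pd.to_datetime(clean["visit_date"], errors="coerce")

# Impossible scores become missing, and we keep a count of how many there were.
in_range = clean["satisfaction"].between(1, 5)
clean["satisfaction"] = clean["satisfaction"].where(in_range)

# Document every error found, even the ones that cannot be fixed.
print(f"duplicate rows removed: {len(raw) - len(clean)}")
print(f"unparseable dates:      {clean['visit_date'].isna().sum()}")
print(f"out-of-range scores:    {int((~in_range).sum())}")
```

The point of the printed summary is the 'assumptions are clear' part: the cleaning log travels with the data set so the requester knows exactly what was changed and why.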
Once the data set has been developed, the analyst needs to step back and make sure what they created will indeed answer the question. They need to consider response rates, lack of representativeness, sources of error and other critical problems with the data. Depending on the health of the data set, the analyst might need to check in with the requester to make sure that the data can indeed answer the question.
More than once in my career, I have taken a good, hard, critical look at data that I had fought, sometimes for years, to pull together and said, 'Nope. These data will not do.' Maybe there were too many missing fields, maybe the sample I ended up with was not in any way representative, or maybe there were so many outliers that the data simply made no sense. Whatever the reason, you need to be honest and be ready to say the data are not up to the task.
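Some of these health checks can be put into a simple go/no-go script. A minimal sketch, where the sampling frame, field names, and thresholds are all made-up examples of what an analyst and requester might agree on:

```python
import pandas as pd

# Hypothetical survey extract with a known sampling frame of 500 families.
frame_size = 500
responses = pd.DataFrame({
    "family_id": range(60),
    "overall_rating": [4] * 40 + [None] * 20,  # 20 respondents skipped the question
})

response_rate = len(responses) / frame_size
missing_rate = responses["overall_rating"].isna().mean()

# Hypothetical thresholds; the right cutoffs depend on the question at hand.
problems = []
if response_rate < 0.30:
    problems.append(f"response rate {response_rate:.0%} is below 30%")
if missing_rate > 0.25:
    problems.append(f"{missing_rate:.0%} of ratings are missing")

if problems:
    print("These data may not be up to the task:", "; ".join(problems))
else:
    print("No blocking problems found.")
```

A report like this makes the check-in with the requester concrete: instead of a vague worry about data quality, there is a short list of specific problems to discuss.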
Assuming the data have legs, it is time to do the analysis. This is the moment you have been waiting for! Your analysis plan should lay out for you what tests and comparisons to run.
But in addition to completing the analysis plan, the analyst has a couple more tasks here. There are a couple questions all good analysts are always prepared to answer for the requester. The analyst knows the data best and so has a responsibility to anticipate very reasonable questions requesters may ask. These include:
- What is the impact of outliers?
- How do the various fields relate to one another (are there interaction effects)?
- Is there anything unexpected (interesting, contradictory or not anticipated in the analysis plan)?
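The outlier question, for example, can often be answered with a quick side-by-side comparison. A minimal sketch using made-up wait-time data:

```python
import statistics

# Hypothetical wait times in minutes, with one extreme value.
wait_times = [12, 15, 14, 13, 16, 180]

mean_all = statistics.mean(wait_times)
median_all = statistics.median(wait_times)
mean_trimmed = statistics.mean(sorted(wait_times)[:-1])  # drop the single largest value

print(f"mean with outlier:    {mean_all:.1f}")
print(f"median (robust):      {median_all:.1f}")
print(f"mean without outlier: {mean_trimmed:.1f}")
```

When the mean with and without the outlier differ this much while the median barely moves, the analyst can tell the requester exactly how much that one extreme value is driving the headline number.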
This is actually a pretty quick step compared to all of the others – it is time to put the finishing touches on the analysis, have someone else double check your conclusions, kick the tires a few times, and make sure it is reproducible and robust.
This is also the step where the analyst makes decisions about visualizations and presentation. Once the analyst knows what the main lessons of the analysis are, the visualization choices become clearer.
Before presenting an analysis to a requester, it is always a good idea to think through what did NOT get answered by the analysis and what additional questions surfaced for you. This preparation is very helpful because these are often questions and problems that are on the mind of the requester – or that the requester has the answers to even before they see the data. If you come having already thought through some of the questions, your discussion of the findings will be more productive and you will be able to identify follow-up and clean-up items much more quickly. And then you can be done with the project and move on to the next one.