Taming the beast, part 1: Data validity and reliability

July 16, 2016

When my first child started to eat real food, I often took the opportunity during his naps to wash the kitchen floor. One particular day, I saw some food splattered on the molding next to his highchair, so I got a cloth and washed it off. Then I realized there was food on the wall at knee height, so I washed that, then at the chair rail level, washed that, and then I stepped back and saw food splattered all the way up to the ceiling.

Two gongs went off in my head – 1. my tiny baby was a wild freaking beast, and 2. it was my job to turn him into an actual human who could eat in front of other humans neatly, without my help, over and over and over.

Such is the analyst confronting raw data. Opening up a data set that has not had its validity and reliability established is like cleaning up after a wild gang of babies eating spaghetti.

Not to be overly dramatic, but it kinda is. You have no idea where the splatters are, how much they will stain, and whether you will ever get the data to be polite and stop throwing food.

In this post, we are going to review what it means for data to be valid and reliable and why it is essential that they are. The next post will focus on what an analyst does to get them there. We offer hope for many of the most beastly data sources.

What are validity and reliability?

Validity and reliability are terms we use every day in a lot of different ways, but in analytics they have very narrow and technical meanings.

Validity: A measurement, or data field, is valid when it actually means what we think it means. So a field called 'time of admission' is valid when it means the time a child entered the hospital. If you know anything about Epic data, you know there are many admission time stamps and many of them are hours or even days apart. The one we use to mark admission for length of stay and readmission calculations needs to actually mark the time and day the child physically moves into the room for it to be valid.

There are a number of different kinds of validity, but this basic type (content validity) is really the only one relevant for most of the work we do at the hospital.

Reliability: A measurement, or data field, is reliable when it means the same thing whenever it is used. So the validated discharge time stamp is reliable if that field indicates the time every child physically leaves the hospital, regardless of day of the week, unit of discharge, or age of the child. If it means one thing on the PSYC unit and another in the NICU, it is not reliable.
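One way to picture a reliability check is to compare the recorded discharge time stamp against an independent observation of when the child actually left, unit by unit. The sketch below is only an illustration: the encounter rows, the "observed leaving" time, and the 30-minute tolerance are all made up for the example and are not real Epic fields or hospital rules.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%d %H:%M"

# Hypothetical rows: (unit, recorded discharge time, time the child was observed leaving)
ENCOUNTERS = [
    ("NICU", "2016-07-01 10:00", "2016-07-01 10:05"),
    ("NICU", "2016-07-02 14:00", "2016-07-02 14:10"),
    ("PSYC", "2016-07-01 09:00", "2016-07-01 15:00"),
    ("PSYC", "2016-07-03 08:30", "2016-07-03 13:30"),
]


def median_gap_minutes_by_unit(encounters):
    """Median gap (in minutes) between recorded and observed discharge, per unit."""
    gaps = defaultdict(list)
    for unit, recorded, observed in encounters:
        delta = datetime.strptime(observed, FMT) - datetime.strptime(recorded, FMT)
        gaps[unit].append(delta.total_seconds() / 60)
    return {unit: median(values) for unit, values in gaps.items()}


def unreliable_units(encounters, tolerance_minutes=30):
    """Units where the field drifts from the observed event by more than the tolerance."""
    gaps = median_gap_minutes_by_unit(encounters)
    return sorted(unit for unit, gap in gaps.items() if abs(gap) > tolerance_minutes)


print(median_gap_minutes_by_unit(ENCOUNTERS))
print(unreliable_units(ENCOUNTERS))
```

In this toy data the field tracks the physical departure closely in the NICU but runs hours ahead of it on the PSYC unit, which is exactly the "means one thing here and another thing there" problem.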

As with validity, there are several different kinds of reliability. But consistency is the most important feature for our purposes.

Why validity and reliability are essential

Imagine doing a trend analysis where the length of stay is defined differently each month. You could not possibly tell whether changes over time were due to actual changes in length of stay or changes in the way it is measured. Or imagine processing payments when charges are computed one way for one patient and another way for another patient. Without reliable charge algorithms, we could not run the hospital for long at all.
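To make the trend problem concrete, here is a small sketch that computes mean length of stay for the same two stays using two different admission time stamps. The field names (arrived_ed, in_room, discharged) and the dates are hypothetical, chosen only to show the effect.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical stays, each with two candidate admission time stamps
STAYS = [
    {"arrived_ed": "2016-05-01 08:00", "in_room": "2016-05-01 20:00",
     "discharged": "2016-05-03 20:00"},
    {"arrived_ed": "2016-05-10 06:00", "in_room": "2016-05-10 12:00",
     "discharged": "2016-05-11 12:00"},
]


def mean_los_hours(stays, admission_field):
    """Mean length of stay in hours, measured from the chosen admission field."""
    hours = []
    for stay in stays:
        start = datetime.strptime(stay[admission_field], FMT)
        end = datetime.strptime(stay["discharged"], FMT)
        hours.append((end - start).total_seconds() / 3600)
    return mean(hours)


print(mean_los_hours(STAYS, "in_room"))     # one definition
print(mean_los_hours(STAYS, "arrived_ed"))  # a different definition, same stays
```

Nothing about the patients changed between the two calls, yet the average moves by nine hours. If the definition switched from one month to the next, that jump would look like a real change in length of stay.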

In the analytics world, there is a subfield devoted to creating data definitions that assure our hospital data are valid and reliable – it is called Master Data Management (MDM) and it fits under the field of Data Governance.

A great example of MDM is the process that our Health Information Management team goes through to assure that patients do not have duplicate medical record numbers and that each patient who appears new to our system is matched against all patients to make sure they really are new. This crucial work assures we avoid medical errors AND is the foundation for every patient-level analysis we do.
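The matching idea can be sketched in a few lines, with a big caveat: this is not how our Health Information Management team does it. It is a deliberately simplified illustration that assumes a match means the same normalized name and date of birth, and the record layout (mrn, name, dob) is hypothetical. Real patient matching has to cope with misspellings, name changes, twins, and much more.

```python
def match_key(record):
    """Normalize the fields used for matching: lowercase name, collapsed spaces, DOB."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def find_possible_duplicates(new_patient, existing_patients):
    """Return the MRNs of existing records that share the new patient's match key."""
    key = match_key(new_patient)
    return [r["mrn"] for r in existing_patients if match_key(r) == key]


# Hypothetical existing records
EXISTING = [
    {"mrn": "A100", "name": "Jane  Doe", "dob": "2015-03-02"},
    {"mrn": "A200", "name": "John Roe", "dob": "2014-01-01"},
]

new_patient = {"name": " jane doe ", "dob": "2015-03-02"}
print(find_possible_duplicates(new_patient, EXISTING))
```

A patient who "appears new" only gets a new medical record number when this kind of check comes back empty; anything it returns is a candidate for a person to review.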

One of the outgrowths of the hospital's Sustainable Savings Initiative was the opportunity for our leaders and experts to come to agreement on what many, many key metrics should mean, and then develop valid and reliable algorithms for them. It is not like we had no metrics before, but the process gave us a way to get agreement across units and divisions and to launch a new era of data governance – we are well on our way to establishing 'a single source of truth' for interpreting Epic data. And that means we can make more intelligent decisions about everything we do.

There is no shortcut to this very difficult work. There are just good parents working steadily to turn their beautiful little beasts into lovely, well-mannered grown-ups. And, when it comes to our hospital's data, that would be DAR's job.
