Healthcare Data’s Garbage-In, Garbage-Out Challenge

Part #2 - Why is the GIGO Problem So Relevant to the Healthcare Community?

Continuing WCI’s series detailing the issues behind, and providing solutions to, the key medical data challenge of the 2020s: avoiding Garbage-In, Garbage-Out.

In this ongoing series, West Coast Informatics examines the issue and its impact on the medical community, and provides approaches to minimize and ultimately avoid it. The questions we will answer in this series include:

  • Part #1: What exactly is the GIGO problem?

  • Part #2: Why is the GIGO problem so relevant to the healthcare community? (current article)

  • Part #3: What are the implications if this issue isn’t handled properly? (May 2023)

  • Part #4: What immediate steps can be taken to mitigate the problem?

  • Part #5: What are the challenges in addressing historical data?

  • Part #6: What can we learn from outside the medical community?

Recap

Garbage-In, Garbage-Out is the notion that when decisions are based on data, even the best decision-making processes may produce faulty outcomes if that data isn’t reliable.

In the previous part of the series, “What exactly is the GIGO problem?”, we detailed why faulty outcomes result from garbage data (thus the acronym GIGO). In this part, we will focus on why the GIGO problem is so prevalent in the healthcare data community.

Quality Data is Critical to Positive Patient Outcomes

What is the most critical component for a doctor to provide a diagnosis and treatment plan? Quality data. Without it, patients are at risk, and a lack of quality data happens far more often than people realize. It can appear even at the most basic level, such as a patient reviewing their notes after a clinic visit, as shown in research conducted by Bell, Delbanco, Elmore, et al. in their JAMA-published article “Frequency and Types of Patient-Reported Errors in Electronic Health Record Ambulatory Care Notes”. In their survey study of 136,815 patients, 29,656 provided a response; 1 in 5 patients who read a note reported finding a mistake, and 40% of those perceived the mistake as serious. This is a staggering figure! For a more detailed example, consider the article’s summary of the pathology-reporting issues:

Patients identified that some practitioners reported the wrong test result in the note and others who were not aware that more recent results or reports existed. For example, “The provider put the wrong CD4 cell count in my chart. She states 399, however lab results show my CD4 at 219.” Patients also reported mistakes in radiology results or practitioner summaries of radiology reports that made it difficult to determine whether there was clinical improvement or deterioration. In 1 example, different units of measurement resulted in ambiguity: “MRI [magnetic resonance imaging] reads dimensions now 5 × 3 cm compared to prior 5 × 4 mm.” Another commented, “Pathology report summary stated I had 2 positive lymph nodes but [the] detailed report stated 3 positive lymph nodes. That changes the staging and the treatment options. My physician had only read the summary and didn't realize I had 3 positive lymph nodes.” In this case, the patient reportedly notified the physician of the additional positive node, thereby changing the treatment plan because of the wider spread of the cancer. Patients also reported errors of omission, such as a missed lesion in the liver.

When it comes to healthcare informatics, there is significant opportunity to mistakenly rely upon bad data, largely due to historical circumstances. Below, we briefly review those circumstances and the state of today’s healthcare data, then look at practical ways bad data can lead to bad outcomes.

How Did We Get Here?

Prior to the 2010s, healthcare records were kept almost exclusively on paper. Your healthcare data was stored in folders at each of the various points of care, and for one provider to access another’s records, the data had to be delivered physically or “electronically” (by fax). That fragmentation, along with the mistakes that arise when handwritten notes are misinterpreted as they pass between clinics, is exactly the issue that EHRs and the underlying healthcare terminologies attempt to resolve.

With the HITECH Act of 2009, Meaningful Use legislation “softly” mandated that healthcare organizations adopt EHRs instead of relying upon paper. The ensuing decade focused on establishing terminology usage in health care to ensure data quality within institutions as a first critical step; only once institutions are reliably utilizing informatics can the ultimate vision be reached. Unfortunately, it remained permissible to have different representations for the same data, sometimes even within a single institution, where silos of data are not normalized, as the sketch below illustrates.
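To make this concrete, here is a minimal sketch (in Python) of how two silos within one institution might represent the very same serum glucose result differently; the local code “GLU-01” and the field names are invented for this illustration:

    # Two hypothetical silos within one institution describing the same
    # observation: a local code in the lab system, a LOINC code in the EHR.
    lab_silo = {"code": "GLU-01", "system": "local", "value": 5.4, "unit": "mmol/L"}
    ehr_silo = {"code": "2345-7", "system": "LOINC", "value": 97, "unit": "mg/dL"}
    # (2345-7 is the LOINC code for "Glucose [Mass/volume] in Serum or Plasma".)

    # A naive query that counts distinct codes sees two different tests,
    # even though both records describe the same measurement.
    codes = {lab_silo["code"], ehr_silo["code"]}
    print(len(codes))  # 2 -- the silos are not normalized to one terminology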

Today’s State of Affairs

In the current decade, the healthcare industry’s usage of informatics is building upon the advances of the previous one. The goals have advanced such that simply representing data accurately within an institution is no longer enough. Rather, the challenge of this decade is to ensure that data is shared accurately: the community needs to reach a state where there is confidence that all institutions exchanging data are relying upon data that is both highly accurate and precise.

In today’s healthcare setting, data is routinely shared across institutions. For example, laboratory results are communicated between a lab and the patient’s clinician via HL7 messages, with the data represented using LOINC and SNOMED CT. Reimbursement is handled based on CPT and/or ICD-10 codes. The healthcare provider will store the information using multiple terminologies, and if follow-up appointments are needed, the patient’s information, including all of those terminology encodings, may be transferred to yet another institution.
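As an illustration, here is a minimal sketch of the kind of LOINC-coded OBX segment such an HL7 v2 lab message might carry, parsed with plain Python; the result value, units, and reference range are invented for this example:

    # OBX-3 carries the test identifier (here LOINC 2345-7, serum glucose,
    # with "LN" identifying LOINC as the coding system); OBX-5 and OBX-6
    # carry the value and the units the receiver must interpret correctly.
    obx = "OBX|1|NM|2345-7^Glucose [Mass/volume] in Serum or Plasma^LN||97|mg/dL|70-99|N|||F"

    fields = obx.split("|")
    code, display, system = fields[3].split("^")
    value, unit = fields[5], fields[6]
    print(f"{display} ({system} {code}): {value} {unit}")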

Likewise, any organization that receives healthcare data must review that data to ensure it complies with expectations. While this is a time-consuming task, it is far better to identify inconsistencies before accepting data from the source organization than to discover them after that data is already in use.
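A sketch of what such an inbound review might look like in code, assuming the receiving organization maintains its own set of accepted codes and expected units (the specific codes, units, and record layout here are assumptions for illustration):

    import re

    LOINC_PATTERN = re.compile(r"^\d{1,7}-\d$")  # digits plus a check digit
    ACCEPTED_CODES = {"2345-7"}                  # value set agreed with the sender
    EXPECTED_UNITS = {"mg/dL", "mmol/L"}

    def validate(record):
        """Return a list of problems found in one inbound result record."""
        problems = []
        code = record.get("code", "")
        if not LOINC_PATTERN.match(code):
            problems.append(f"malformed LOINC code: {code!r}")
        elif code not in ACCEPTED_CODES:
            problems.append(f"code {code} not in the agreed value set")
        if record.get("unit") not in EXPECTED_UNITS:
            problems.append(f"unexpected unit: {record.get('unit')!r}")
        return problems

    # Flagged before the data is trusted: the unit is not in the expected set.
    print(validate({"code": "2345-7", "value": 97, "unit": "mgdl"}))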

Even assuming the data is correct when created, it may pass through a series of processes before arriving at its final destination (be it an EHR, a Health Information Exchange, a Laboratory Information System, etc.). Any one of those processes has the potential to alter the data such that, by the time it arrives in that EHR, it may no longer mean what the data generator intended.
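As a hypothetical illustration (not any real interface engine), here is how a single careless transformation step along that path can silently change meaning, echoing the cm-versus-mm ambiguity quoted earlier:

    source = {"finding": "lesion", "size": 4, "unit": "mm"}

    def to_destination_format(msg):
        # BUG: this hop assumes every upstream system reports sizes in
        # centimeters and drops the original unit entirely.
        return {"finding": msg["finding"], "size_cm": msg["size"]}

    print(to_destination_format(source))
    # {'finding': 'lesion', 'size_cm': 4} -- a 4 mm lesion now reads as 4 cm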

Summary

We have just reviewed the historical reasons the GIGO problem exists in the healthcare community. We have also highlighted how each sub-process (be it in-house or provided by external partners) introduces an opportunity for data corruption, no matter how accurate the original data is. And in the end, there must be confidence that the parties you exchange data with encode their healthcare information just as precisely. This puts an extra burden on institutions working to avoid the GIGO problem.

It may seem easy to dismiss the issue for the time being and assume the healthcare community at large will eventually resolve it. Yet with each passing year, more and more data is generated that must later be “corrected”. Furthermore, the data being generated today IS being relied upon today. Thus, the GIGO problem IS impacting today’s patients and today’s population health research.

In the next part, we will review how caustic bad data is to the individual patient, to caregivers, and to healthcare researchers. In the final part of this GIGO series, we will discuss best practices the broader tech community has adopted to minimize and resolve garbage data.

Next Topic: What are the implications if this issue isn’t handled properly? (May 2023)
