Making the Most of Your AI Investment Means Prioritizing the Basics of Data Quality

By Brian Laberge, Solution Engineer, Wolters Kluwer Health
LinkedIn: Brian Laberge
LinkedIn: Wolters Kluwer

AI applications are exploding across healthcare, both for consumers and providers. With big-name AI models just announcing new health-focused applications for their bots, there is a greater urgency to ensure that data is ready for these platforms to achieve accuracy and maximize impact. As organizations continue to turn to AI to drive healthcare innovation, they must take the proper steps in implementation to avoid building an AI model that creates more obstacles and confusion when it’s designed to simply add value. This starts with how these models are trained.

While it may be shocking to say, the same thing plaguing the successful adoption of AI today is what hindered the integration of digital systems in healthcare for the last few decades: poor data quality. Messy data can confuse AI models, build inaccuracies into the programs, and make the process more difficult overall. Not only is a lot of healthcare data unstructured, but much of the structured data can have quality issues that negatively impact the models being built. Gartner estimates that poor data quality costs organizations an average of $12.9 million annually.

Transforming data, that is to say standardizing it, organizing it, and regularly maintaining it in the face of industry updates, before implementing AI will help organizations better ensure the program works for its intended purpose and that the investment in innovation can make a true impact. Further, standardized, quality data will strengthen clinician trust in the models, which could otherwise face resistance during implementation.

Data’s Impact on AI Implementation

To make healthcare data “analytics ready,” data scientists spend an overwhelming 50-80% of their time on manual data cleaning and preparation. While this is a significant investment of time, doing this work at the front end of the project helps avoid problems after the technology is already in use. Additionally, missing, incomplete or incorrect information can lead AI platforms to draw incorrect inferences that are then built into the core of the model.

For example, low-quality, overly simplified or minimal data for rural populations could cause a bias to be built into the model. Triaging the data as a first step to better understand its contents helps find the gaps and keeps incorrect information or biases from being permanently built into the platform.
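One way to picture this triage step is a quick completeness report broken out by population. The sketch below is illustrative only: the field names (`setting`, `a1c`, `bmi`) and the sample records are hypothetical, and a real assessment would run against an organization's own schema.

```python
from collections import defaultdict

def missing_rate_by_group(records, group_key, fields):
    """Return {group: {field: fraction of records where the field is missing}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group = rec.get(group_key, "unknown")
        totals[group] += 1
        for field in fields:
            # Treat absent keys, None, and empty strings as missing.
            if rec.get(field) in (None, ""):
                missing[group][field] += 1
    return {
        group: {field: missing[group][field] / totals[group] for field in fields}
        for group in totals
    }

# Hypothetical patient records, invented for illustration.
records = [
    {"setting": "urban", "a1c": 6.1, "bmi": 27.0},
    {"setting": "urban", "a1c": 5.8, "bmi": 31.2},
    {"setting": "rural", "a1c": None, "bmi": 29.4},
    {"setting": "rural", "a1c": None, "bmi": None},
]

rates = missing_rate_by_group(records, "setting", ["a1c", "bmi"])
for group, by_field in rates.items():
    print(group, by_field)
```

A large difference in missing rates between groups is a signal to investigate before training, since the model would otherwise learn from a thinner picture of one population.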

As an example, I’ve seen a scenario where the lab value came back incorrect, reporting levels of calcium in a patient that would have meant they weren’t living. While a clinician would be able to easily identify the mistake, the AI model started treating these results as the primary issue, rather than the unrelated, real symptoms the patient had come in to address.
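A simple plausibility check can catch this kind of value before it reaches a model. The sketch below is a minimal illustration, not a description of any specific product: the test name `calcium_mg_dl`, the limits, and the sample results are all assumptions made for the example, and real limits would need to be set by clinical experts.

```python
# Illustrative plausibility limits only (assumed values, not clinical guidance).
PLAUSIBLE_LIMITS = {
    "calcium_mg_dl": (2.0, 20.0),
}

def flag_implausible(result):
    """Return a copy of the result with a 'suspect' flag set when the value
    is missing or falls outside the plausibility limits for its test."""
    low, high = PLAUSIBLE_LIMITS[result["test"]]
    value = result["value"]
    suspect = value is None or not (low <= value <= high)
    return {**result, "suspect": suspect}

# Hypothetical lab results, invented for illustration.
results = [
    {"patient": "A", "test": "calcium_mg_dl", "value": 9.4},
    {"patient": "B", "test": "calcium_mg_dl", "value": 94.0},  # outside the assumed limits
]

checked = [flag_implausible(r) for r in results]
for_review = [r for r in checked if r["suspect"]]
print(for_review)
```

Flagged results are routed to a person for review rather than silently dropped or corrected, which keeps the clinician's judgment in the loop.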

Addressing these biases allows the AI to deliver its intended transformational effect in the healthcare setting rather than becoming another source of misinformation and miscommunication.

Ensuring Your Data is AI-Ready

There are a few areas that deserve extra attention when preparing data for large language models. Lab or medical data often comes back with portions incomplete, inaccurate, or lacking validity, which keeps the data from showing an AI model the full picture. Engineers can manually complete the data, remove clinical verbiage used by doctors, decipher clinical notes, and validate codes against an industry standard to ensure data is complete, accurate and validated. Alternatively, terminology servers can help assess and validate this data more efficiently, saving internal teams hours that can be reinvested in qualitative analyses or insights.
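As a rough illustration of code validation, the sketch below checks incoming codes against a reference set after basic normalization. The `VALID_CODES` table and its `STD-` identifiers are hypothetical placeholders standing in for a real standard code set or terminology server lookup; they are not actual industry codes.

```python
# Hypothetical stand-in for a standard code set (placeholder identifiers,
# not real industry codes).
VALID_CODES = {
    "STD-0001": "Calcium, serum",
    "STD-0002": "Glucose, serum",
}

def validate_codes(raw_codes):
    """Normalize each code and split the input into valid and invalid lists."""
    valid, invalid = [], []
    for raw in raw_codes:
        code = raw.strip().upper()  # remove stray whitespace, unify case
        if code in VALID_CODES:
            valid.append(code)
        else:
            invalid.append(code)
    return valid, invalid

valid, invalid = validate_codes([" std-0001", "STD-0002", "STD-9999", ""])
print("valid:", valid)
print("needs review:", invalid)
```

Codes that fail the lookup are set aside for review instead of being passed along, so unknown or blank codes never reach the model unexamined.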

Data governance is not just an IT issue; it’s a business challenge. A strong data governance process helps to ensure data accuracy and completeness before data is fed into an AI model. Not all data needs to be managed with the same rigor, but any data used to drive critical decisions must be carefully governed. Having a concrete process to keep code definitions up to date ensures that models being fed additional context are built with consistent information and return predictable, accurate results. This proactive approach ensures that you can address issues as they arise and maintain high-quality data throughout its lifecycle.

Siloed data, whether across systems or teams, needs to be brought together to create a single source of truth. This comprehensive data set can then serve as the foundation for using good data to power better health.

The team responsible for training the AI should continuously assess the data and make sure it is well understood. Maintaining consistent, complete, quality data is important in building the AI for your needs. Organizations must standardize the data to avoid human error being built into the AI. The technology has the potential to help avoid human error, and cleaning the data will help the AI be as powerful and useful as possible.

Making the Most of Your AI Investment

The benefits of preparing data for AI implementation are proven. Ultimately, improved data quality leads to better health outcomes, and without it, AI can pose more of a risk than a reward in healthcare and create distrust among clinicians. Whether you’re streamlining access to insurance claims to derive insights on treatment effectiveness, aligning patient records to understand how care was delivered over time, or driving for more precision in healthcare interventions, high-quality data is the foundation for success.

Investing in clean data isn’t just about improving operational efficiency; it’s about enabling better outcomes. AI models are no different, and cleaning the data makes the technology work in our favor. Messy data can hinder the innovation that AI promises, leaving organizations with more problems and a wasted investment. By understanding, governing, and measuring your data more effectively, you can unlock AI’s full potential, delivering impact for your organization and the patients you serve.