Closing the Loops on Data Science and Informatics

William Hersh, MD, Professor and Chair, OHSU
Blog: Informatics Professor
Twitter: @williamhersh

One of the most highly viewed posts on this blog is a 2015 posting, What is the Difference (If Any) Between Informatics and Data Science. One critique I have had of data science is that most work stops at demonstrating prediction and does not go on to implement prescription. In other words, how do we take the predictive output in an ever-increasing number of areas of biomedicine and turn it into programs that actually improve outcomes, whether better patient care, improved healthcare delivery, or more effective research? Some recent publications bring this issue to light and show that we have some loops to close before we attain the value of data science in biomedicine.

Two recent perspective pieces address this closure. One is from colleagues Philip Payne, Elmer Bernstam, and Justin Starren [1]. In a Perspective published last year in JAMIA Open, they put forth a model that delineates the loop that must be closed, running from the development of data science (and informatics) models and systems to the real-world informatics familiar to most who work in the field: implementing and evaluating systems with real users and organizations. A more recent paper from Lenert et al. notes that as predictive models are put into place and affect outcomes, those changed outcomes will in turn affect the models, which will need to be adjusted to the new reality of their use [2].
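The feedback effect that Lenert et al. describe can be illustrated with a small arithmetic sketch. It is not taken from their paper: the risk values, the alert threshold, and the assumption that an intervention halves risk are all hypothetical, chosen only to show why a model that changes care can end up miscalibrated against the outcomes it helped change.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def event_rate_after_deployment(predicted_risks, alert_threshold, risk_reduction):
    """Expected event rate once high-risk alerts trigger an intervention.

    Hypothetical assumption: every patient at or above the alert threshold
    receives an intervention that cuts their risk by a fixed fraction.
    """
    adjusted = []
    for risk in predicted_risks:
        if risk >= alert_threshold:
            risk = risk * (1.0 - risk_reduction)
        adjusted.append(risk)
    return mean(adjusted)


# Hypothetical risks from a model trained before deployment.
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]

expected_rate = mean(predicted)
observed_rate = event_rate_after_deployment(
    predicted, alert_threshold=0.5, risk_reduction=0.5
)

# A ratio below 1 means the model now overpredicts events,
# because its own alerts have changed the outcomes.
observed_to_expected = observed_rate / expected_rate
print(f"expected {expected_rate:.3f}, observed {observed_rate:.3f}, "
      f"O/E ratio {observed_to_expected:.2f}")
```

Under these assumptions the observed event rate falls below what the model predicts, so retraining on post-deployment data without accounting for the intervention would teach the model that high-risk patients are lower risk than they really are.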

One aspect of this first loop to be closed is how we study data science and machine learning interventions in actual clinical practice. A pair of recently published papers demonstrate how models and systems can be built and validated, and then assessed in the clinical real world. A first paper by Barton et al. develops and evaluates a model for predicting sepsis from patient vital signs [3]. Sepsis is a medical problem of continued significance, while vital sign data are readily available. A subsequent paper by Shimabukuro et al. reports a randomized controlled trial in two medical intensive care units, finding a decrease in length of stay in the units from 13.0 to 10.3 days and a 12.4% reduction in in-hospital mortality [4].

Another recent study assessed the application of machine learning to detecting colonic polyps during colonoscopy [5]. While the machine learning system worked, it was mostly effective at recognizing polyps that were unlikely to progress to cancer quickly, such as small adenomas and hyperplastic polyps. Nonetheless, recognizing such polyps improves the overall quality of the colonoscopy exam.

A second loop that will need to be closed to achieve the vision of widespread generalized application of data science is the generation of standardized EHR data for use across the healthcare system. A group of colleagues and I wrote about this in 2013 [6], as have many others, but some recent work documents that aspects of this problem are still not solved. Two recent analyses show variations in how physicians [7] and healthcare organizations [8] document patient care, which may lead to variation in data that is not due to underlying differences in patients.

The need to close these loops shows we are still in the early days of machine learning and predictive algorithms. While their impact in medicine will likely be enormous in the long run, much work remains to optimize their data and determine how they are most effectively used.

For all References, see original post.

This post first appeared on The Informatics Professor. Dr. Hersh is a frequent contributing expert to HITECH Answers.