Artificial Intelligence in Medicine: 21st Century Resurgence

William Hersh, MD, Professor and Chair, OHSU
Blog: Informatics Professor
Twitter: @williamhersh

I first entered the informatics field in the late 1980s, at the tail end of the first era of artificial intelligence (AI) in medicine. Initial systems focused on making medical diagnoses using symbolic processing, which was appropriate for a time of relatively little digital data, both for individual patients and for healthcare as a whole, and of underpowered hardware. Systems like MYCIN [1], INTERNIST-1/QMR [2], and DXplain [3] provided relatively accurate diagnostic performance, but were slow and difficult to use. They also provided a single likely diagnosis, which was not really what clinicians needed. Because of these shortcomings, they never achieved significant real-world adoption, and their “Greek Oracle” style of approach was abandoned [4]. There was also some early enthusiasm for neural networks around that time [5], although in retrospect those systems were hampered by a lack of data and computing power.

Into the 1990s, informatics moved on to other areas, such as information retrieval (search) from the newly evolving World Wide Web and more focused (rule-based) decision support. At the start of the new century, I began to wonder whether I should even still cover those early AI systems in my well-known introductory informatics course. I kept them in, mainly for historical perspective, since those systems were a major focus of work in the field in its early days. However, the term “AI” almost seemed to disappear from informatics jargon.

In recent years, however, AI in medicine (and beyond) has re-emerged. Driven by much larger quantities of data (through electronic health records, curated data sets – mainly images, and personal tracking devices) and much more powerful hardware (mainly networked clusters of low-cost computers and hard disks as well as mobile devices), there has been a resurgence of AI, although with a somewhat different focus from the original era. There has also been a maturing of machine learning techniques, most prominently neural networks applied in complex formats known as deep learning [6, 7].

The greatest success for deep learning has come in image processing. The well-known researcher and author Dr. Eric Topol keeps an ever-growing list of systems for diagnosis and their comparison with humans (to which I have contributed a few, and to which I add studies that have so far been published only as preprints on bioRxiv.org); a brief illustrative code sketch follows the list:

  • Radiology – diagnosis comparable to radiologists for pneumonia [8], tuberculosis [9], and intracranial hemorrhage [10]
  • Dermatology – detecting skin cancer from images [11-13]
  • Ophthalmology – detecting diabetic retinopathy from fundal images [14-15], predicting cardiovascular risk factors from retinal fundus photographs [16]; diagnosis of congenital cataract [17], age-related macular degeneration [18], and plus disease in retinopathy of prematurity [19]; and diagnosis of retinal diseases [20] and macular diseases [21]
  • Pathology – classifying various forms of cancer from histopathology images [22-25], detecting lymph node metastases [26]
  • Cardiology – cardiac arrhythmia detection comparable to cardiologists [27] and classification of views in echocardiography [28]
  • Gastroenterology – endocytoscope images supporting a diagnose-and-leave strategy for diminutive, nonneoplastic, rectosigmoid polyps [29]
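For readers curious what sits underneath these results, below is a minimal, purely illustrative sketch of a convolutional image classifier of the general kind these studies use. It is a toy: the architecture, layer sizes, and two-class setup are my own assumptions, not details from any cited paper, and the published systems use far deeper networks trained on large labeled image sets.

```python
# Minimal sketch (not any of the cited systems): a small convolutional
# network for binary image classification, in the spirit of the deep
# learning diagnostic studies listed above. All names and shapes are
# illustrative assumptions.
import torch
import torch.nn as nn

class TinyImageClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Two convolutional blocks: learn low-level, then mid-level image features.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # single-channel input, e.g. a grayscale X-ray
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dimensions regardless of input size
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Smoke test on random tensors standing in for a batch of 224x224 images.
model = TinyImageClassifier()
dummy_batch = torch.randn(4, 1, 224, 224)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 2]) -- one score per class, per image
```

The adaptive pooling layer is a common design choice that lets the same classifier head accept images of varying resolution; the real work in the studies above lies in the much deeper architectures and, above all, the large, carefully labeled clinical image sets.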

Organized medicine has taken notice of AI. The Journal of the American Medical Association recently published two perspective pieces [30, 31] as well as an editorial [32] on how AI and machine learning will impact medical practice. I have heard anecdotally that some of the most heavily attended sessions at radiology meetings are those devoted to AI. I am sure there is a mixture of intellectual excitement tinged with some fear for future livelihoods.

The success of these systems and the technology underlying them is exciting, but I would also tell any thoughtful radiologist (or pathologist, dermatologist, or ophthalmologist) not to fear for his or her livelihood. Yes, these tools will change practice, maybe sooner than we realize. However, I like to think that the high-tech medicine of the future will look like how it is used by the doctors of Star Trek. Yes, those physicians have immense technology at their disposal, not only for diagnosis but also for treatment. But those tools do not remove the human element of caring for people. Explaining to patients their disease process, describing the prognosis as we know it, and engaging in shared decision-making among the diagnostic and treatment options are all important in applying advanced technology in medicine.

I also recognize we have a ways to go before this technology truly changes medicine. For several years running, I have expressed my intellectual excitement at predictive data science while also noting that prediction is not enough; we must demonstrate that what is predicted can actually be applied to improve the delivery of care and patient health.

This notion is best elaborated by some discussion of another deep learning paper focused on a non-image domain, namely the prediction of in-hospital mortality, 30-day unplanned readmission, prolonged length of stay, and the entirety of a patient’s final diagnoses [33]. The paper demonstrates the value of deep learning, the application of Fast Healthcare Interoperability Resources (FHIR) for representing data points, and efforts to have the neural network explain itself along its processing path. I do not doubt the veracity of what the authors have accomplished. Clearly, deep learning techniques will play a significant role as described above. These methods scale with large quantities of data and will likely improve over time with even better algorithms and better data.
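To make the FHIR piece concrete, here is a minimal sketch of the data-access pattern such work builds on: retrieving a patient’s Observation resources over the standard FHIR REST API and flattening them into simple feature rows a predictive model could consume. This is my own illustration, not the cited paper’s pipeline; the server URL and patient ID are hypothetical placeholders.

```python
# Minimal sketch, under my own assumptions: pull FHIR Observation resources
# for one patient and flatten them into (code, value, time) rows. The
# endpoint and patient ID below are hypothetical placeholders.
import requests

FHIR_BASE = "https://fhir.example.org/baseR4"  # hypothetical FHIR endpoint
PATIENT_ID = "12345"                           # hypothetical patient

def fetch_observations(base: str, patient_id: str) -> list[dict]:
    """Return the Observation resources from the search Bundle for a patient."""
    resp = requests.get(
        f"{base}/Observation",
        params={"patient": patient_id, "_count": 100},
        headers={"Accept": "application/fhir+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bundle = resp.json()
    return [entry["resource"] for entry in bundle.get("entry", [])]

def to_feature_rows(observations: list[dict]) -> list[tuple[str, float, str]]:
    """Flatten coded, numeric observations into (code, value, timestamp) rows."""
    rows = []
    for obs in observations:
        coding = (obs.get("code", {}).get("coding") or [{}])[0]
        quantity = obs.get("valueQuantity")
        if quantity is None or "value" not in quantity:
            continue  # this toy example keeps only numeric observations
        rows.append((
            coding.get("code", "unknown"),
            float(quantity["value"]),
            obs.get("effectiveDateTime", ""),
        ))
    return rows

if __name__ == "__main__":
    for code, value, when in to_feature_rows(fetch_observations(FHIR_BASE, PATIENT_ID)):
        print(code, value, when)
```

The cited work goes considerably further, representing a patient’s entire record as a temporally ordered sequence of FHIR-formatted events fed to neural networks, but the underlying idea of standardized, machine-readable data access is the same.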

But taking off my computer science hat and replacing it with my informatics one, I have a couple of concerns. My first and major concern is whether this prediction can be turned into information that improves patient outcomes. Just because we can predict mortality or prolonged length of stay, does that mean we can do anything about it? Second, while there is value to predicting across the entire population of patients, it would be interesting to focus on patients we know are more likely to need closer attention. Can we identify and intervene for those patients who matter most?

Dr. Topol recently co-authored an accompanying editorial describing a study that adheres to the kind of methods that are truly needed to evaluate modern AI in clinical settings [34]. The study itself is to be commended; it actually tests an application of an AI system for detection of diabetic retinopathy in primary care settings [35]. The system worked effectively, though it was not flawless, and other issues common to real-world medicine emerged, such as some patients being non-imageable and others having different eye diseases. Nonetheless, I agree with Dr. Topol that this study sets the bar for how AI needs to be evaluated before its widespread adoption in routine clinical practice.

All of this AI in medicine research is impressive. But its advocates will need to continue the perhaps more mundane research of how we make this data actionable and actually act on it in ways that improve patient outcomes. I personally find that kind of research more interesting and exciting anyway.

This post first appeared on The Informatics Professor. Dr. Hersh is a frequent contributing expert to HITECH Answers. Please see the original article for all references.