Machine Learning and CDS Transparency

William A. Hyman
Professor Emeritus, Biomedical Engineering
Texas A&M University, w-hyman@tamu.edu

One of the many questions in the design and use of Clinical Decision Support (CDS) software is whether the user can recreate the logic used by the system in reaching its conclusions, recommendations, alerts, or suggestions. If the CDS is based on sound medical logic, perhaps supported by specific reference material, then the user could in principle reach the same conclusions by reading the same literature, or perhaps reach a different conclusion. This kind of transparency was part of the proposed criteria under which some CDS systems would not fall under FDA regulation in 2015 federal draft legislation, legislation that did not pass. The FDA has otherwise not been forthcoming on the general subject of CDS despite many pleas for guidance, and a draft guidance in this domain remains an as yet unfulfilled part of the agency's 2015 strategic plan.

However, underlying logic and science are not the only way to build "artificial intelligence" (AI), which might in some instances turn out to be artificial mediocrity if not artificial stupidity. A second way to build an AI system is "machine learning." In machine learning, system-building software is given a large data set from real patients, typically consisting of a finite number of input conditions and associated patient outcomes. Note here that software and human intervention are required to make this happen; as is always the case, the computer cannot just do this on its own. Using the available data set, the software seeks to discern patterns suggesting that certain combinations of inputs are associated with certain outcomes. This becomes the basis for making outcome predictions based on what has been seen before. The resulting system can then be tested with new patient data that also includes inputs and outcomes but that was not part of the data used to train the system. The test is whether the system can now make predictions on the new input data that correspond to the actual outcomes associated with that data. If it can, to some degree of assurance, then the system is considered ready to use on real patients, where the inputs are known but the outcomes are not. In general, this is a form of pattern recognition in which repetition is used to "learn" correlations.

Machine learning can be updated from time to time as data sets grow. This may include capturing the accuracy of predictions made during clinical use. How and when this occurs can be a complex issue. For example, is updating a local or a universal process? If local, a CDS resident in one location can come to produce different results from what was once the same CDS operating at a different location. This might be good, in that it could reflect local but probably ill-defined conditions. But it might be bad if the continuing education is based on a weak data set. It might also be simply curious to have a product that evolves differently at different sites.
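The train-and-test workflow described above can be sketched in a few lines of code. What follows is only an illustrative sketch, assuming Python with the scikit-learn library and synthetic data standing in for patient inputs and outcomes; it does not represent any particular CDS product.

    # Minimal sketch of machine-learning training and testing (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic "patients": each row of X is a set of input conditions, y the outcome.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Hold out data the system never sees during training (the "new patient data").
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # "Learning": the software looks for patterns linking inputs to outcomes.
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    # "Testing": do predictions on the unseen inputs match the known outcomes?
    predictions = model.predict(X_test)
    print(f"Accuracy on held-out data: {accuracy_score(y_test, predictions):.2f}")

If the accuracy on the held-out data is judged acceptable, the same fitted model would then be applied to patients whose outcomes are not yet known.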

There are many issues in machine learning, most of them associated with how good its prediction capability really is, and for what population. But there is also the question of whether the clinical user can recreate the thought process that went into a prediction. In general, the answer for machine learning is that the user cannot. There is no underlying logic for the user to review. The user would also be hard pressed to learn anything from looking at all the data used to teach the system. Even if that data were available, the user is not likely to discern the patterns it contains; that is why it took a computer to do it in the first place. An irony here is that the user might be able to do it if the data set were small and the answers easy. However, if this were the case then the CDS might not be of any value.
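Continuing the illustrative sketch above, the opacity is easy to see: what the fitted model can report about itself is a mass of numeric parameters, not a clinical rationale. (The attribute names below are scikit-learn's; the point is general.)

    # What can the user actually inspect? For a fitted random forest, only its
    # parameters: a few hundred trees containing thousands of decision nodes.
    n_nodes = sum(tree.tree_.node_count for tree in model.estimators_)
    print(f"The 'logic' consists of {len(model.estimators_)} trees "
          f"with {n_nodes} decision nodes in total.")

    # Feature importances are the closest thing to an explanation, and they are
    # relative weights, not a chain of reasoning a user could re-derive.
    print(model.feature_importances_)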

Thus the user has a patient and a CDS-generated result, but no basis for second-guessing that result other than perhaps their own experience. But if their own experience were reliable in this regard, they would not need the CDS. What then is the user actually supposed to do with the result? The system developers will say, at least in part for liability reasons, that the user is not to rely on the results but should instead use their own judgment. This logic, if successful, opens the door to a universal anti-liability warning: "Do Not Rely On This Product." But is this the expected real-world behavior of people accessing a CDS? Is this how it is marketed?

One might be reminded here of recent news about autonomous driving, in which "smart" software drives the car but, so far, the driver must in theory always be ready to take control. At least in many instances this taking of control is a fantasy. The driver will, to varying degrees, not be paying attention, or not full attention, and will not be fast enough to take control effectively if something unexpected happens. Thus real-world drivers will not be following the possibly disingenuous admonition to always be ready to take control and make their own judgment. But perhaps improving software will help, as in the recent announcement of an update to a certain driving system that will incorporate the breakthrough concept of being able to detect large metal objects blocking the road.