By Sarianne Gruber
Twitter: @subtleimpact
A statistician by training, Elizabeth Stuart, PhD, is a professor of Mental Health, Biostatistics, and Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. In the age of Electronic Health Records (EHRs), her research on developing and applying methods to estimate causal effects is a hot-button topic. In her article, Estimating Causal Effects in Observational Studies Using Electronic Health Data: Challenges and (Some) Solutions, which I recently read, Dr. Stuart sheds light on the new dilemmas that many researchers are encountering with EHR data. The opportunity to answer questions with healthcare’s “Big Data” is enormous, but non-randomized EHR data requires a different approach than the statistics used in a conventional clinical trial design. At the International Conference on Health Policy Statistics, I had the chance to meet with Dr. Stuart and learn about the differences between EHR and clinical trial data, and about which statistical methods help to better estimate “causal effects” in non-experimental studies of EHR data. Here is a gently edited version of Dr. Stuart’s explanations and methodological approach.
Difference in Data Structure: The main structural difference between electronic medical record data and clinical trial data is that EHR data is much messier. In a randomized trial, there is a single point in time at which people are randomized to receive a treatment or not. Patients are then followed for a well-defined period of time, with clearly defined outcomes and endpoints (e.g., 6 months or one year after the treatment). With EHR or claims data, we have many observations of each patient over time; however, the data collection is not controlled. Observations occur when patients visit their providers or happen to have a procedure, rather than following the clean time order and structure you have in a trial.
In a clinical trial, the whole benefit of randomization is that chance alone decides which patients get the new treatment or new drug versus an older treatment or older drug. Because of randomization, the groups are expected to be similar on all characteristics before treatment, measured or unmeasured. This removes the potential for confounding, so the difference in outcomes between the groups can be attributed to the drug alone. That is why randomized experiments give valid estimates of causal effects. A drawback of EHRs and other electronic health data is that we generally don’t have that control: we haven’t been able to randomly assign people to one treatment or another. Instead, we just observe one group of people who, for example, received one type of antidepressant and another group who received a different type of antidepressant. We really don’t know why that happened. Was it because the patients were sicker, or had other underlying health conditions that made the physicians (and the patients) more likely to choose one drug or the other, or was there some randomness? Again, we weren’t able to control for this dynamic.
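To make the confounding problem concrete, here is a minimal Python sketch with entirely invented numbers. These numbers are not from Dr. Stuart’s work. The sketch assumes a single hypothetical "severity" variable that worsens outcomes. In the observational version, severity also makes sicker patients more likely to receive the new drug. The function name `simulate`, the constant `TRUE_EFFECT`, and all coefficients are illustrative assumptions.

```python
import random

TRUE_EFFECT = 1.0  # hypothetical true benefit of the new drug on an outcome score


def simulate(n, randomized, seed=0):
    """Return the naive treated-minus-control difference in mean outcomes.

    All numbers are invented for illustration only.
    """
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)  # hypothetical illness severity
        if randomized:
            # Trial: a coin flip decides treatment, independent of severity.
            treated = rng.random() < 0.5
        else:
            # Observational data: sicker patients are more likely to get the new drug.
            treated = rng.random() < (0.8 if severity > 0 else 0.2)
        # Sicker patients do worse regardless of which drug they receive.
        outcome = -2.0 * severity + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0.0, 1.0)
        (treated_outcomes if treated else control_outcomes).append(outcome)
    return (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))


if __name__ == "__main__":
    print("randomized estimate:   ", round(simulate(100_000, randomized=True), 2))
    print("observational estimate:", round(simulate(100_000, randomized=False), 2))
```

With these made-up settings, the randomized comparison lands close to the true effect of 1.0. The observational comparison comes out negative (roughly -0.9), so a helpful drug looks harmful simply because the treated group is sicker.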
In the data analysis, we then have to adjust for how people came to receive the new drug versus the older drug, since the two groups might be systematically different from one another. Some of the methods I work on, called propensity score methods, help deal with this problem. A propensity score is a patient’s probability of receiving the treatment given their observed characteristics, such as age, pre-existing conditions, comorbidities, and hospitalization history. Matching or weighting on this score tries to make the groups with and without treatment look as similar to each other as possible on those observed characteristics. One benefit of electronic medical records is that they can give you a lot of information on each patient. You can then do matching and equate the groups using a large set of characteristics and clinical measures available in the record.
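Here is a minimal Python sketch of one common propensity score workflow, under stated assumptions. It uses a simulated cohort with invented covariates standing in for age and prior hospitalization. The propensity score comes from a logistic regression fitted by plain gradient ascent, followed by 1:1 nearest-neighbor matching with replacement. This is one textbook variant, not necessarily the specific methods Dr. Stuart uses. All function names are hypothetical. The standardized mean difference (SMD) is included as a simple check of covariate balance.

```python
import bisect
import math
import random


def simulate_cohort(n, true_effect=1.0, seed=0):
    """Hypothetical cohort: ((age_z, prior_hosp), treated, outcome) per patient."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age_z = rng.gauss(0.0, 1.0)                   # standardized age (invented)
        prior_hosp = 1 if rng.random() < 0.3 else 0   # prior hospitalization flag (invented)
        # Older, previously hospitalized patients are more likely to be treated.
        p_treat = 1.0 / (1.0 + math.exp(-(-0.5 + age_z + prior_hosp)))
        treated = rng.random() < p_treat
        outcome = (2.0 * age_z + 1.5 * prior_hosp
                   + true_effect * treated + rng.gauss(0.0, 1.0))
        rows.append(((age_z, prior_hosp), treated, outcome))
    return rows


def propensity(w, x):
    """Estimated P(treated | x) under a logistic model with intercept w[0]."""
    z = (1.0,) + tuple(x)
    return 1.0 / (1.0 + math.exp(-sum(wi * zi for wi, zi in zip(w, z))))


def fit_propensity(rows, lr=1.0, iters=300):
    """Fit a logistic regression of treatment on covariates by batch gradient ascent."""
    k = len(rows[0][0]) + 1  # number of covariates plus an intercept
    w = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for x, t, _ in rows:
            resid = float(t) - propensity(w, x)
            z = (1.0,) + tuple(x)
            for j in range(k):
                grad[j] += resid * z[j]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def match_att(rows, w):
    """1:1 nearest-neighbor matching on the propensity score, with replacement.

    Returns the estimated effect among the treated and the matched controls' covariates.
    """
    controls = sorted((propensity(w, x), y, x) for x, t, y in rows if not t)
    scores = [c[0] for c in controls]
    diffs, matched_x = [], []
    for x, t, y in rows:
        if not t:
            continue
        ps = propensity(w, x)
        i = bisect.bisect_left(scores, ps)
        # The closest control score is just below or just above the insertion point.
        near = [j for j in (i - 1, i) if 0 <= j < len(scores)]
        best = min(near, key=lambda j: abs(scores[j] - ps))
        diffs.append(y - controls[best][1])
        matched_x.append(controls[best][2])
    return sum(diffs) / len(diffs), matched_x


def smd(a, b):
    """Standardized mean difference, a common covariate-balance diagnostic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt((va + vb) / 2)


if __name__ == "__main__":
    rows = simulate_cohort(2000)
    w = fit_propensity(rows)
    att, matched_x = match_att(rows, w)
    t_age = [x[0] for x, t, _ in rows if t]
    c_age = [x[0] for x, t, _ in rows if not t]
    print("age SMD before matching:", round(smd(t_age, c_age), 2))
    print("age SMD after matching: ", round(smd(t_age, [x[0] for x in matched_x]), 2))
    print("matched effect estimate:", round(att, 2))
```

In this simulation, the treated group is noticeably older before matching, and a naive difference in means overstates the effect. After matching on the estimated score, the age SMD should drop close to zero. The estimate should also move toward the simulated true effect of 1.0. In real work, a statistics package would normally replace the hand-written fitting loop. Choices such as calipers, matching ratio, or weighting instead of matching need case-by-case judgment.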
EHR Data Study Design: The methods I work on could be used when a medical practice is interested in seeing whether patients who are prescribed a particular antidepressant do better over time than patients prescribed another type of antidepressant. Another example could be a study of patients exposed to a new case management program, compared with those who didn’t participate in the program. How would a practice begin to construct a study using its EHR and claims data? One way of thinking about EHR data studies is a general idea called “the design of non-experimental studies.” I think it is useful to remember that we can still learn from, and emulate, randomized trials. In a randomized trial, researchers are very thorough and careful in defining the population of interest, the eligibility criteria, the exposure of interest, the comparison of interest, when the outcomes are measured, and what the precise outcomes are. All of those steps are just as important when doing a non-experimental comparison. The first step is often simply to be very clear and specific about (1) what the research question is, (2) what the exposure of interest is (and the comparison condition), and (3) what the outcomes of interest are. Once this framework is in place, propensity score methods can be used to help ensure that the people we are comparing are as similar to each other as possible.
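The three steps above can be written down before any analysis is run. The Python sketch below is one hypothetical way to do that. It defines a small protocol record (question, eligibility, exposure and comparison, outcome, follow-up window) and a function that applies it to simplified patient records. The field names, the `drug_A`/`drug_B` labels, and the record layout are assumptions for illustration, not a real EHR schema. Counting outcomes only after the first prescription (the index date) mimics the clean time order of a trial. The resulting cohort is what propensity score methods would then be applied to.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StudyProtocol:
    """Hypothetical protocol: question, eligibility, exposure vs. comparison, outcome."""
    question: str
    min_age: int          # eligibility criterion
    exposure: str         # drug of interest (hypothetical label)
    comparator: str       # comparison drug (hypothetical label)
    outcome: str          # outcome of interest, documented up front
    follow_up_days: int   # outcome is counted only within this window


@dataclass(frozen=True)
class PatientRecord:
    """Simplified, hypothetical summary of one patient's EHR history."""
    patient_id: str
    age: int
    first_drug: str
    first_drug_date: date       # index date, playing the role of randomization time
    outcome_dates: tuple = ()   # dates on which the outcome was recorded


def build_cohort(protocol, records):
    """Return (patient_id, exposed, had_outcome) for each eligible patient."""
    cohort = []
    for r in records:
        if r.age < protocol.min_age:
            continue  # fails eligibility
        if r.first_drug not in (protocol.exposure, protocol.comparator):
            continue  # in neither arm of the comparison
        end = r.first_drug_date + timedelta(days=protocol.follow_up_days)
        # Only outcomes strictly after the index date and within follow-up count.
        had_outcome = any(r.first_drug_date < d <= end for d in r.outcome_dates)
        cohort.append((r.patient_id, r.first_drug == protocol.exposure, had_outcome))
    return cohort


if __name__ == "__main__":
    protocol = StudyProtocol(
        question="Do adults starting drug_A do better than those starting drug_B?",
        min_age=18, exposure="drug_A", comparator="drug_B",
        outcome="hypothetical recorded relapse", follow_up_days=180)
    records = [
        PatientRecord("P1", 45, "drug_A", date(2020, 1, 10), (date(2020, 3, 1),)),
        PatientRecord("P2", 16, "drug_A", date(2020, 1, 5)),
        PatientRecord("P3", 60, "drug_B", date(2020, 2, 1), (date(2021, 1, 1),)),
        PatientRecord("P4", 30, "drug_C", date(2020, 4, 2)),
    ]
    print(build_cohort(protocol, records))
```

Here P2 is excluded for age and P4 for being in neither arm. P3’s outcome falls outside the 180-day window, so it is not counted.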
Recommended reading: Estimating Causal Effects in Observational Studies Using Electronic Health Data: Challenges and (Some) Solutions, Elizabeth A. Stuart, PhD, et al.