Machine learning methods in EHR show promise, with limits

Carnegie Mellon University's Jeremy Weiss is on a quest to improve health predictions with machine learning and data from electronic health records.

Jeremy Weiss, assistant professor of health informatics at Carnegie Mellon University's Heinz College, is after a simple goal: Improve healthcare and health research. He's using machine learning methods to get there.

Weiss, an M.D.-Ph.D. with a background in computer science, is developing new machine learning algorithms and running them against electronic health record data from several sources: the Marshfield Clinic in Wisconsin; UPMC, a healthcare provider and insurer in Pittsburgh; Veterans Affairs; and MIMIC, a public set of deidentified data from about 40,000 critical care patients. He's currently focused on improving risk scores for diabetes complications and sepsis. Such scores are used to determine a patient's level of risk.

Here, Weiss talks about why new machine learning algorithms are needed for the kind of research he's doing and why irregular data collection poses a significant challenge for machine learning methods.

Editor's note: This Q&A has been edited for clarity and brevity.

What data do you look at to determine a risk score?

Jeremy Weiss: There's this divide. Classically, you would want to choose a minimal set of features or covariates to include in risk models. Some approaches that I've taken follow that line of thinking. Others use this concept of, well, we have big data from electronic health records, and we can use that data for improving prediction.

I think the answer is, yes, that's true, and we've observed that, repeatedly -- big data does improve predictive performance. But it comes with all kinds of caveats at the same time.

Like what?

Weiss: In particular, you capture large numbers of features. Instead of having tens of features, you're talking about thousands to millions of features. Oftentimes, those features are collected in irregular ways. The ability for your predictive model to generalize may go down if the features collected where you want to deploy the model were not collected in a procedurally identical way.

There's also the question of cost of transferring information for a risk score with 10 features or a risk score with a million features. If you want to deploy the algorithm with a million features, you have to go collect a whole bunch of those, or then you're going to have a substantial amount of missing data, which basically means the predictive performance is likely to be reduced, because you're kind of guessing at many of those features.

Why do you have to develop new algorithms to do this kind of work? What's missing from algorithms that already exist?

Weiss: Machine learning methods have increased flexibility to model effects in ways that classical algorithms have not been able to do. They're able to easily integrate other information sources, and that can lead to predictive boosts. The methods being developed are useful for predictive tasks, so long as you're willing to focus on the population you're training on, because these machine learning methods tend to have less generalizability compared to the classical methods.

Can you give an example of where machine learning methods outperform classical methods?

Weiss: One use case for machine learning is risk stratification. Researchers will conduct risk stratification analyses because they want to intervene on some subset of the population. There's a Durham Diabetes Coalition, which took a risk score or risk stratification and said we're going to intervene on the high-risk group much differently than the medium- and low-risk groups. How did they get the risk stratification? They trained the model -- in this case, it was a classical model of logistic regression.

What we're finding with machine learning is that we can do better, because we have the flexibility to improve prediction. If you intend to use the prediction for risk stratification -- I just want to know who is at high risk and who is medium and low -- it would behoove you to use a machine learning method, because you're going to intervene on that population for which you trained.
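To make the stratification step concrete, here is a minimal sketch. It is not the Durham Diabetes Coalition's actual model: the weights, intercept and tier cutoffs are hypothetical values chosen only for illustration. It assumes a logistic regression that turns a patient's features into a predicted probability, then maps that probability to a low-, medium- or high-risk group.

```python
import math


def logistic_risk(features, weights, intercept):
    """Predicted probability from a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def stratify(risk, medium_cutoff=0.10, high_cutoff=0.30):
    """Map a predicted probability to a risk tier.

    The cutoffs are hypothetical; a real program would choose them
    based on the intervention it plans for each group.
    """
    if risk >= high_cutoff:
        return "high"
    if risk >= medium_cutoff:
        return "medium"
    return "low"


# Hypothetical model with two made-up, standardized features.
example_weights = [0.8, 0.5]
example_intercept = -2.0
example_patient = [1.2, 0.4]

patient_risk = logistic_risk(example_patient, example_weights, example_intercept)
patient_tier = stratify(patient_risk)
```

Swapping `logistic_risk` for a more flexible machine learning model would leave `stratify` unchanged, which is one way to read Weiss's point: the tiers stay the same, and only the quality of the underlying prediction improves.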

You said machine learning algorithms are better at integrating other data sources. What kind of data sources are you referring to?

Weiss: There are all sorts of information sources, but, classically, we would think about images and text ... kind of like social determinants of health data, monitoring, device information, things like that. There's a lot of opportunity, but it becomes challenging because, oftentimes, there's nonuniform collection of that data. You can get good prediction in a target population, but because of lack of standardization, it becomes challenging to generalize and integrate into clinical care practices.

You've described electronic health record data as messy. Why is that such a challenge?

Weiss: The data is going to be irregularly collected, unlike the collection mechanism you would see in a prospective cohort study. That leads to all sorts of potential biases in your data. Basically, it's very difficult to tease apart whether it was the collection that's giving you the boost in prediction of a particular variable or if it's the actual value of said variable that's leading to your boost in prediction. If it's the fact that it was being collected and the collections were irregular, they could be irregular for any number of reasons. ... Those irregularities lead to difficulties in trust and generalizability of algorithms. That's a big limitation.

What you'll also see is that, if you read a clinical guideline, you'll see there may be hundreds of factors that are written either about the likelihood or progression of disease or the development of complications. But in an electronic health record, those things will not be uniformly measured. Not everyone will have every measurement, because those measurements are not indicated. It's not indicated for you to have a CT scan if you don't have some pathology that requires it.

So, again, you get this problem of, in those people who have the CT scan, they are going to be at a different level of risk because somebody knew that they were at that higher risk.
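The CT scan example can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, not an analysis of real records: each simulated patient has a hidden risk drawn uniformly between 0 and 1, a clinician orders a scan with probability equal to that hidden risk, and a bad outcome occurs with the same probability. The scan result itself is never used.

```python
import random


def simulate_measurement_bias(n_patients=20000, seed=0):
    """Compare outcome rates for scanned vs. unscanned simulated patients.

    All quantities are synthetic. The hidden risk drives both the
    decision to scan and the outcome, so the mere fact of being
    scanned carries information about the outcome.
    """
    rng = random.Random(seed)
    scanned_outcomes = []
    unscanned_outcomes = []
    for _ in range(n_patients):
        latent_risk = rng.random()                 # hidden from the model
        scan_ordered = rng.random() < latent_risk  # clinician acts on suspicion
        bad_outcome = rng.random() < latent_risk   # outcome depends on same risk
        if scan_ordered:
            scanned_outcomes.append(bad_outcome)
        else:
            unscanned_outcomes.append(bad_outcome)
    rate_scanned = sum(scanned_outcomes) / len(scanned_outcomes)
    rate_unscanned = sum(unscanned_outcomes) / len(unscanned_outcomes)
    return rate_scanned, rate_unscanned


rate_scanned, rate_unscanned = simulate_measurement_bias()
```

Under these assumptions, the scanned group has a clearly higher outcome rate than the unscanned group, even though no scan value entered the calculation. A model trained on such data could score well simply by learning who was measured, and that signal may not carry over to a site with different ordering habits.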

How do you go about removing bias?

Weiss: We validate the algorithms. Machine learning algorithms tend to validate better internally than externally -- every algorithm tends to do that. But I think the degradation we typically see for machine learning algorithms is steeper when we go to external validation. That's where it's important to clarify the use case of the machine learning prediction. When you're doing risk stratification and when you're doing it for intervention internally or locally, that's going to be where it really excels.

What other challenges have you experienced that still need work?

Weiss: Access to data can be challenging. Regulations are in place for good reason -- to protect patient data. But they can also create silos of data, where the data formats may look different, and the elements that are collected and the way the procedures are conducted vary -- and the ways they vary are not documented.

That leads to this generalization challenge. ... But that's also linked to this idea that, when you want to bring new technologies into the healthcare space, if they're going to be irregularly collected, that's going to make it more difficult for the adoption of those methods. If we have mechanisms to link data to new technology in a uniform way, that's going to be really helpful.
