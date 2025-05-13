In a few short years, generative AI, or GenAI, has gained an inextricable foothold in the healthcare industry. GenAI tools and models have become essential to care delivery, with use cases crossing the care continuum. However, even as GenAI's popularity soars, researchers are sounding the alarm on its pitfalls, such as the significant risks posed by bias.

All AI models have the potential to display bias, as they are trained on data that could be flawed. For example, datasets often underrepresent patients based on social class, race, gender, religion, sexual orientation or disabilities, making the AI models trained on them inherently biased.

A new study shows that GenAI tools are not immune to this issue. Published in Nature Medicine, the study revealed that GenAI models may recommend different treatments for the same medical condition based solely on a patient's sociodemographic background, which could result in health inequities.

These findings have significant implications for developers of health-focused GenAI tools and for the healthcare provider organizations seeking to use those tools.

A LOOK INTO THE STUDY AND ITS FINDINGS

For the study, researchers from Mount Sinai Health System examined large language models (LLMs), a common type of GenAI tool.

"These models have been trained on extensive portions of the internet, exposing them to countless sentences and diverse contexts," explained the study's co-senior author, Eyal Klang, M.D., in an interview. "This training allows LLMs to interpret nuances in human language and generate original text that is coherent, logical and convincingly human-like."

Klang, who is the chief of generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai, added that the research team wanted to highlight areas where GenAI may be failing and needs adjustment to ensure safe patient care.

The researchers evaluated nine LLMs across 1.7 million model-generated outputs from 1,000 emergency department (ED) cases. Of those cases, 500 were real patient cases and 500 were synthetic vignettes, and each was presented with variations reflecting 31 different sociodemographic identities and a control.

"We decided to undertake this study precisely because these models perform so effectively and convincingly. It's clear they will increasingly be integrated into healthcare systems, directly interacting with patients and influencing clinical decisions. Given their growing role, we felt it was critical to systematically investigate their behavior," Klang said.

The researchers found sociodemographic biases in all the models' clinical recommendations. For instance, patients labeled as high income were recommended advanced diagnostic tests more often than patients labeled as low- and middle-income, who were often recommended basic or no further testing.
Further, patients labeled as Black, unhoused or identifying as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations. "These biases were consistent across both proprietary and open-source models," Klang said.

Klang further noted that the researchers were surprised by the magnitude of certain biases uncovered through the analysis, especially regarding mental health assessments. The study shows that some patients labeled as being from LGBTQIA+ subgroups received recommendations for mental health assessments approximately six to seven times more often than clinically indicated.
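
The study's design, in which the same case is presented with different sociodemographic labels plus a control, lends itself to a simple rate comparison. The Python sketch below is illustrative only and is not the researchers' code: the function names, the `control` label and the toy records are all assumptions made up for this example, and no real study data is used. It counts how often a given recommendation appears for each label and divides that rate by the rate for the control variant.

```python
from collections import defaultdict


def recommendation_rates(records):
    """Map each label to the fraction of outputs that made the recommendation.

    `records` is an iterable of (label, recommended) pairs, where
    `recommended` is True if the model output included the recommendation.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [recommended, total]
    for label, recommended in records:
        counts[label][0] += int(recommended)
        counts[label][1] += 1
    return {label: rec / total for label, (rec, total) in counts.items()}


def rate_ratios(records, control="control"):
    """Divide each label's recommendation rate by the control variant's rate."""
    rates = recommendation_rates(records)
    base = rates[control]
    if base == 0:
        raise ValueError("control rate is zero, so the ratio is undefined")
    return {label: rate / base for label, rate in rates.items() if label != control}


# Hypothetical toy records, invented purely for illustration.
toy_records = (
    [("control", True)] * 2 + [("control", False)] * 6
    + [("label_a", True)] * 4 + [("label_a", False)] * 4
)

ratios = rate_ratios(toy_records)
print(ratios)
```

With these made-up records, `label_a` comes out at 2.0, meaning the recommendation appears twice as often as it does for the control version of the same cases. The study's own analysis was more involved, and its comparison against what was clinically indicated is not captured by this sketch.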