Understanding GenAI's sociodemographic bias in healthcare

Amid GenAI's growing popularity in healthcare, a new study highlights the sociodemographic-based biases of the models, urging clinical oversight.

In a few short years, generative AI, or GenAI, has gained a firm foothold in the healthcare industry. GenAI tools and models have become essential to care delivery, with use cases spanning the care continuum. However, even as GenAI's popularity soars, researchers are sounding the alarm on its pitfalls, such as the significant risks posed by bias.

All AI models have the potential to display bias, as they are trained on data that could be flawed. For example, datasets often underrepresent patients based on social class, race, gender, religion, sexual orientation or disabilities, making the AI models trained on them inherently biased.

A new study shows that GenAI tools are not immune to this issue. Published in Nature Medicine, the study revealed that GenAI models may recommend different treatments for the same medical condition based solely on a patient's sociodemographic background, which could result in health inequities.

These findings have significant implications for health-focused GenAI developers and the healthcare provider organizations seeking to adopt these tools.

A LOOK INTO THE STUDY AND ITS FINDINGS

For the study, researchers from Mount Sinai Health System examined large language models (LLMs), a common type of GenAI tool.

"These models have been trained on extensive portions of the internet, exposing them to countless sentences and diverse contexts," explained the study's co-senior author, Eyal Klang, M.D., in an interview. "This training allows LLMs to interpret nuances in human language and generate original text that is coherent, logical and convincingly human-like."

Klang, who is the chief of generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai, added that the research team wanted to highlight areas where GenAI may be failing and needs adjustment to ensure safe patient care.

The researchers evaluated nine LLMs on 1.7 million model-generated outputs for 1,000 emergency department cases. Of those cases, 500 were real patient cases and 500 were synthetic vignettes, and each was presented with variations reflecting 31 different sociodemographic identities, plus a control.
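
The audit design behind these numbers is counterfactual: the same clinical case is presented with only the sociodemographic label changed, and the model's recommendations are then compared across labels. The Python sketch below is a simplified, hypothetical illustration of that design, not the study's actual code; the label list, vignette text and query_model stub are placeholders standing in for the real evaluation pipeline.

```python
from collections import Counter

# Hypothetical sociodemographic labels; the study varied 31 identities plus a control.
LABELS = ["control", "high income", "low income", "unhoused", "Black", "LGBTQIA+"]

# Hypothetical vignette standing in for one of the 1,000 ED cases.
CASE = (
    "A 54-year-old patient presents to the emergency department with "
    "intermittent chest pain for two hours. Vital signs are stable."
)


def build_prompt(case_text: str, label: str) -> str:
    """Present the same vignette with only the sociodemographic descriptor changed."""
    question = "What is the recommended next step in management?"
    if label == "control":
        return f"{case_text}\n{question}"
    return f"The patient is described as {label}. {case_text}\n{question}"


def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation.

    In a real audit this would invoke the model's API and parse the returned
    recommendation; here it returns a canned answer so the sketch runs end to end.
    """
    return "basic testing"


def audit_case(case_text: str, n_samples: int = 10) -> dict:
    """Collect repeated recommendations per label so rates can be compared to the control."""
    results = {}
    for label in LABELS:
        prompt = build_prompt(case_text, label)
        results[label] = Counter(query_model(prompt) for _ in range(n_samples))
    return results


if __name__ == "__main__":
    for label, counts in audit_case(CASE).items():
        print(label, dict(counts))
```

Repeating this over many cases and models is what produces the large output counts reported in the study; the comparison of interest is always between a labeled variant and its unlabeled control.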

They found sociodemographic biases in all the models' clinical recommendations. For instance, patients labeled as high income were more often recommended advanced diagnostic tests, while patients labeled as low- or middle-income were more often recommended basic testing or no further testing.

Further, patients labeled as Black, unhoused or identifying as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations.

"These biases were consistent across both proprietary and open-source models," Klang said.

Klang further noted that the researchers were surprised by the magnitude of certain biases uncovered through the analysis, especially regarding mental health assessments.

The study showed that patients labeled as belonging to certain LGBTQIA+ subgroups received recommendations for mental health assessments approximately six to seven times more often than clinically indicated.

TAKEAWAYS FOR HEALTH AI STAKEHOLDERS

The study's findings are noteworthy for developers and users of GenAI tools in healthcare.

Klang underscored that the scale and consistency of socioeconomic- and demographic-based biases across GenAI models highlight the importance of evaluating LLMs before they become routine in clinical practice.

GenAI use is gaining steam in healthcare, with the tools increasingly used in direct patient care. Klang said they are used to collect medical histories, offer clinical decision support to clinicians, identify complications and alert medical teams. GenAI tools are also applied to administrative tasks, helping streamline back-office functions by analyzing patient cohorts, summarizing medical records and automating clinical documentation.

A recent McKinsey & Company survey of 150 healthcare leaders from payers, health systems, healthcare services and technology groups found that, as of the fourth quarter of 2024, 85% were exploring or had already adopted generative AI capabilities.

Respondents identified administrative efficiency (75%) and clinical productivity (74%) as the areas where GenAI has the greatest potential, followed by patient or member engagement (55%).

"We decided to undertake this study precisely because these models perform so effectively and convincingly," said Klang. "It's clear they will increasingly be integrated into healthcare systems, directly interacting with patients and influencing clinical decisions. Given their growing role, we felt it was critical to systematically investigate their behavior."

With these tools becoming more widely used, GenAI biases could exacerbate existing healthcare disparities, such as poorer outcomes and reduced healthcare access for minority groups, rural Americans and low-income populations.

"Over time, these biases can deepen mistrust, worsen health inequities, misallocate critical resources, and ultimately undermine patient safety and outcomes," Klang said.

Thus, healthcare AI developers and providers must be aware that GenAI can produce biased recommendations influenced by patient demographics, and they must identify, monitor and mitigate these biases, Klang said. Developing clinical oversight processes and allowing community feedback can help address GenAI biases and ensure patient safety.
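
One simple way such monitoring could be operationalized is to track how often a given recommendation is made for each demographic group and flag large deviations from a reference group. The sketch below is a hypothetical illustration of that kind of check, not a method described in the study; the threshold and the example rates are assumptions.

```python
def flag_disparities(rates, reference="control", ratio_threshold=2.0):
    """Flag groups whose rate of a given recommendation diverges sharply from the reference.

    `rates` maps a demographic label to the fraction of matched cases in which
    the recommendation (e.g., a mental health evaluation) was made for that group.
    """
    baseline = rates.get(reference)
    if not baseline:
        raise ValueError("Reference group rate must be present and nonzero.")
    flagged = []
    for group, rate in rates.items():
        if group == reference:
            continue
        ratio = rate / baseline
        if ratio >= ratio_threshold:
            flagged.append(f"{group}: {ratio:.1f}x the reference rate")
    return flagged


# A six- to seven-fold gap, like the one reported for mental health assessments,
# would be flagged immediately. The numbers below are illustrative, not the study's.
print(flag_disparities({"control": 0.05, "LGBTQIA+": 0.33}))
```

A fixed ratio threshold is only a starting point; in practice, the flagged gaps would still need clinical review to judge whether the difference is clinically warranted.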

In addition, researchers must continue to uncover AI biases as the technology advances rapidly and brings new challenges.

Klang and his research team plan to expand the range of clinical questions they test and, eventually, conduct prospective, real-world evaluations.

"By examining how these models perform over time in live clinical settings, we aim to better understand their reliability and ensure they're truly safe for patient care," he said.

Anuja Vaidya has covered the healthcare industry since 2012. She currently covers the virtual healthcare landscape, including telehealth, remote patient monitoring and digital therapeutics.
