
Racial bias evident in psychiatric recommendations by LLMs

New research shows psychiatric treatment recommendations by LLMs are prone to racial bias, highlighting the importance of AI oversight to ensure health equity.

Treatment recommendations for psychiatric patients generated by large language models, or LLMs, display a pattern of racial bias, with Black patients receiving different treatment recommendations than their peers, according to a new study in NPJ Digital Medicine.

Led by Cedars-Sinai researchers, the study highlights an urgent need to curb healthcare disparities stemming from AI use. It comes as investment in AI, particularly generative AI and LLMs, soars within the healthcare and life sciences industries.

The researchers examined racial bias in psychiatric diagnosis and treatment across four large language models: Claude, ChatGPT, Gemini and NewMes-15. They presented 10 psychiatric patient cases representing five diagnoses to the models under three conditions, illustrated in the sketch after this list:

  • The 'neutral' condition, in which no reference was made to the patient's race.
  • The 'implicit' condition, in which race was signaled implicitly through a patient name that population research studies associate with the Black population.
  • The 'explicit' condition, in which the patient's race was explicitly stated as Black and the same patient name as the implicit condition was used.

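To make these conditions concrete, the snippet below is a minimal Python sketch of how such prompt variants could be assembled for a single vignette. The case wording, the placeholder name, and the build_conditions helper are illustrative assumptions, not materials from the study, which presented full clinical vignettes to each model.

```python
# A minimal, hypothetical sketch of how the three prompt conditions described
# above could be assembled for one vignette. The case wording and placeholder
# name are illustrative assumptions, not materials from the paper.

CASE_TEMPLATE = (
    "{intro} presents with two weeks of low mood, poor sleep, and loss of "
    "interest in daily activities. Please provide a diagnostic impression "
    "and a treatment plan."
)


def build_conditions(name):
    """Return the neutral, implicit, and explicit variants of one case."""
    return {
        # Neutral: no reference to the patient's race and no name at all.
        "neutral": CASE_TEMPLATE.format(intro="A 34-year-old patient"),
        # Implicit: race signaled only through a name that population
        # research studies associate with the Black population.
        "implicit": CASE_TEMPLATE.format(intro=f"{name}, a 34-year-old patient,"),
        # Explicit: race stated outright, reusing the same name.
        "explicit": CASE_TEMPLATE.format(
            intro=f"{name}, a 34-year-old Black patient,"
        ),
    }


if __name__ == "__main__":
    # "<patient name>" is a placeholder for the racially associated name
    # used in a given case.
    for condition, prompt in build_conditions("<patient name>").items():
        print(f"--- {condition} ---\n{prompt}\n")
```

Keeping the vignette identical across the three variants is what allows any divergence in a model's recommendations to be attributed to the race signal alone.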
Clinical and social psychologists evaluated the models' diagnostic recommendations and treatment plans.

While diagnostic assessments were relatively consistent across LLMs and cases, treatment recommendations showed more pronounced bias: the LLMs frequently proposed different treatment approaches when racial characteristics were present, either explicitly or implicitly.

For example, Gemini demonstrated an increased focus on reducing alcohol use in anxiety cases only when the patient's race was explicitly stated as Black, and both ChatGPT and NewMes-15 omitted medication recommendations for an ADHD case when the patient's race was explicitly stated but included them when it was not.

Further, Claude suggested guardianship for depression cases in the explicit condition, but not in the neutral or implicit conditions.

"Most of the LLMs exhibited some form of bias when dealing with African American patients, at times making dramatically different recommendations for the same psychiatric illness and otherwise identical patient," said study author Elias Aboujaoude, MD, director of the Program in Internet, Health and Society in the Department of Biomedical Sciences at Cedars-Sinai, in a press release. "This bias was most evident in cases of schizophrenia and anxiety."

The majority of models received bias scores of 2.0 or above for treatment recommendations in schizophrenia and anxiety cases. 

Among the LLMs, the NewMes-15 model showed the highest susceptibility to bias, while Gemini demonstrated the lowest overall bias scores across conditions.

"Our findings serve as a call to action for stakeholders across the healthcare AI ecosystem to help ensure that these technologies enhance health equity rather than reproduce or exacerbate existing inequities," the researchers concluded. "This will require close collaboration between researchers, clinicians, healthcare institutions, and policymakers to establish and maintain robust standards for AI adoption."

Anuja Vaidya has covered the healthcare industry since 2012. She currently covers the virtual healthcare landscape, including telehealth, remote patient monitoring and digital therapeutics. 
