First study of ChatGPT Health questions triage efficacy
The study, which is the first to assess OpenAI's ChatGPT Health tool, found that the system under-triages serious medical emergencies and mental health crises.
The first preliminary assessments of ChatGPT Health's triage capabilities are in, and it's looking inconsistent at best.
According to a group of researchers from the Icahn School of Medicine at Mount Sinai, ChatGPT Health misses some of the highest-stakes referrals, indicating the tool might not be ready for consumer use, at least without safeguards and provider guidance.
"LLMs have become patients' first stop for medical advice -- but in 2026 they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm," according to Isaac S. Kohane, M.D., Ph.D., the chair of the Department of Biomedical Informatics at Harvard Medical School and who was not involved with the research.
"When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high. Independent evaluation should be routine, not optional," Kohane said in a Mount Sinai press release.
ChatGPT Health was launched in January 2026 to much fanfare. OpenAI, the tech company that created the tool, said ChatGPT Health could help patients get medical advice in the context of their own circumstances, allowing users to upload their own digital health records.
Part of ChatGPT Health's capabilities includes triage, helping users determine whether their symptoms warrant medical attention and where they should go to get it.
Indeed, there's a gap in the market for this type of technology. Just days before launching ChatGPT Health, OpenAI said about a quarter of ChatGPT's 800 million regular users ask a healthcare-related question weekly. All told, about 40 million users leverage ChatGPT for healthcare purposes.
Those high utilization rates raise the question: is this technology safe and reliable?
"That gap motivated our study," lead author Ashwin Ramaswamy, M.D., instructor of Urology at the Icahn School of Medicine at Mount Sinai, said in the press release. "We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?"
According to the researchers' analysis of 960 mock medical scenarios, ChatGPT Health isn't reliable for accurate medical triage.
ChatGPT Health under-triages emergency cases
The Mount Sinai researchers tested ChatGPT Health using 60 clinician-authored vignettes spanning 21 clinical domains. Various queries had unique characteristics, like patient race, gender and social determinants of health.
Researchers also repeated queries, adjusting for whether patients had barriers to care (such as limited transportation access) or were downplaying their symptoms.
ChatGPT Health performed inconsistently at best, the researchers said, with the biggest failure rates occurring at the extreme ends of the clinical spectrum.
For example, there was a 35% failure rate for non-urgent presentations and a 48% failure rate for emergency conditions. Put another way, ChatGPT Health under-triaged emergency cases 48% of the time.
Within that emergency category, there was still variation. For example, ChatGPT Health was more likely to correctly triage "textbook" emergencies like stroke or anaphylaxis. For less common but still serious emergencies like diabetic ketoacidosis or impending respiratory failure, the tool failed to correctly triage 52% of cases.
This inconsistency is likely due to ChatGPT Health's inability to assess more nuanced medical situations, according to Ramaswamy.
"ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions," Ramaswamy noted. "But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment matters most. In one asthma scenario, for example, the system identified early warning signs of respiratory failure in its explanation but still advised waiting rather than seeking emergency treatment."
Certain factors weighed on ChatGPT Health's recommendations more than others. For example, social determinants of health such as race, gender and access to care did not typically sway the chatbot's responses.
However, users who minimized their own symptoms -- or who referenced family or friends who did -- were more likely to receive suggestions to seek less urgent care.
ChatGPT Health doesn't always refer to 988 crisis hotline
In addition to inconsistent triage of certain medical emergencies, the researchers found that ChatGPT Health did not always refer users displaying suicidal ideation to the 988 crisis hotline, despite the system being programmed to do so.
Specifically, the system was less likely to refer users to the 988 mental health crisis hotline when users gave a specific plan for how they would self-harm compared to when users did not provide specific details.
"The crisis guardrail finding may be the most consequential failure mode exhibited in the entire study," the authors wrote in the report's discussion. "The capability to recognize mental health crises and connect users with crisis resources is a basic prerequisite for any consumer health platform. Our data show this prerequisite has not been reliably met."
Clinicians must advise consumer AI use
While the study results indicate that consumer-facing tools like ChatGPT Health are not reliable, the authors acknowledge that it'd be impractical to simply advise patients not to use them.
Rather, the authors asserted that providers need to guide patients in how these tools can be used and outline the risks and benefits of them. After all, 73% of patients are still relying primarily on their doctors for health information, compared to just 16% who are using AI.
"As a medical student training at a time when AI health tools are already in the hands of millions, I see them as technologies we must learn to integrate thoughtfully into care rather than substitutes for clinical judgment," Alvira Tyagi, a first-year medical student at the Icahn School of Medicine at Mount Sinai and second author of the study, said in the press release.
"These systems are changing quickly, so part of our training now must consider learning how to understand their outputs critically, identify where they fall short, and use them in ways that protect patients."
Importantly, ChatGPT Health and other consumer-facing AI tools are just in their infancy. As time progresses, the technologies will likely improve. Still, it will be incumbent upon providers to determine best practices for understanding how their patients are using AI and guiding them in safely accessing the tools.
Sara Heath has reported news related to patient engagement and health equity since 2015.