
Agentic AI discharge summaries linked to safety, clinician wellbeing

A study found that agentic AI-generated summaries included in discharge notes had low potential for harm and were linked to lower clinician burnout rates.

New research published in JAMA Network Open found agentic AI models to be largely safe and effective when used to generate hospital course summaries. The AI summarization tool used in the study was also associated with reduced clinician burnout and modest time savings.

The findings were based on a 10-week prospective pilot at a Stanford Health Care inpatient unit in which 11 attending hospitalist physicians used an agentic AI workflow, powered by Gemini 2.5 Pro.

The custom agentic AI workflow developed for the study, known as MedAgentBrief, generated draft hospital course summaries nightly. The course summaries were securely emailed to physicians for review, and the physicians could choose whether to use them.

The system followed a multi-step process: the AI created an initial draft based on the patient's history, physical and latest note, iterated on that draft with chronological notes and cross-checks and added citations to reduce hallucinations. It produced a final summary that included a patient one-liner, a problem-based summary and a narrative hospital course overview, the study stated.
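The multi-step process described above can be sketched in Python. This is a hypothetical illustration of the general draft-then-iterate pattern, not MedAgentBrief's actual implementation; the function names, prompts, and the `llm` callable are all assumptions for the sake of the example.

```python
def generate_summary(history_and_physical, latest_note, chronological_notes, llm):
    """Sketch of a multi-step agentic drafting workflow.

    `llm` is any callable that takes a prompt string and returns text,
    e.g. a wrapper around a hosted model API.
    """
    # Step 1: initial draft from the H&P and the most recent note.
    draft = llm(
        "Draft a hospital course summary.\n"
        f"H&P: {history_and_physical}\nLatest note: {latest_note}"
    )

    # Step 2: iterate over the chronological notes, cross-checking the draft
    # and attaching citations so claims can be traced back to a source note.
    for i, note in enumerate(chronological_notes):
        draft = llm(
            f"Revise the draft against this note; cite it as [{i}] "
            "for any claim it supports.\n"
            f"Draft: {draft}\nNote: {note}"
        )

    # Step 3: produce the final structured output.
    return {
        "one_liner": llm(f"One-sentence patient summary:\n{draft}"),
        "problem_summary": llm(f"Problem-based summary:\n{draft}"),
        "hospital_course": draft,
    }
```

Passing a stub in place of `llm` makes the control flow easy to test without any model access.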

Over the 10-week period, the tool created 1,274 daily hospital course summaries and covered 384 patient discharges. Physicians chose to incorporate the AI-generated summaries into their final discharge documentation in 57% of the cases.

The physicians provided safety feedback on 100 summaries, helping researchers gauge the safety of the tool's outputs. About 88% of unedited summaries were rated as having "no harm potential," according to the physicians.

Just one summary was rated as likely to cause moderate harm, triggering an immediate adjudication by two independent investigators, who determined that the clinical directive posed no risk.

Omissions were the most common error type flagged by physicians, noted in 25% of summaries. Inaccuracies were reported in 20% of summaries, and hallucinations appeared in just 2%.

Despite the frequency of omissions, physicians still rated 88% of the summaries as having no potential for harm.

"Review of the free-text feedback revealed that omitted content typically fell into 3 categories: discharge plans for stable chronic conditions with little change from baseline, incomplete conveyance of diagnostic uncertainty or competing diagnoses, and insufficient emphasis on details that were mentioned but deserved more explicit attention," the study noted.

"These omissions, while clinically relevant for completeness, were identifiable during physician review and did not alter immediate management decisions."

In addition to observing safe, AI agent-generated discharge summaries, researchers found that physicians reported lower burnout. The mean Stanford Professional Fulfillment Index Work Exhaustion score, a measure of physician well-being, decreased from 1.75 to 1.20 among physicians who used the tool, a change researchers described as "clinically meaningful."

However, two of 10 physicians who responded to the survey had increases in burnout scores, and three of 10 experienced increases in cognitive load.

Notably, physicians' subjective assessments of efficiency were more positive than the measured data suggested. More than 65% of clinicians reported perceived time savings, and nearly a third estimated savings of greater than 15 minutes per summary. In reality, researchers observed an average saving of just 2.9 minutes per discharge summary, which did not amount to a significant time saving. EHR closure time also remained largely unchanged.

"These convergent findings suggest that the primary benefit of generative AI tools lies in cognitive offloading rather than clock-time efficiency," the study noted.

"The AI serves as a scaffolding tool, providing a structured starting point that physicians review and refine rather than generate de novo. This shifts the value proposition from efficiency to sustainability, explaining why burnout improved when clock time did not."

The researchers acknowledged that the tool's high omission rate remains the biggest hurdle to success and will require continuous model refinement. Future work should also consider how AI-driven time savings could translate into higher demands on clinicians, effectively canceling out the observed reductions in burnout.

Overall, the study demonstrated the potential of AI in crafting reliable discharge summaries, while highlighting the need for further research.

"Technology assessments typically treat safety as secondary to efficacy. Given the pace at which commercial vendors are integrating LLMs into EHRs, often prioritizing speed over validation, evaluation of these tools to establish safety is urgently needed before demonstrating efficacy at scale," the study stated.

"Our results suggest that with appropriate guardrails, agentic LLM workflows for hospital course summarization may be deployed safely in clinical settings."

Jill Hughes has covered health tech news since 2021.
