
AI scribe note quality under question as adoption grows

Ambient AI scribes may save time, but new research raises concerns about note quality and the potential to exacerbate care gaps.

Ambient AI scribes are quickly gaining traction across healthcare, promising to reduce documentation burden and allow clinicians to focus more on patients. But new evidence suggests the technology may come with trade-offs in quality.

In a recent vendor-neutral study, researchers found that AI-generated clinical notes scored lower than clinician-written notes across multiple measures of documentation quality, including thoroughness, organization and clinical usefulness.

"That saved time has now been replaced with the time that's required to review," said Rebecca Andrews, M.D., a practicing physician at UConn Health and former chair of the American College of Physicians' board of regents, in response to the study results.

The findings, published in JAMA Network Open by researchers from the University of Virginia and other institutions, point to broader operational questions around AI scribe adoption, chiefly whether the tools reduce clinician burden or create new oversight demands. Andrews was not involved in the research but reviewed the study from the perspective of a practicing physician and healthcare policy leader.

Studying evidence gaps in AI scribe use 

Unlike many earlier evaluations centered on workflow and clinician satisfaction, the new study focused specifically on AI scribe documentation quality and tested the tools under more realistic clinical conditions.

Researchers evaluated notes generated by 11 AI scribe tools using standardized primary care scenarios. Human clinicians and AI tools created notes from the same audio-recorded visits. Blinded raters assessed them using a modified version of the Physician Documentation Quality Instrument, or PDQI-9, which measures factors such as accuracy, completeness, organization and usefulness.

The scenarios included challenges such as background noise, muffled speech due to face masks and non-native English speakers to evaluate how the tools performed outside ideal settings. Researchers said the study was designed to address unresolved concerns around documentation accuracy, completeness and patient safety as AI scribe adoption accelerates.

Risks linked to subtle documentation errors

The study found that AI tools performed less consistently than human clinicians when generating notes in scenarios involving background noise, face masks and non-native English speakers. For Andrews, this raises concerns that the technology may not work equally well for all patients, especially those already at risk for care gaps due to language barriers.

"So now you have some disparities in care that you're exacerbating by [using] a technology that's only applicable to certain patients," she said. 

Andrews also noted that the issue with AI-generated notes is often not obvious factual errors, but rather subtle inaccuracies and incorrect inferences that can be difficult to catch during routine review.

"There are words that are difficult to differentiate -- 'do' and 'don't' are very similar. That can totally change the meaning," Andrews said.

She also described cases where AI systems inferred incorrect information from the conversation.

"My patient had said, 'Well, I guess I could try to eat more meat for protein.' And it inappropriately extracted that to mean that she was vegetarian, which she is not," she said.

In addition, the study found gaps in the thoroughness and usefulness of AI-generated notes, suggesting the issue goes beyond simple transcription errors. AI scribes rely on what is said aloud, while human clinicians also use nonverbal cues.

"There are things that we don't say out loud -- the body language, the hesitation, the things we pick up on as clinicians that aren't necessarily verbalized -- and the scribe will leave out," Andrews said.

Overcoming operational challenges

As health systems continue adopting AI scribes, the findings raise questions about how the technology should fit into clinical workflows. The study authors recommended that AI-generated notes be treated as draft documentation requiring clinician review rather than replacements for clinician-authored notes.

But building workflows that allow adequate review time, while essential to maintaining accuracy, can be challenging. AI scribes are often marketed as time-saving tools. However, Andrews said the expectation of greater efficiency may translate into heavier workloads, leaving clinicians with less protected time to safely review AI-generated notes.

Because clinicians remain responsible for the final medical record, Andrews said safe AI scribe use may depend less on the technology itself and more on the workflows, oversight and governance surrounding it. Provider organizations and health systems will need more formal processes governing clinician training, patient consent and ongoing monitoring of note quality.

"I think safe use really involves having clear, documented consent with patients, knowing where, how much, how long the information is stored," Andrews said. She also called for "shared accountability" between clinicians and AI vendors for note accuracy and safety.

AI scribes are likely to remain a key part of clinical workflows as health systems continue searching for ways to reduce documentation burden. But the study's findings suggest healthcare providers should proceed carefully.

"I don't think we should be completely anti or completely pro [AI scribes]," Andrews said.

Instead, she cautions, health systems may need to focus less on whether AI scribes save time, and more on how to deploy them safely, equitably and without compromising documentation quality.

Elizabeth Stricker, BSN, RN, comes from a nursing and healthcare leadership background, and covers health technology and leadership trends for B2B audiences.
