
OIG found limited coordination, oversight of VA's genAI tools

The watchdog observed gaps in the coordination and oversight of generative AI chat tools used at the VA for clinical documentation and patient care, raising patient safety concerns.

The availability and clinical use of generative AI tools at the Department of Veterans Affairs is widespread, but its oversight of these tools is limited, the VA's Office of Inspector General observed in a new report.

What's more, formal coordination between the teams implementing these AI tools and those responsible for patient safety is lacking, the OIG asserted.

The OIG conducted a review of the VA's use of genAI chat tools from October 2025 through February 2026 and found that thousands of Veterans Health Administration staff actively engaged with the two general-purpose AI chat tools available to them: VA GPT and Microsoft 365 Copilot Chat.

"However, there is no means to measure their breadth of use for clinical care and documentation," the OIG stated.

The report follows a preliminary publication on the same subject that was sent to the VA in January, in which the OIG similarly warned that the VA's current use of generative AI chat tools for clinical care and documentation presents potential patient safety risks.

Now, the OIG is formalizing its recommendations to the VA, urging it to prioritize oversight of generative AI tools, implementation of AI safeguards and the integration of AI risk monitoring into existing patient safety programs. The VA's under secretary for health concurred with the recommendations.

Inconsistent classification of AI tools presents patient safety risks

The Office of Management and Budget's 2025 memorandum, "Accelerating Federal Use of AI through Innovation, Governance, and Public Trust," requires agencies to identify high-impact uses of AI and implement risk management protocols accordingly.

According to the OMB, AI use is considered high-impact when it serves as a "principal basis for decisions or actions with legal, material, binding or significant effect" on human health and safety. The OMB cites patient diagnosis, risk assessment and treatment as examples of high-impact healthcare uses.

Designating a tool as high-impact would trigger certain safety requirements, such as pre-deployment testing and human oversight. However, the VA did not designate VA GPT or Copilot Chat as high-impact. Instead, during interviews, the VA emphasized "user-level responsibility."

"The OIG found that VA AI leaders equated use of AI chat tools to using a search engine," the OIG noted.

"However, the OIG determined that this analogy is flawed. Unlike a search engine that finds links to websites, generative AI tools like VA GPT and Microsoft Copilot synthesize and transform sources to produce novel content, like drafting patient visit medical record entries from a transcript of a patient visit."

The OIG's analysis of user-generated prompts revealed that VA employees frequently used these applications for clinical purposes. Of the 135 prompts shared by VA staff that the OIG analyzed, 79 were clinical in nature, including 56 prompts for drafting clinical notes, 17 for patient care summarization and 6 for other clinical purposes.

The OIG stressed that general-use AI chat tools can introduce opportunities for error and put patient safety at risk, given these tools' tendency to hallucinate. As such, they should be subject to stricter guardrails, rather than being treated like a search engine.

The OIG supported its argument by citing industry best practices from HHS, The Joint Commission and the Coalition for Health AI about AI safety, all of which emphasized the importance of establishing formal AI governance structures, conducting ongoing quality monitoring and providing training for healthcare staff.

In addition to failing to designate these tools as high-impact, the OIG found that the VA's AI leaders had not effectively coordinated with the VHA's National Center for Patient Safety. Only one meeting between the two teams was reported. As a result of the OIG's preliminary report, the two teams planned additional meetings and collaboration opportunities.

In contrast to the risky rollout of VA GPT and Copilot Chat, the OIG's report showed that the VA's pilot of an ambient AI scribe, "a targeted clinical documentation tool with functionality similar to clinical documentation prompts VA staff used with the AI chat tools," was designated as high-impact and had triggered safety requirements like feedback loops and the ability to detect patterns and errors.

However, the OIG noted that VA employees are using tools like VA GPT and Copilot Chat for similar use cases. As such, those tools should be bound to the same strict protocols.

Following its review, the OIG recommended that the VHA's Under Secretary for Health review the VHA's current use of genAI tools, define permissible clinical uses for general-purpose AI chat tools, evaluate whether safeguards applied to other high-impact tools (such as the ambient AI scribe) should be adapted for generative AI tools and integrate AI risk monitoring into existing patient safety programs.

VA's response

The OIG's preliminary report resulted in several actions by the VA to improve AI safety. For example, VA AI leaders added representation from the National Center for Patient Safety to the VHA AI Assessment Subcommittee. The groups plan to increase representation for patient safety and collaborate on the creation of a standardized definition of AI patient safety errors.

Additionally, the VA met with the Defense Health Agency to update the Joint Patient Safety Reporting system, enabling it to report and track AI-related events.

In addition to the actions the VA has already taken, the Under Secretary for Health concurred with the OIG's recommendations and provided a target completion date of April 2027.

"The OIG will monitor implementation and focus its oversight efforts on the effectiveness and efficiencies of programs and services that improve the health and welfare of veterans and their families," the report stated.

Jill Hughes has covered health tech news since 2021. Her coverage areas include cybersecurity, HIPAA compliance, interoperability, AI and EHRs.
