How AI-driven patching could transform cybersecurity

At RSAC 2024, a Google researcher described how the search giant has already seen modest but significant success using generative AI to patch vulnerabilities.

Unpatched software vulnerabilities have long been a chronic cybersecurity pain point, leading to costly data breaches every year. On average, a data breach resulting from the exploitation of a known vulnerability costs $4.17 million, according to IBM's "Cost of a Data Breach Report 2023."

The problem: Organizations don't patch software flaws as quickly as threat actors find and exploit them. Once a critical vulnerability is published, malicious scanning activity begins within a median of five days, according to Verizon's "2024 Data Breach Investigations Report." Meanwhile, two months after fixes for critical vulnerabilities become available, nearly half remain unremediated.

A potential solution: Generative AI. Some cybersecurity experts believe GenAI can help close that gap by not just finding bugs, but also fixing them. In internal experiments, Google's large language model (LLM) has already achieved modest but significant success, remediating 15% of the simple software bugs it targeted.

In a presentation at RSA Conference (RSAC) 2024, Elie Bursztein, cybersecurity technical and research lead at Google DeepMind, said his team is actively testing various AI security use cases, ranging from phishing prevention to incident response. But the ability to use Google's LLM to secure its codebase by finding and patching vulnerabilities -- and, ultimately, reducing or eliminating the number of vulnerabilities that require patching -- tops their AI security wish list.

"It's the big one, and I think the one we are most excited for," Bursztein said.

Google's AI-driven patching experiment

In a recent experiment, Bursztein's team compiled 1,000 simple C/C++ vulnerabilities from the Google codebase, all discovered by sanitizers.

They then asked a Gemini-based AI model -- similar to Google's publicly available Gemini Pro -- to generate and test patches and identify the best ones for human review. In a technical report, researchers Jan Nowakowski and Jan Keller said the experiment's prompts followed this general structure:

You are a Senior Software Engineer tasked with fixing sanitizer errors. Please fix them.

... code

// Please fix the <error_type> error originating here.

... LOC pointed to by the stack trace

... code
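
For illustration, a helper along these lines could fill that template from a sanitizer report. This is a hypothetical sketch, not Google's actual pipeline; build_prompt and all other names here are invented:

// Hypothetical sketch, not Google's pipeline: fill the prompt template
// above from a sanitizer report. All names are invented.
#include <iostream>
#include <sstream>
#include <string>

std::string build_prompt(const std::string& error_type,
                         const std::string& code_before,
                         const std::string& flagged_line,
                         const std::string& code_after) {
    std::ostringstream prompt;
    prompt << "You are a Senior Software Engineer tasked with fixing "
              "sanitizer errors. Please fix them.\n\n"
           << code_before << "\n"
           << "// Please fix the " << error_type
           << " error originating here.\n"
           << flagged_line << "\n"  // LOC pointed to by the stack trace
           << code_after << "\n";
    return prompt.str();
}

int main() {
    std::cout << build_prompt("heap-use-after-free",
                              "void handle(Request* req) {",
                              "  log(req->id);",
                              "}");
}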

Engineers reviewed the AI-generated patches -- an effort Bursztein described as significant and time-consuming -- ultimately approving 15% and adding them to Google's codebase.

"Instead of a software engineer spending an average of two hours to create each of these commits, the necessary patches are now automatically created in seconds," Nowakowski and Keller wrote.

And, given the thousands of bugs discovered each year, they noted, automatically finding fixes for even a small percentage could add up to months of engineering time and effort saved. The experiment's own numbers illustrate the math: roughly 150 approved patches, at two hours apiece, amount to about 300 engineering hours.


AI-driven patching wins

In his RSAC presentation, Bursztein said the results of the AI patching experiment suggest Google researchers are on the right track. "The model shows an understanding of code and coding principles that is quite impressive," he said.

In one instance, for example, the LLM correctly identified and fixed a race condition by adding a mutex.

"Understanding the concept that you have a race condition is not trivial," Bursztein said, adding that the model was also able to fix some data leaks by removing pointer use. "So, in a way, it is almost doing the writing."

Benefits of AI-driven patch management

Possible benefits of AI-driven patch management include the following:

  • Faster vulnerability discovery. In a separate experiment, researchers found Google's AI model can improve the performance of fuzzers -- programs that automatically test software for vulnerabilities by injecting unexpected data.

    "When the fuzzer is stuck, you can ask the model to help it out and find more bugs," Bursztein said. "There is a natural symbiosis between fuzzing and LLMs." (A minimal sketch of a fuzz target appears after this list.)
  • Reduced manual burden. In the near term, GenAI technology seems poised to lessen the manual burden of patch management by assisting human operators in finding and fixing bugs.
  • Elimination of vulnerabilities in production. If the capabilities of Google's LLM continue to grow as the company hopes, it could eventually eliminate vulnerability windows entirely by detecting coding flaws and offering fixes at commit time -- a scenario Bursztein referred to as "the holy grail."

    "Ideally, we could stop introducing bugs into our codebase," he said. "That would make the world a much safer place if we could get there as a community."

AI-driven patching challenges

Although the results of the AI patching experiment were promising, Bursztein cautioned that the technology is far from where Google hopes to one day see it -- reliably and autonomously fixing 90%-95% of bugs. "We have a very long way to go," he said.

The experiment underscored the following significant challenges:

  • Complexity. The AI proved better at fixing some types of bugs than others -- typically, researchers found, those spanning fewer lines of code.
  • Validation. The validation process for AI-suggested fixes -- in which human operators make sure patches address the vulnerabilities in question without breaking anything in production -- remains complex and requires manual intervention.
  • Data set creation and model training. In one instance of problematic behavior, according to Bursztein, the AI commented out the offending code to get rid of a bug -- but, in doing so, got rid of the functionality as well. "Problem solved!" Bursztein said. "Besides being funny, this shows you how hard it's going to be."

Training the AI out of this behavior requires data sets with thousands of benchmarks, he added, each assessing both whether a vulnerability is fixed and whether program features remain intact. Creating these, Bursztein predicted, will be a challenge for the cybersecurity community at large.
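
One such benchmark might pair two checks that a candidate patch has to pass together. The following is a hypothetical sketch with invented names, not Google's harness:

// Hypothetical sketch with invented names, not Google's harness: a
// benchmark a candidate patch must pass on both criteria at once.
#include <cassert>
#include <string>

// Hypothetical patched function: extracts the value of a "Host: " header.
// The original bug was an out-of-bounds read on inputs shorter than the
// prefix; the patch added the length guard below.
std::string host_from_header(const std::string& raw) {
    const std::string prefix = "Host: ";
    if (raw.size() < prefix.size() ||
        raw.compare(0, prefix.size(), prefix) != 0)
        return "";
    return raw.substr(prefix.size());
}

int main() {
    // 1. Vulnerability check: the crashing input from the sanitizer
    //    report must no longer trigger the bug.
    assert(host_from_header("H") == "");
    // 2. Feature check: normal behavior must be preserved. A "patch"
    //    that comments out the parsing logic passes check 1 but fails
    //    here -- exactly the failure mode Bursztein described.
    assert(host_from_header("Host: example.com") == "example.com");
    return 0;
}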

These difficulties notwithstanding, he remains optimistic that AI might one day autonomously drive bug discovery and patch management, shrinking vulnerability windows until they all but disappear.

"How we get there is going to be interesting," Bursztein said. "But the upsides are massive, so I hope we do get there."

Alissa Irei is senior site editor of TechTarget Security.
