Date: 2024-02-23

GitHub Copilot may be creating unintended security issues for customers, according to new research from Snyk.

In a blog post Thursday, Snyk explained that generative AI-powered coding assistants like GitHub Copilot, which use large language models to suggest code completions to development teams, have a limited understanding of software and merely imitate patterns learned from their training data.

"It's important to recognize that generative AI coding assistants, like Copilot, don't understand code semantics and, as a result, cannot judge it," Randall Degges, Snyk's head of developer relations & community, wrote in the blog post. "Essentially, the tool mimics code that it previously saw during its training."

As a result, these tools, which have become increasingly popular among developers, can reproduce security issues from customers' existing codebases and open source projects. While coding assistants can be enormously helpful to developers in terms of saving time and increasing productivity, Degges said they also carry significant risk.

"Put simply, when Copilot suggests code, it may inadvertently replicate existing security vulnerabilities and bad practices present in the neighbor files," he wrote. "This can lead to insecure coding practices and open the door to a range of security vulnerabilities."

The blog post described examples in which Snyk researchers relied on GitHub Copilot's neighboring tabs feature, which lets the assistant draw on other files open in the integrated development environment (IDE) as context. They asked GitHub Copilot to create SQL queries that match the user input, and the responses included good code.

The researchers then introduced a vulnerable snippet of code into a neighboring tab, creating a new SQL query in the project. When they ran the same request again, GitHub Copilot's response replicated the vulnerable code. "We've just gone from one SQL injection in our project to two, because Copilot has used our vulnerable code as context to learn from," Degges wrote.
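The difference between the two kinds of suggestion can be illustrated with a minimal Python sketch. This is not the code from Snyk's blog post; the `users` table, the function names and the use of the standard library `sqlite3` module are all assumptions made for illustration.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is concatenated into the SQL text,
    # so the input can change the structure of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized pattern: the value is passed separately from the SQL
    # text, so it is treated as data rather than as part of the query.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row is returned
    print(len(find_user_safe(conn, payload)))    # no row matches the literal string
```

If a function like `find_user_unsafe` is open in a neighboring tab, an assistant that imitates nearby patterns may plausibly suggest the same string concatenation for the next query.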

GitHub Copilot is billed as a tool that can "improve code quality and security," according to the company's website. GitHub launched a new vulnerability filtering system for the AI coding assistant last week, which is designed to make the tool's code suggestions more secure.

Degges noted in the blog post that the more secure a customer's existing codebase is, the less likely GitHub Copilot is to produce code suggestions with vulnerabilities. However, the tool can amplify existing security debt within a customer's codebase, making it even less secure.

Snyk urged development teams to conduct manual reviews of the code generated by tools like GitHub Copilot, and to have static application security testing (SAST) guardrails and policies in place to identify and fix any issues.
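As a rough illustration of what an automated static check can look like, the sketch below uses Python's standard `ast` module to flag `.execute(...)` calls whose query argument is built with string concatenation, `%` formatting or an f-string. It is a toy example under stated assumptions, not how Snyk's or GitHub's tooling works; the function name `flag_dynamic_sql` is hypothetical.

```python
import ast


def flag_dynamic_sql(source):
    """Return sorted line numbers of .execute() calls with a dynamically built query."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            # BinOp covers "..." + x and "..." % x; JoinedStr is an f-string.
            and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


if __name__ == "__main__":
    sample = (
        "cur.execute('SELECT 1 WHERE n = ' + name)\n"
        "cur.execute('SELECT 1 WHERE n = ?', (name,))\n"
        "cur.execute(f'SELECT 1 WHERE n = {name}')\n"
    )
    print(flag_dynamic_sql(sample))
```

A check this simple misses queries assembled in a variable before the call and can flag harmless constants, which is why it complements manual review rather than replacing it.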

In an interview with TechTarget Editorial, Degges emphasized that GitHub Copilot is a valuable tool and one that he himself uses. However, he said context is extremely important for secure software development, and he urged organizations to apply reviews and safeguards for AI-generated code because the tools lack that context. "AI coding assistants are amazing, but they have the exact same problems normal human developers have," he said.

Degges also said that, in his experience, most developers probably aren't aware that AI coding assistants can easily replicate existing security issues from users' codebases and open source projects.

"The truth is that large language models today and the AI explosion of the last year and a half are built around generative AI, and in those scenarios, the responses are based on a statistical model," he said. "In this case, there's no underlying knowledge of the actual code. It's all based on probability."

TechTarget Editorial contacted GitHub for comment, but the company had not responded at press time.