Using ChatGPT as a SAST tool to find coding errors

ChatGPT is lauded for its ability to generate code for developers, raising questions about the security of that code and the tool's ability to test code security.

Generative AI spans multiple use cases, including code generation from simple prompts. Previous analysis has helped explain how GenAI creates and analyzes source code for software projects. Some products, such as Microsoft Copilot, sit "with" the developer and offer code suggestions based on what the developer is writing.

This phenomenon has led to many questions about the security of co-written programs. GenAI writing potentially vulnerable code is a major security issue, but vendors have long touted AI's ability to self-learn. Could we mitigate the issue of vulnerable software if we teach the program how to secure its own code?

An approach to this learning is to use static application security testing as a model. SAST tools analyze an application's source code, binary or bytecode to identify vulnerabilities. They look for common coding errors that could result in SQL injection, command injection and server-side injection attacks, among others.

Editor's note: Modern SAST tools offer binary and bytecode analysis, in addition to source code analysis. However, we do not investigate binary or bytecode analysis here.

Using ChatGPT to analyze source code

First, we have to determine whether AI can find vulnerabilities in code at all. Using known-bad code from StackHawk, let's see if ChatGPT spots any issues.

Screenshot of code with a SQL injection vulnerability

This is one of the simplest forms of SQL injection. Inserting the user-supplied value directly into the query string is a weakness that is easily exploited. Most security professionals and software developers should pick this weakness out quickly.

To test ChatGPT, provide the code snippet, and ask, "Is this code secure?"

Screenshot of ChatGPT finding a SQL injection vulnerability

ChatGPT correctly identified the weakness and offered new code with better security and error handling.

While this is the correct result, the provided code is simple, and the answer is well known.

Let's try another example from Mitre using known-bad C code.

Screenshot of known-bad code from Mitre

This code is slightly more complex because it has three potential weaknesses: buffer overflow, unchecked return value and null pointer dereference.

Provide the code, and ask ChatGPT the same question: "Is this code secure?"

Screenshot of ChatGPT finding vulnerabilities in Mitre's known-bad code

ChatGPT correctly acknowledged the issues with the code and offered updated source code. But is the code provided by ChatGPT secure? Let's ask ChatGPT.

Screenshot of ChatGPT finding vulnerabilities in code it supplied

ChatGPT responded with additional code to update the previously supplied code. By updating its own code, ChatGPT showed that it doesn't necessarily catch everything on the first pass.

Continually feeding the supplied code back to ChatGPT with the same question leads to more "errors" found. The third go-around, for example, identified that gethostbyaddr() is considered obsolete. Why didn't ChatGPT report that the first time?

More worrying, how many rounds of questioning does the code need before it counts as "secure"? There's no answer to this question because each developer and organization has a different definition of secure code.

So, is ChatGPT a viable SAST tool?

These are some impressive results for a cursory review of using ChatGPT as a SAST tool. ChatGPT identified security vulnerabilities in code and provided modified code that eliminated the vulnerabilities.

However, the Ouroboros problem remains: Does continually feeding results as input into the model produce better results? Results vary. ChatGPT has also been known to make up answers when faced with questions that don't have well-documented answers.

Importantly, security practitioners and developers must validate any errors and results supplied by ChatGPT and check source code through a human lens.

Using ChatGPT for a cursory review of code probably catches about 80% of the errors. The last 20% is up to you and your risk thresholds. It is a good tool for increasing efficiency, but much more testing is needed before ChatGPT can replace current SAST tools.

Matthew Smith is a virtual CISO and management consultant specializing in cybersecurity risk management and AI.
