Using ChatGPT as a SAST tool to find coding errors
ChatGPT is lauded for its ability to generate code for developers, raising questions about the security of that code and the tool's ability to test code security.
Generative AI spans multiple use cases, including code generation from simple prompts. Previous analysis has helped explain how GenAI creates and analyzes source code for software projects. Some products, such as GitHub Copilot, sit "with" the developer and offer code suggestions based on what the developer is writing.
This has raised many questions about the security of cowritten programs. GenAI writing potentially vulnerable code is a major security issue, but AI proponents have long touted the technology's ability to self-learn. Could we mitigate the problem of vulnerable software by teaching the model to secure its own code?
An approach to this learning is to use static application security testing as a model. SAST tools analyze an application's source code, binary or bytecode to identify vulnerabilities. They look for common coding errors that could result in SQL injection, command injection and server-side injection attacks, among others.
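To make the idea of "looking for common coding errors" concrete, here is a minimal sketch of a single SAST-style rule written in Python. It is not how any particular commercial tool works. It assumes that database calls are named `execute`, `executemany` or `executescript`, and it only flags query strings built inline with concatenation, `%` formatting, `.format()` or f-strings. The function names and the `SAMPLE` snippet are hypothetical.

```python
import ast

# Assumed sink names: methods that send SQL text to a database driver.
SQL_SINKS = {"execute", "executemany", "executescript"}


def _is_dynamic_string(node: ast.AST) -> bool:
    """Return True if the expression looks like a string built at runtime."""
    if isinstance(node, ast.JoinedStr):  # f"... {value} ..."
        return any(isinstance(v, ast.FormattedValue) for v in node.values)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True  # "..." + value  or  "..." % value (crude: no type check)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        return node.func.attr == "format"  # "...".format(value)
    return False


def find_sql_injection_candidates(source: str) -> list[tuple[int, str]]:
    """Return (line number, sink name) for each suspicious call in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SQL_SINKS
            and node.args
            and _is_dynamic_string(node.args[0])
        ):
            findings.append((node.lineno, node.func.attr))
    return sorted(findings)


# Hypothetical code under test: one concatenated query, one parameterized query.
SAMPLE = '''
def lookup(cur, user_id):
    cur.execute("SELECT * FROM users WHERE id = " + user_id)
    cur.execute("SELECT * FROM users WHERE id = ?", (user_id,))
'''

if __name__ == "__main__":
    for line, sink in find_sql_injection_candidates(SAMPLE):
        print(f"line {line}: dynamic SQL passed to {sink}()")
```

A rule this simple misses a query that is assembled on one line and passed as a variable on the next, which is why real SAST tools add data flow analysis on top of pattern matching.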
Editor's note: Modern SAST tools offer binary and bytecode analysis, in addition to source code analysis. However, we do not investigate binary or bytecode analysis here.
Using ChatGPT to analyze source code
First, we have to determine whether AI can find vulnerabilities in code at all before asking whether it can secure its own. Using known-bad code from StackHawk, let's see if ChatGPT spots any issues.
The snippet shows one of the simplest forms of SQL injection: The value data.id is inserted directly into the query string, which leaves the query open to exploitation. Most security professionals and software developers should pick this weakness out quickly.
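The original snippet is not reproduced here. As a rough illustration of the same weakness class, the Python sketch below contrasts a query built by direct string insertion with a parameterized one. It uses an in-memory SQLite database from the standard library; the table, the sample rows and the function names are assumptions made for the example, not the StackHawk code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with three sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    return conn


def get_user_unsafe(conn: sqlite3.Connection, user_id: str) -> list:
    # VULNERABLE: the caller-supplied value becomes part of the SQL text,
    # so input such as "1 OR 1=1" changes the meaning of the query.
    query = "SELECT id, name FROM users WHERE id = " + user_id
    return conn.execute(query).fetchall()


def get_user_safe(conn: sqlite3.Connection, user_id: str) -> list:
    # Parameterized: the value is bound as data and is never parsed as SQL.
    query = "SELECT id, name FROM users WHERE id = ?"
    return conn.execute(query, (user_id,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "1 OR 1=1"  # classic injection string
    print("unsafe:", get_user_unsafe(conn, payload))
    print("safe:  ", get_user_safe(conn, payload))
```

With the injected input, the concatenated query returns every row in the table, while the parameterized query treats the whole string as a single value and matches nothing.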
To test ChatGPT, provide the code snippet, and ask, "Is the snippet secure?"
ChatGPT correctly identified the weakness and offered new code with better security and error handling.
While this is the correct result, the provided code is simple, and the answer is well known.
Let's try another example from Mitre using known-bad C code.
This code is slightly more complex because it has three potential weaknesses: buffer overflow, unchecked return value and null pointer dereference.
Provide the code, and ask ChatGPT the same question: "Is this code secure?"
ChatGPT correctly acknowledged the issues with the code and offered updated source code. But is the code provided by ChatGPT secure? Let's ask ChatGPT.
ChatGPT responded with additional code to update the previously supplied code. By updating its own code, ChatGPT showed that it doesn't necessarily catch everything on the first pass.
Repeatedly feeding the supplied code back to ChatGPT with the same question surfaces more "errors" each time. The third go-around, for example, flagged gethostbyaddr() as obsolete. Why didn't ChatGPT report that in the first place?
More worrying, how many times must the code be resubmitted before it counts as "secure" code? There's no single answer to this question because each developer and organization has a different definition of secure code.
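One way to reason about the "how many passes" question is to treat the review as a loop with an explicit stopping rule. The Python sketch below is an assumption-laden illustration, not a ChatGPT integration: `review` stands in for whatever call submits code to a model and returns revised code plus a list of findings, and `make_stub_reviewer` is a hypothetical stand-in that reports one canned issue per pass so the loop can run offline.

```python
from typing import Callable

# A reviewer takes code and returns (possibly revised code, list of findings).
Reviewer = Callable[[str], tuple[str, list[str]]]


def review_until_stable(code: str, review: Reviewer, max_passes: int = 5):
    """Resubmit code until a pass reports no findings or the cap is reached.

    Returns (final code, findings per pass, whether a clean pass occurred).
    """
    history: list[list[str]] = []
    for _ in range(max_passes):
        code, findings = review(code)
        if not findings:
            return code, history, True  # a pass came back clean
        history.append(findings)
    return code, history, False  # cap hit; more findings may remain


def make_stub_reviewer(issues: list[str]) -> Reviewer:
    """Hypothetical reviewer that surfaces one canned issue per pass."""
    remaining = list(issues)

    def review(code: str) -> tuple[str, list[str]]:
        if not remaining:
            return code, []
        issue = remaining.pop(0)
        # Annotate the code instead of really fixing it; this is only a stub.
        return code + f"\n/* addressed: {issue} */", [issue]

    return review


if __name__ == "__main__":
    reviewer = make_stub_reviewer(
        ["buffer overflow", "unchecked return value", "obsolete gethostbyaddr()"]
    )
    _, history, stable = review_until_stable("/* code under review */", reviewer)
    print(f"passes with findings: {len(history)}, clean pass reached: {stable}")
```

The cap makes the tradeoff explicit: A clean pass only shows that the reviewer had nothing more to say, not that the code is secure, and a low cap can stop the loop while findings remain.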
So, is ChatGPT a viable SAST tool?
These are some impressive results for a cursory review of using ChatGPT as a SAST tool. ChatGPT identified security vulnerabilities in code and provided modified code that eliminated the vulnerabilities.
However, the Ouroboros problem remains: Does continually feeding results as input into the model produce better results? Results vary. ChatGPT has also been known to make up answers when faced with questions that don't have well-documented answers.
Importantly, security practitioners and developers must validate any errors and results supplied by ChatGPT and check source code through a human lens.
A cursory ChatGPT review of code for errors probably yields results that are about 80% correct. The last 20% is up to you and your risk thresholds. ChatGPT is a good tool for increasing efficiency, but much more testing is needed before it can replace current SAST tools.
Matthew Smith is a virtual CISO and management consultant specializing in cybersecurity risk management and AI.