How to prevent SQL injection with prepared statements
One of the top defenses against SQL injection is prepared statements. In this book excerpt, learn what prepared statements are and how to extend their defense.
Structured Query Language, or SQL, is a programming language used with databases. SQL injection attacks -- when malicious SQL statements are inserted into an input query to gain access to a database -- have long been challenging for security teams. Though fairly easy to prevent, SQL injections regularly make the OWASP Top 10 Web Application Security Risks list.
One way to prevent SQL injection is with prepared statements, also known as parameterized queries.
"The typical prepared statement example is replacing string concatenation with something like, 'Select * from employee where name = ?'," said Roman Zabicki, author of Practical Security: Simple Practices for Defending Your Systems. Because prepared statements are meant to be used this way, he added, they are bulletproof. It's when you get into other parts of the query that you run into problems with SQL injection.
"Not every part of a SQL statement is parameterizable," Zabicki said. "If you want to order by, say, employee start date, order is typically not parameterizable."
In the following excerpt from Practical Security, Zabicki further explains what prepared statements are, how to use prepared statements with fields that cannot be parameterized and why a layered defense is the best prevention against SQL injection attacks.
Check out this Q&A to learn why Zabicki considers patching the most important task for anyone who wears a security hat within an organization.
Download a PDF of Chapter 2 to learn about other injection attacks and how to defend against them.
Preventing SQL Injection with Prepared Statements
Prepared statements make up the core of our defense against SQL injection. These are sometimes referred to as parameterized queries. For our purposes, we'll use the terms interchangeably. Prepared statements enforce the separation between templated SQL and user-supplied input. Instead of building up a SQL statement by concatenating strings and user-supplied input, prepared statements are constructed by using a parameterized SQL statement that has placeholder tokens (in most SQL dialects, this placeholder is a ?) and a list of the values that should be used for those parameters. The important difference between prepared statements and our vulnerable example above is that prepared statements never concatenate the values and the SQL. The separation is always maintained. Let's see an example in Java. As before, the concept is the same regardless of which language it's written in.
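Here is a minimal sketch of such a function, assuming a JDBC Connection and the JournalEntry table with its Body column that appears later in this excerpt. The class name JournalSearch and the exact query are illustrative assumptions, not the book's listing:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JournalSearch {
    // The SQL template is a constant. The ? placeholder marks where a value
    // goes; user input is never spliced into this string.
    static final String SEARCH_SQL =
            "SELECT * FROM JournalEntry WHERE Body LIKE ?";

    // Returns the Body of every journal entry containing the search term.
    static List<String> journalEntrySearch(Connection conn, String wildcard)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SEARCH_SQL)) {
            // The value travels separately from the SQL. A quote in it cannot
            // end a string literal; % or _ still act as LIKE wildcards, but
            // they can only change what matches, not the statement's structure.
            stmt.setString(1, "%" + wildcard + "%");
            List<String> bodies = new ArrayList<>();
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    bodies.add(rs.getString("Body"));
                }
            }
            return bodies;
        }
    }
}
```

The "%" + wildcard + "%" concatenation builds the parameter value, not the SQL text, so it doesn't reopen the hole.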
With a function like this, even if the attacker tries to use a single quote (') to break out of the string literal, they can't, because the attacker-controlled wildcard parameter isn't concatenated into the SQL expression. Instead, it's passed to setString, and the database keeps it separate from the SQL.
When reading code and looking for SQL injection, keep in mind that concatenation can look different in different languages. The examples above used +, but string interpolation, such as Python's f-strings or %-formatting, can also open the door to SQL injection when it's used with user-supplied data.
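In Java, String.format plays roughly the role that f-strings play in Python. A minimal sketch with a hypothetical unsafeSearchSql helper shows that it is exactly as dangerous as +:

```java
class InterpolationExample {
    // VULNERABLE: String.format is interpolation, but the effect is the same
    // as concatenation; the user's text becomes part of the SQL itself.
    // (%% is a literal percent sign; %s is the user-supplied value.)
    static String unsafeSearchSql(String wildcard) {
        return String.format(
                "SELECT * FROM JournalEntry WHERE Body LIKE '%%%s%%'", wildcard);
    }
}
```

With the input ' OR 1=1 -- (including the trailing space), the method returns SELECT * FROM JournalEntry WHERE Body LIKE '%' OR 1=1 -- %'. The attacker's quote closes the string literal, OR 1=1 matches every row, and -- comments out the leftover %'.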
Correct use of prepared statements should be the preferred way to prevent SQL injection. It's possible to misuse a prepared statement and undo the protection it can bring, however. Suppose we defined journalEntrySearch as follows:
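A minimal sketch of that mistake, with hypothetical names (MisusedJournalSearch, buildSql). The code still calls prepareStatement, but the damage is done before that call:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class MisusedJournalSearch {
    // WRONG: the attacker-controlled wildcard is concatenated into the SQL
    // text before it is handed to prepareStatement, so there is no
    // placeholder left to keep the value separate.
    static String buildSql(String wildcard) {
        return "SELECT * FROM JournalEntry WHERE Body LIKE '%" + wildcard + "%'";
    }

    static List<String> journalEntrySearch(Connection conn, String wildcard)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(buildSql(wildcard));
             ResultSet rs = stmt.executeQuery()) {
            List<String> bodies = new ArrayList<>();
            while (rs.next()) {
                bodies.add(rs.getString("Body"));
            }
            return bodies;
        }
    }
}
```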
We can see that even though we're creating a prepared statement, we're using an attacker-controlled piece of data, wildcard, to construct the SQL for the prepared statement. This undoes the protection we hoped to gain. Hopefully a mistake like this would be caught before making it into production. Static analysis tools can be used to catch this kind of mistake during development.
Extending the Defense Beyond Prepared Statements
Prepared statements are great because they're nearly bulletproof. The downside is that not every part of a SQL statement can be parameterized. Table names, for instance, cannot be parameterized. There's no way to write a prepared statement like SELECT * FROM ? WHERE Body LIKE ?, where a placeholder stands in for the table name.
In our journal-keeping example, parameterizing the table name might sound a little silly. There are cases, however, where this level of flexibility would be useful. Suppose our journaling website takes off and we add support for blog posts, mass emails, and on-demand printing of birthday cards. We may find ourselves duplicating the search logic across tables for journal entries, blog posts, mass emails, and birthday cards. (Yes, there are ways to get rid of the duplication, but this is a security book, not a database book, so please indulge me.) If you find yourself in a situation where you can't protect yourself with prepared statements and concatenation is the only way to build the query you want, you'll need to check that the data you're concatenating is safe. One way to achieve this is to introduce a level of indirection so that the attacker picks an ID that corresponds to one option in a list of options but the attacker doesn't get to provide the table name itself.
Let's see this approach put to use in a slightly contrived example.
Our database has grown, and now we have BlogPost, MassEmail, and BirthdayCard tables in addition to the original JournalEntry table. All of them have a Body column that we want to search on. We want the user to be able to pick which table to search against using a drop-down list generated with a select tag in the HTML of our web page. In the most direct version, each option's value is the table name itself, such as JournalEntry.
If you need a refresher on HTML, an option's value is the literal text that the browser will send to the server if that option is selected. In this case, it's surrounded by double quotes. The part between the > and the </option> is what's displayed in the browser.
One way to make sure that the user-supplied data is legitimate is to maintain a mapping of IDs to table names on the server. This mapping would be used to generate a slightly different drop-down than the one described above. Instead of having the browser send the server the table name to put into the SQL statement, the browser will send the ID of that table name, so each option's value is a number such as 1 rather than a table name.
The server, in turn, keeps a mapping from each ID to the corresponding table name: JournalEntry, BlogPost, MassEmail, or BirthdayCard.
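One way this mapping, and the drop-down generated from it, might look in Java. The numbering of the IDs and the display labels are assumptions; only the table names and the table request parameter come from the text:

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SearchTables {
    // Hypothetical ID -> table name mapping, kept in memory on the server.
    // The numbering is an assumption; the table names come from the text.
    static final SortedMap<Integer, String> TABLE_BY_ID =
            Collections.unmodifiableSortedMap(new TreeMap<>(Map.of(
                    1, "JournalEntry",
                    2, "BlogPost",
                    3, "MassEmail",
                    4, "BirthdayCard")));

    // Hypothetical labels shown to the user. They are server-side constants,
    // so they need no HTML escaping here.
    static final Map<Integer, String> LABEL_BY_ID = Map.of(
            1, "Journal entries",
            2, "Blog posts",
            3, "Mass emails",
            4, "Birthday cards");

    // Renders the drop-down in ascending ID order. Only IDs reach the
    // browser; the table names never leave the server.
    static String renderSelect() {
        StringBuilder html = new StringBuilder("<select name=\"table\">\n");
        for (Integer id : TABLE_BY_ID.keySet()) {
            html.append("  <option value=\"").append(id).append("\">")
                .append(LABEL_BY_ID.get(id)).append("</option>\n");
        }
        return html.append("</select>").toString();
    }
}
```

Calling renderSelect() yields options whose values are 1 through 4, such as <option value="1">Journal entries</option>, so the page never mentions a table name.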
This mapping could be maintained in a dedicated table, or it could be generated dynamically at start-up time and cached in memory. However the mapping is maintained, the server expects input that can be parsed as an integer, not a table name. So when the server parses the input, it will be readily apparent if it's not valid (either not a number or not a number that maps to anything). Another benefit of this approach is that table names aren't exposed to the attacker. Obscurity does not provide security, but there's no need to shout our table structures from the rooftops, either. One final benefit of this approach is that any attempt by the attacker to send other values will stand out. If the server gets any value for table that's not one of the integers from 1 to 4, the server can log that and alert support staff. There's no reason that legitimate users going through the GUI would ever send any value other than 1, 2, 3, or 4. So if the server gets any other value, there is chicanery afoot. We'll see this pattern repeated throughout the book. The first priority is to prevent an attack; the second priority is to make it "noisy" for an attacker to probe our system.
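A minimal sketch of the server-side check, using the same assumed numbering. The class and method names are hypothetical, and java.util.logging stands in for whatever alerting pipeline a real system would use:

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

class TableResolver {
    private static final Logger LOG =
            Logger.getLogger(TableResolver.class.getName());

    // Hypothetical mapping, using the same assumed numbering as the drop-down.
    private static final Map<Integer, String> TABLE_BY_ID = Map.of(
            1, "JournalEntry",
            2, "BlogPost",
            3, "MassEmail",
            4, "BirthdayCard");

    // Turns the raw "table" request parameter into a trusted table name.
    // Legitimate users only ever send 1-4, so anything else is treated as a probe.
    static Optional<String> resolveTable(String rawId) {
        if (rawId == null) {
            return probe("missing table ID");
        }
        int id;
        try {
            id = Integer.parseInt(rawId);
        } catch (NumberFormatException e) {
            return probe("non-numeric table ID");
        }
        String table = TABLE_BY_ID.get(id);
        if (table == null) {
            return probe("unknown table ID " + id);
        }
        return Optional.of(table);
    }

    // Builds the search SQL. The table name can only come from TABLE_BY_ID,
    // never from the request; the search term still uses a ? placeholder.
    static Optional<String> searchSqlFor(String rawId) {
        return resolveTable(rawId)
                .map(table -> "SELECT * FROM " + table + " WHERE Body LIKE ?");
    }

    // Hypothetical alert hook. The raw value is not echoed into the log,
    // which sidesteps log-injection concerns.
    private static Optional<String> probe(String reason) {
        LOG.warning("Possible SQL injection probe: " + reason);
        return Optional.empty();
    }
}
```

The table name is concatenated into the SQL only after it has been looked up in the server's own map, which is what makes that concatenation safe.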
Layering Additional Defenses as a Mitigation Against Future Mistakes
Proper use of prepared statements is our primary defense against SQL injection. Prepared statements are great, but we have to remember to use them every time we write code that touches SQL; we're never "done" with applying this defense. And if we're building complex, dynamic SQL statements with user input in parts of the SQL that aren't parameterizable, we need to exercise a great deal of caution in many places in the codebase. If we're sloppy in just one of those places, we can wind up leaving the door open to future SQL injection. It would be great if we could complete a one-time task that would protect us throughout future development. Unfortunately, we don't have anything quite that powerful, but proper use of database permissions can get us part of the way there. In theory, we could have a single database user for each table that we want to work with. In practice, this is unlikely to be effective except in very small applications. There are likely to be a large number of tables in an application. And some interactions involve using multiple tables in a single statement. If the number of tables doesn't get you, the number of combinations of tables will.
While it isn't worthwhile to introduce a dedicated database account for every table, it can be worthwhile to introduce them for particularly sensitive tables, such as audit tables or tables that contain passwords. It would be unfortunate if SQL injection in some random part of your application allowed an attacker to change admin passwords or cover their tracks by deleting audit records.
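As a sketch of what that one-time setup might look like, the following Java method applies a short list of grants through an administrative connection. The role names (app_user, audit_writer), the AuditLog table, and the PostgreSQL-style syntax are all assumptions; other databases spell these statements differently:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class PermissionSetup {
    // Hypothetical grants in PostgreSQL-style syntax. Assumes an
    // administrative role owns the tables, so app_user has only the
    // privileges granted here.
    static final List<String> GRANTS = List.of(
            // Everyday application account: ordinary tables only.
            "GRANT SELECT, INSERT, UPDATE, DELETE ON JournalEntry, BlogPost, MassEmail, BirthdayCard TO app_user",
            // The general account gets no access to the audit table...
            "REVOKE ALL ON AuditLog FROM app_user",
            // ...and the dedicated audit account can only append records,
            // never change or delete them.
            "GRANT INSERT ON AuditLog TO audit_writer");

    // Run once, with an administrative connection, at deployment time.
    // The statements are constants, so there is no user input to inject.
    static void apply(Connection adminConn) throws SQLException {
        try (Statement stmt = adminConn.createStatement()) {
            for (String sql : GRANTS) {
                stmt.execute(sql);
            }
        }
    }
}
```

With this in place, an injection flaw reached through app_user can't delete audit records, because that account was never given access to the AuditLog table.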
Putting It All Together for a Robust Defense
Adding database permissions to widespread use of prepared statements leaves us with a layered defense that can serve as a model for how we want to defend other parts of our system. We start by defending as much as we can with a nearly bulletproof defense like prepared statements. We then expand the scope of our defense with ongoing diligent development. Finally, we minimize the impact of development mistakes with the one-time application of a broadly effective defense like database permissions. We also set up our system so that attacks will be noisy.
Noisiness here means that attempts to carry out these attacks can be made to stand out. When we build alerting into our system, we can't allow many false positives because that won't scale, will burn out employees, and will lower urgency around responding to alerts. The alerts we've discussed should never happen under well-meaning use of the system, so if we detect these attacks, we have a high-quality indication that an attack is underway. With built-in alerting, the system can notify support staff and possibly take defensive action, such as locking accounts.
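A minimal sketch of the defensive-action side, with a hypothetical ProbeMonitor class. The lock threshold is an assumption you would tune for your own system:

```java
import java.util.HashMap;
import java.util.Map;

class ProbeMonitor {
    // Hypothetical policy: how many high-confidence probe signals (such as
    // an out-of-range table ID) an account may trigger before it is locked.
    private final int lockThreshold;
    private final Map<String, Integer> probesByAccount = new HashMap<>();

    ProbeMonitor(int lockThreshold) {
        this.lockThreshold = lockThreshold;
    }

    // Records one suspicious event, notifies support, and returns true if
    // the account should now be locked. Because legitimate use never
    // triggers these signals, a low threshold adds few false positives.
    synchronized boolean recordProbe(String accountId) {
        int count = probesByAccount.merge(accountId, 1, Integer::sum);
        notifySupport(accountId, count);
        return count >= lockThreshold;
    }

    // Placeholder for whatever alerting channel the system actually uses.
    void notifySupport(String accountId, int count) {
        System.err.println("ALERT: probe #" + count + " from account " + accountId);
    }
}
```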
This defense requires a lot of ongoing diligence during development. The problem is that diligence is scarce. So if we can't easily increase the amount of diligence we'll be able to bring to bear, let's try to minimize the number of places where we need it. Shared framework code is a good way to do that. Make it easy for application developers to do the right thing, and make it clear which parts of the code should access the database and which shouldn't. Don't overlook the importance of examples. Future developers who haven't joined your team yet will draw heavily on the code they've inherited when they write code. Make it easy for them to find good examples in your codebase.
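A sketch of what such shared framework code might look like: a hypothetical Db.query helper that always binds values through placeholders, so application code never calls setString or setObject directly:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared data-access helper. Callers pass a SQL template with
// ? placeholders plus the values; the helper always binds those values as
// parameters, so the safe path is also the easy path.
final class Db {
    @FunctionalInterface
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    private Db() {}

    static <T> List<T> query(Connection conn, String sqlTemplate,
                             RowMapper<T> mapper, Object... params)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(sqlTemplate)) {
            for (int i = 0; i < params.length; i++) {
                stmt.setObject(i + 1, params[i]); // JDBC parameters are 1-based
            }
            List<T> rows = new ArrayList<>();
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    rows.add(mapper.map(rs));
                }
            }
            return rows;
        }
    }
}
```

Application code would call it as Db.query(conn, "SELECT Body FROM JournalEntry WHERE Body LIKE ?", rs -> rs.getString("Body"), "%" + term + "%"). The helper can't stop someone from concatenating user input into the template itself, so code review and static analysis still matter; what it does is make the safe path the shortest one, and it gives future developers a good example to copy.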
We started the chapter with an explanation of software vulnerabilities by way of a knock-knock joke. Now that we've taken a good look at SQL injection, let's reward ourselves with a software vulnerability joke that's actually funny. Check out Bobby Tables by Randall Munroe.