Binary diffing is a reverse-engineering technique that involves comparing two versions of the same software to reveal recent code changes -- not unlike the spot-the-difference puzzles in Reader's Digest.
In ethical hacking, the goal of binary diffing is to flag new security patches as a means of locating and identifying corresponding vulnerabilities. Penetration testers and red teamers can then use this information to launch N-day exploits in unpatched systems, for example.
Although simple in theory, binary diffing is complex in practice. The following excerpt from Chapter 18, "Next-Generation Patch Exploitation," of Gray Hat Hacking: The Ethical Hacker's Handbook, Sixth Edition by authors Allen Harper, Ryan Linn, Stephen Sims, Michael Baucom, Daniel Fernandez, Huáscar Tejeda and Moses Frost, published by McGraw Hill, explains how to get started and introduces four binary diffing tools. Download a PDF of the entire chapter here.
And check out this Q&A, in which lead author Harper discusses why it's so important that gray hat hackers ethically disclose the vulnerabilities they discover and the devastating impact of unethical disclosures.
In response to the lucrative growth of vulnerability research, the interest level in the binary diffing of patched vulnerabilities continues to rise. Privately disclosed and internally discovered vulnerabilities typically offer limited technical details publicly. The more details released, the easier it is for others to locate the vulnerability. Without these details, patch diffing allows a researcher to quickly identify the code changes related to the mitigation of a vulnerability, which can sometimes lead to successful weaponization. The failure to patch quickly in many organizations presents a lucrative opportunity for offensive security practitioners.
Introduction to Binary Diffing
When changes are made to compiled code such as libraries, applications, and drivers, the delta between the patched and unpatched versions can offer an opportunity to discover vulnerabilities. At its most basic level, binary diffing is the process of identifying the differences between two versions of the same file, such as version 1.2 and 1.3. Arguably, the most common targets of binary diffing are Microsoft patches; however, the technique can be applied to many different types of compiled code. Various tools are available to simplify the process, allowing an examiner to quickly identify code changes between versions of a disassembled file.
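To make the idea of a "delta" concrete, here is a minimal sketch that compares two equal-length byte buffers and reports the ranges where they differ. The function name count_changed_ranges and the sample bytes used to exercise it are hypothetical, and this naive byte-level comparison is not how the tools discussed later work; they compare disassembled functions and control flow rather than raw bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the contiguous byte ranges that differ between
 * two equal-length buffers and prints each one. This is an illustration of a
 * raw delta only, not the algorithm used by any real diffing tool. */
size_t count_changed_ranges(const unsigned char *old_buf,
                            const unsigned char *new_buf,
                            size_t len)
{
    size_t ranges = 0;
    size_t i = 0;

    while (i < len) {
        if (old_buf[i] == new_buf[i]) {
            i++;
            continue;
        }
        /* Start of a differing run: extend it until the bytes match again. */
        size_t start = i;
        while (i < len && old_buf[i] != new_buf[i]) {
            i++;
        }
        printf("changed: offsets 0x%zx-0x%zx\n", start, i - 1);
        ranges++;
    }
    return ranges;
}
```

A comparison like this breaks down quickly in practice: inserting a single instruction shifts every byte that follows it, so nearly the whole file appears changed. That is one reason dedicated tools match code at the function level instead.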
New versions of applications are commonly released in an ongoing manner. The reasons behind a release can include the introduction of new features, code changes to support new platforms or kernel versions, the adoption of new compile-time security controls such as canaries or Control Flow Guard (CFG), and the fixing of vulnerabilities. Often, a new version combines several of these reasons. The more changes to the application code, the more difficult it can be to identify those related to a patched vulnerability. Much of the success in identifying code changes related to vulnerability fixes depends on limited disclosures. Many organizations choose to release minimal information as to the nature of a security patch. The more clues we can obtain from this information, the more likely we are to discover the vulnerability. If a disclosure announcement states that there is a vulnerability in the handling and processing of JPEG files, and we identify a changed function named RenderJpegHeaderType, we can infer it is related to the patch. These types of clues will be shown in real-world scenarios later in the chapter.
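The clue-matching step described above can be sketched as a simple keyword filter over the names of changed functions. This is only an illustration under stated assumptions: the helper names name_matches_keyword and count_candidates are hypothetical, the list of changed names would come from a diffing tool, and stripped binaries often have no meaningful function names to match against.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if `keyword` occurs anywhere in `name`, ignoring case;
 * otherwise returns 0. */
int name_matches_keyword(const char *name, const char *keyword)
{
    if (keyword[0] == '\0') {
        return 0; /* an empty keyword is treated as "no clue" */
    }
    for (size_t i = 0; name[i] != '\0'; i++) {
        size_t j = 0;
        while (keyword[j] != '\0' && name[i + j] != '\0' &&
               tolower((unsigned char)name[i + j]) ==
                   tolower((unsigned char)keyword[j])) {
            j++;
        }
        if (keyword[j] == '\0') {
            return 1; /* the whole keyword matched starting at offset i */
        }
    }
    return 0;
}

/* Counts how many changed function names mention the advisory keyword. */
size_t count_candidates(const char *const names[], size_t count,
                        const char *keyword)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if (name_matches_keyword(names[i], keyword)) {
            hits++;
        }
    }
    return hits;
}
```

Given an advisory that mentions JPEG handling, calling count_candidates with the keyword "JPEG" against a list containing RenderJpegHeaderType would flag that function as a candidate for closer review.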
A simple example of a C code snippet that includes a vulnerability is shown here:
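/* Unpatched code that includes the unsafe gets() function. */
char name[20];  /* buffer size is illustrative */
printf("\nPlease state your name: ");
gets(name);  /* no bounds checking on the input */
printf("\nYour name is %s.\n\n", name);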
And here's the patched code:
/* Patched code that includes the safer fgets() function. */
char name[20];  /* buffer size is illustrative */
printf("\nPlease state your name: ");
fgets(name, sizeof(name), stdin);
printf("\nYour name is %s.\n\n", name);
The problem with the first snippet is the use of the gets() function, which offers no bounds checking, resulting in a buffer overflow opportunity. For this reason, gets() was deprecated and then removed from the C standard in C11. In the patched code, the function fgets() is used, which requires a size argument, thus helping to prevent a buffer overflow. The fgets() function is likely not the best choice either, because it cannot properly handle embedded null bytes, such as in binary data; however, it is a better choice than gets() if used properly. We will take a look at this simple example later on through the use of a binary diffing tool.
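Using fgets() properly usually means checking its return value and dealing with the newline it leaves in the buffer. The following is a minimal sketch of that pattern; the helper name read_name is hypothetical and is not part of the original example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buf`, which holds
 * `size` bytes, strips the trailing newline, and returns the resulting
 * length. Returns -1 on end-of-file, read error, or a non-positive size. */
int read_name(FILE *in, char *buf, int size)
{
    if (size <= 0 || fgets(buf, size, in) == NULL) {
        return -1;
    }
    /* fgets() stores the newline if it fit in the buffer; strcspn() returns
     * the index of that newline, or of the terminator if there is none. */
    size_t len = strcspn(buf, "\n");
    buf[len] = '\0';
    return (int)len;
}
```

Because fgets() reads at most size - 1 characters, an overlong name is truncated rather than written past the end of the buffer. The unread remainder stays in the stream and is returned by the next read, so a caller that cares about it has to drain or reject it.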
Security patches, such as those from Microsoft and Oracle, are some of the most lucrative targets for binary diffing. Microsoft has historically had a well-planned patch management process that follows a monthly schedule, where patches are released on the second Tuesday of each month. The files patched are most often dynamic link libraries (DLLs) and driver files, though plenty of other file types also receive updates, such as .exe files. Many organizations do not patch their systems quickly, leaving open an opportunity for attackers and penetration testers to compromise these systems with publicly disclosed or privately developed exploits through the aid of patch diffing. Starting with Windows 10, Microsoft is much more aggressive with patching requirements, making the deferral of updates challenging. Depending on the complexity of the patched vulnerability, and the difficulty in locating the relevant code, a working exploit can sometimes be developed quickly in the days or weeks following the release of the patch. Exploits developed after reverse-engineering security patches are commonly referred to as 1-day or n-day exploits. These differ from 0-day exploits, for which no patch is available at the time the exploit is discovered in the wild.
As we move through this chapter, you will quickly see the benefits of diffing code changes to drivers, libraries, and applications. Though not a new discipline, binary diffing has only continued to gain the attention of security researchers, hackers, and vendors as a viable technique to discover vulnerabilities and profit. The price tag on a 1-day exploit is not typically as high as a 0-day exploit; however, it is not uncommon to see attractive payouts for highly sought-after exploits. As most vulnerabilities are privately disclosed with no publicly available exploit, exploitation framework vendors desire to have more exploits tied to these privately disclosed vulnerabilities than their competitors.
Binary Diffing Tools
Manually analyzing the compiled code of large binaries through the use of a disassembler such as the Interactive Disassembler (IDA) Pro or Ghidra can be a daunting task for even the most skilled researcher. Through the use of freely available and commercially available binary diffing tools, the process of zeroing in on code of interest related to a patched vulnerability can be simplified. Such tools can save hundreds of hours spent reversing code that may have no relation to a sought-after vulnerability. Here are some of the most widely known binary diffing tools:
- Zynamics BinDiff (free) Acquired by Google in early 2011, Zynamics BinDiff is available at zynamics.com/bindiff.html. It requires a licensed version of IDA (or Ghidra).
- turbodiff (free) Developed by Nicolas Economou of Core Security, turbodiff is available at https://www.coresecurity.com/core-labs/open-source-tools/turbodiffcs. It can be used with the free version of IDA 4.9 or 5.0. If the links are not working, try here: https://github.com/nihilus/turbodiff.
- DarunGrim/binkit (free) Developed by Jeong Wook Oh (Matt Oh), DarunGrim is available at https://github.com/ohjeongwook/binkit. It requires a recent licensed version of IDA.
- Diaphora (free) Developed by Joxean Koret, Diaphora is available at https://github.com/joxeankoret/diaphora. Only the most recent versions of IDA are officially supported.
Each of these tools works as a plug-in to IDA (or Ghidra if noted), using various techniques and heuristics to determine the code changes between two versions of the same file. You may experience different results when using each tool against the same input files. Each of the tools requires the ability to access IDA Database (.idb) files, hence the requirement for a licensed version of IDA, or the free version with turbodiff. For the examples in this chapter, we will use BinDiff as well as turbodiff. We include turbodiff because it works with the free version of IDA 5.0, which can still be found online at various sites, such as at https://www.scummvm.org/news/20180331/. This allows those without a commercial version of IDA to complete the exercises. The only tools from the list that are actively maintained are Diaphora and BinDiff. The authors of each of these tools should be highly praised for providing such great tools, which save us countless hours trying to find code changes.