By Henrik Brodin, Lead Security Engineer, Research
The aCropalypse is upon us!
Last week, news about CVE-2023-21036, nicknamed the “aCropalypse,” spread across Twitter and other media, and I quickly realized that the underlying flaw could be detected by our tool, PolyTracker. I’ll explain how PolyTracker can detect files affected by the vulnerability even without specific file format knowledge, which parts of a file can become subject to recovery using acropalypse.app, and how Google and Microsoft could have caught this bug by using our tools. Coincidentally, my colleagues, Evan Sultanik and Marek Surovič, and I wrote a paper that describes this class of bugs, defines a novel approach for detecting them, and introduces our implementation and tooling. It will appear at this year’s workshop on Language-Theoretic Security (LangSec) at the IEEE Security and Privacy Symposium.
We use PolyTracker to instrument the image parser, libpng. (Any parser will do, not just aCropalyptic ones.) The PolyTracker instrumentation tells us which portions of the input file are completely ignored by the parser, which we call blind spots. Blind spots are almost always indicators of design flaws in the file format, malformation in the input file, and/or a bug in the parser. Normal images should have almost no blind spots, but parsing malformed aCropalyptic images through libpng reveals the cropped data in a large blind spot. The aCropalypse bugs could have been caught if the vulnerable products had been instrumented with PolyTracker and their output tested for blind spots.
# parse the screenshot with an instrumented version of pngtest
$ ./pngtest.instrumented re3eot.png out_re3eot.png
# ask polytracker to identify any blind spots in the file
$ polytracker cavities polytracker.tdag
re3eot.png,697120,1044358
# found a blind spot starting at offset 697120 (about 339 KiB); it is ignored and contains the cropped-out image data that could be retrieved
Understanding the aCropalypse
According to this tweet, it is possible to recover parts of an original image from a cropped or redacted screenshot. The TL;DR is that when the Google Pixel built-in screenshot editing tool, Markup, is used to crop or resize an image, it overwrites the original image, but only up to the offset where the new image ends. Any data from the original image after that offset is left intact in the file. David Buchanan devised an algorithm to recover the original image data still left in the file; you can read more about the specifics on his blog.
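The overwrite-without-truncation behavior described above can be reproduced in a few lines. The following is a minimal sketch, not the Markup tool’s actual code: the helper names and the byte strings are made up for illustration, and Python’s "r+b" file mode stands in for whatever non-truncating write the vulnerable tool performed.

```python
import os
import tempfile


def save_without_truncating(path: str, data: bytes) -> None:
    """Overwrite from offset 0 but keep whatever lies beyond len(data)."""
    # "r+b" opens an existing file for update and does not truncate it.
    with open(path, "r+b") as f:
        f.write(data)


def save_with_truncating(path: str, data: bytes) -> None:
    """Replace the file contents entirely."""
    # "wb" truncates the file to zero length before writing.
    with open(path, "wb") as f:
        f.write(data)


original = b"ORIGINAL-IMAGE-" * 4  # stand-in for the full screenshot (60 bytes)
cropped = b"CROPPED!"              # stand-in for the smaller, cropped image

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "screenshot.bin")
    save_with_truncating(path, original)

    # Buggy save: the tail of the original survives after the new data.
    save_without_truncating(path, cropped)
    with open(path, "rb") as f:
        buggy = f.read()

    # Correct save: nothing of the original is left.
    save_with_truncating(path, cropped)
    with open(path, "rb") as f:
        fixed = f.read()

print(len(buggy), len(fixed))  # the buggy file keeps the original's length
print(buggy[len(cropped):])    # leftover bytes from the original
```

The buggy file is still as long as the original, and everything past the end of the new data is recoverable leftover content.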
More recently, Chris Blume identified a similar vulnerability for the Windows Snipping Tool. The methodology we describe here for the Markup tool can be used on images produced by the Windows Snipping Tool.
PolyTracker has a feature we introduced a couple of years ago called blind spot detection. We define blind spots as the set of input bytes whose data flow never influences either the control flow that leads to an output or an output itself. Or, in layman’s terms, unused file data that can be altered to have any content without affecting the output. The cropped-out regions of an aCropalypse image are, by definition, blind spots, so PolyTracker should be able to detect them!
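The layman’s definition (bytes that can be altered without affecting the output) suggests a brute-force way to approximate blind spots on tiny inputs. The sketch below is not how PolyTracker works, since PolyTracker tracks data flow through instrumentation rather than mutating inputs. The `toy_parse` format (one length byte followed by a payload) is a hypothetical example invented for illustration.

```python
def toy_parse(data: bytes) -> bytes:
    """Hypothetical format: one length byte, then that many payload bytes.

    Anything after the payload is never read.
    """
    length = data[0]
    return data[1:1 + length]


def blind_spots_by_mutation(parse, data: bytes) -> list[int]:
    """Return offsets whose value can be changed without changing the output."""
    baseline = parse(data)
    unused = []
    for offset in range(len(data)):
        influences_output = False
        for value in range(256):
            if value == data[offset]:
                continue
            mutated = data[:offset] + bytes([value]) + data[offset + 1:]
            try:
                changed = parse(mutated) != baseline
            except Exception:
                # A crash is also an observable difference in behavior.
                changed = True
            if changed:
                influences_output = True
                break
        if not influences_output:
            unused.append(offset)
    return unused


# One length byte (3), three payload bytes, then eight bytes nobody reads.
sample = bytes([3]) + b"abc" + b"LEFTOVER"
print(blind_spots_by_mutation(toy_parse, sample))
```

Only the eight trailing offsets are reported, because changing the length byte or the payload changes the result. This approach needs one parser run per mutation, so it does not scale to megabyte-sized images, which is one reason a data-flow-based approach is attractive.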
One of the challenges of tracking input bytes and detecting blind spots for real-world inputs like PNG images or PDF documents is taint explosion. The PNG file format contains compressed chunks of image data. Compression is especially prone to taint explosion because input bytes combine in many ways to produce output bytes. PolyTracker’s unique representation of the taint structure allows us to track 2^31 unique taint labels, which is necessary for analyzing taints propagated during zlib decompression of image data.
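To get a feel for why combining bytes multiplies labels, here is a toy taint tracker. It is only an illustration under simplifying assumptions: labels are plain Python sets of input offsets, which is not PolyTracker’s representation, and the `running_mix` function is a made-up stand-in for a computation whose every output depends on all earlier input, loosely like a decompressor with history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    value: int
    labels: frozenset  # offsets of the input bytes that influenced this value

    def __xor__(self, other: "Tainted") -> "Tainted":
        # The result depends on both operands, so it carries both label sets.
        return Tainted(self.value ^ other.value, self.labels | other.labels)


def read_input(data: bytes) -> list:
    """Give every input byte its own label: its offset in the input."""
    return [Tainted(b, frozenset([i])) for i, b in enumerate(data)]


def running_mix(inputs: list) -> list:
    """Produce one output per input; each output mixes in all earlier inputs."""
    outputs = []
    acc = Tainted(0, frozenset())
    for item in inputs:
        acc = acc ^ item
        outputs.append(acc)
    return outputs


outputs = running_mix(read_input(b"\x01\x02\x03\x04"))
distinct_label_sets = {o.labels for o in outputs}
print([sorted(o.labels) for o in outputs])
print(len(distinct_label_sets))
```

Every output carries a different union of source labels, so the number of distinct label sets grows with the input even in this simple chain. Real decompression combines bytes in far more varied ways, which is why a large label space matters.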
aCropalyptic files will have Blind Spots when processed
To understand why the aCropalypse vulnerability produces blind spots, we need to combine our knowledge of the vulnerability with the description of blind spots. When parsing a PNG file with a PNG parser, the parser will interpret the header data and consume chunks according to the PNG specification. In particular, it will end at a chunk with type IEND, even if that is not at the actual end of the file.
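The stop-at-IEND behavior can be sketched with a small chunk walker. This is a simplified illustration rather than libpng’s logic: it does not verify CRCs or validate chunk contents, and the function names are our own. It relies only on the PNG chunk layout (4-byte big-endian length, 4-byte type, payload, 4-byte CRC).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(chunk_type: bytes, payload: bytes = b"") -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC over type+payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return (struct.pack(">I", len(payload)) + chunk_type + payload
            + struct.pack(">I", crc))


def end_of_png(data: bytes) -> int:
    """Return the offset just past the first IEND chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        chunk_type = data[offset + 4:offset + 8]
        offset += 12 + length  # 4 length + 4 type + payload + 4 CRC
        if offset > len(data):
            raise ValueError("truncated chunk")
        if chunk_type == b"IEND":
            return offset
    raise ValueError("no IEND chunk found")


# Build a minimal 1x1 grayscale PNG, then append leftover data that ends
# in a second IEND chunk, as an aCropalyptic file would.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
png = PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND")
leftover = b"bytes from a longer, older image" + chunk(b"IEND")

end = end_of_png(png + leftover)
print(end == len(png), len(leftover))
```

Everything from `end` onward is invisible to a parser that follows the chunk structure, so a file that is longer than `end_of_png` reports is a quick, format-specific hint that trailing data is present.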
We use PolyTracker to instrument a tool (pngtest from the libpng project) that reads PNG files and writes them to disk again. This will produce an additional output file, called polytracker.tdag, that captures the data flow from the runtime trace. Using that file and PolyTracker’s blind spot detection feature, we can enumerate the input bytes that do not affect the resulting image. Remember, these are the bytes of the input file that neither affect any control flow, nor end up (potentially mixed with other data) in the output file. They have no actual meaning in interpreting the format for the given parser.
Show me!
Using the PolyTracker-instrumented pngtest application, we load, parse, and then store the below image to disk again. During this processing, we track all input bytes through PNG and zlib processing until they eventually reach the output file in some form.
We use a Docker image containing the PolyTracker-instrumented pngtest application.
$ docker run -ti --rm -v $(pwd):/workdir acropalypse
$ cd /workdir
$ /polytracker/acropalypse/libpng-1.6.39/pngtest.instrumented re3eot.png out_re3eot.png
The re3eot.png image is 1,044,358 bytes in size, whereas out_re3eot.png is 697,182 bytes. Although this indicates a fairly large reduction in size, at this point we can’t tell why; it could, for example, be the result of different compression settings.
Next, let’s find the blind spots from this process:
$ polytracker cavities polytracker.tdag
100%|███████████████████| 1048576/1048576 [00:01<00:00, 684922.43it/s]
re3eot.png,697120,1044358
out_re3eot.png,37,697182
The output we are interested in is:
re3eot.png,697120,1044358
This tells us that the data starting from offset 697,120 to the end of the file was ignored when producing the output image. We have found a blind spot! The additional 347,238 bytes of unused data could be left from an original image—an indication of the aCropalypse vulnerability. Let’s use the acropalypse.app web page to see if we can recover it.
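The arithmetic behind that reading can be checked with a short script. This is a sketch under one assumption drawn from the numbers in this post: each line is `filename,start,end`, with `end` treated as exclusive so that `end - start` is the size. The `Cavity` type and `parse_cavities` helper are our own names, not part of PolyTracker.

```python
from typing import NamedTuple


class Cavity(NamedTuple):
    filename: str
    start: int
    end: int  # assumed exclusive, so size is end - start

    @property
    def size(self) -> int:
        return self.end - self.start


def parse_cavities(text: str) -> list:
    """Parse 'filename,start,end' lines, skipping anything else."""
    cavities = []
    for line in text.splitlines():
        parts = line.strip().rsplit(",", 2)
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # progress bars, blank lines, and other noise
        cavities.append(Cavity(parts[0], int(parts[1]), int(parts[2])))
    return cavities


report = """\
re3eot.png,697120,1044358
out_re3eot.png,37,697182
"""

by_file = {c.filename: c for c in parse_cavities(report)}
blind = by_file["re3eot.png"]
print(blind.start, blind.size, hex(blind.start))  # 697120 347238 0xaa320
```

The blind spot ends exactly at the input file’s size of 1,044,358 bytes, which is what “to the end of the file” means here.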
Successful recovery indicates that the file was in fact produced by the vulnerable application. At this point, we know that the image contains data from the original image at the end, as that is the core of the vulnerability. We also know the exact location and extent of that data (from the blind spot’s starting offset and size). To confirm that the data is in fact a blind spot, let’s manually crop the original image and redo the pngtest operation to ensure that the resulting files are in fact equal. First, let’s copy only the portion that is not a blind spot, that is, the data that is used to produce the output image.
$ dd if=re3eot.png of=manually_cropped_re3eot.png count=1 bs=697120
Next, let’s run the pngtest application again:
$ /polytracker/acropalypse/libpng-1.6.39/pngtest.instrumented manually_cropped_re3eot.png out_manually_cropped_re3eot.png
If our assumption (that only the first 697,120 bytes were used to produce the output image) is correct, we should have two identical output files, despite the removal of 347,238 bytes from the manually_cropped_re3eot.png input file.
$ sha1sum out_manually_cropped_re3eot.png out_re3eot.png
8f4a0417da4c68754d2f85e059ee2ad87c02318f  out_manually_cropped_re3eot.png
8f4a0417da4c68754d2f85e059ee2ad87c02318f  out_re3eot.png
Success! To ensure that the manually cropped file isn’t still affected by the vulnerability, let’s use the web page to try to reconstruct additional image data in the file. This attempt was unsuccessful, as we have removed the original image contents. (Yes, I have checked the cropped screenshot for blind spots 😁).
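For readers who prefer to script the manual check, the dd and sha1sum steps map onto a few lines of Python. This is a sketch with helper names of our own choosing. It only reproduces the copy and hash steps, so running the instrumented pngtest on both inputs still has to happen in between. The demo uses synthetic bytes instead of the real screenshot.

```python
import hashlib
import os
import tempfile


def copy_prefix(src: str, dst: str, size: int) -> None:
    """Copy the first `size` bytes of src to dst, like dd with count=1 bs=size."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read(size))


def sha1_of(path: str) -> str:
    """Return the hex SHA-1 digest of a file, reading it in blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


used = b"U" * 100      # stand-in for the bytes the parser consumes
leftover = b"L" * 40   # stand-in for the blind spot at the end

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bin")
    dst = os.path.join(tmp, "manually_cropped.bin")
    with open(src, "wb") as f:
        f.write(used + leftover)

    copy_prefix(src, dst, len(used))
    cropped_size = os.path.getsize(dst)
    cropped_digest = sha1_of(dst)

print(cropped_size, cropped_digest == hashlib.sha1(used).hexdigest())
```

With the real files, `copy_prefix("re3eot.png", "manually_cropped_re3eot.png", 697120)` would correspond to the dd command, and comparing `sha1_of` for the two pngtest outputs would correspond to the sha1sum step.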
To better understand why the blind spot started at the particular offset, we need to examine the structure of the original image.
PolyFile to the rescue
PolyTracker has a sibling tool: PolyFile, a pure-Python cleanroom implementation of libmagic, with instrumented parsing from Kaitai Struct and an interactive hex viewer. We will use PolyFile’s ability to produce an HTML rendering of the file structure to understand why file processing ends before the file ends.
First, we use the following command to produce an HTML file representing the file format:
$ polyfile --html re3eot.html re3eot.png
When we open the re3eot.html file in a browser, we’ll see an initial representation of the file.
By repeatedly expanding the file structure on the left-hand side, we eventually reach the final chunk.
As shown in the picture above, the final chunk, when the file is interpreted as PNG, has type IEND. Following that chunk is the remaining data from the original file. Note how the superfluous data starts at offset 0xaa320 (that is, 697,120), the exact same offset as the identified blind spot. If you were to scroll all the way to the end, you would find an additional IEND structure (from the original image), but that is not interpreted as a valid part of the PNG file.
It doesn’t stop here
Having almost no knowledge of the PNG file format, we were able to use PolyTracker instrumentation on an existing PNG processing application to detect not only files that have blind spots, but also their exact location and extent.
PolyTracker can detect blind spots anywhere in the file, not only at the end. Even though we analyzed PNG files, PolyTracker isn’t limited to a specific format. We have previously analyzed conversion of PDFs to PostScript using MμPDF. The same technique is valid for any application that does a load/store or deserialize/serialize operation. To further increase our understanding of the format and the effects of the vulnerability, we used PolyFile to inspect the file structure.
These are just a couple of use cases for our tools; there are plenty of others! We encourage you to try our PolyTracker and PolyFile tools yourself to see how they can help you identify unexpected processing and prevent vulnerabilities similar to the aCropalypse in your application.
Acknowledgements
This research was supported in part by the Defense Advanced Research Projects Agency (DARPA) SafeDocs program as a subcontractor to Galois under HR0011-19-C-0073. The views, opinions, and findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Many thanks to Evan Sultanik, Marek Surovič, Michael Brown, Trent Brunson, Filipe Casal, Peter Goodman, Kelly Kaoudis, Lisa Overall, Stefan Nagy, Bill Harris, Nichole Schimanski, Mark Tullsen, Walt Woods, Peter Wyatt, Ange Albertini, and Sergey Bratus for their invaluable feedback on the approach and tooling. Thanks to Ange Albertini for suggesting angles morts—French for “blind spots”—to name the concept, and to Will Tan for sharing a file affected by the vulnerability. Special thanks to Carson Harmon, the original creator of PolyTracker, whose ideas and discussions germinated this research, and Evan Sultanik for helping write this blog post.