In this excerpt of a Trend Micro Vulnerability Research Service vulnerability report, Guy Lederfein and Jason McFadyen of the Trend Micro Research Team detail a recently patched remote code execution vulnerability in Microsoft Windows. This bug was originally discovered by the Microsoft Offensive Research & Security Engineering team. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library. The following is a portion of their write-up covering CVE-2024-20697, with a few minimal modifications.
An integer overflow vulnerability exists in the Libarchive library included in Microsoft Windows. The vulnerability is due to insufficient bounds checks on the block length of a RARVM filter used for Intel E8 preprocessing, included in the compressed data of a RAR archive.
A remote attacker could exploit this vulnerability by enticing a target user into extracting a crafted RAR archive. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library.
The Vulnerability
The RAR file format supports data compression, error recovery, and multiple volume spanning. Several versions of the RAR format exist: RAR1.3, RAR1.5, RAR2, RAR3, and the most recent version, RAR5. Different compression and decompression algorithms are used for different versions of RAR.
The following describes the RAR format used by versions 1.5, 2.x, and 3.x. A RAR archive consists of a series of variable-length blocks.
Each block begins with a header. The following table is the common structure of a RAR block header:
The RarBlock Marker is the first block of a RAR archive and serves as the signature of a RAR formatted file:
This block always contains the following byte sequence at the beginning of every RAR file:
0x52 0x61 0x72 0x21 0x1A 0x07 0x00 (ASCII: "Rar!\x1A\x07\x00")
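As a minimal sketch, the signature check above can be expressed in Python as follows. The constant and function names are hypothetical and are not taken from libarchive or UnRAR.

```python
# Byte sequence of the RarBlock Marker, as listed above.
RAR_MARKER = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def has_rar_marker(data: bytes) -> bool:
    """Return True if the buffer begins with the RarBlock Marker."""
    return data.startswith(RAR_MARKER)
```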
The ArcHeader is the second block in a RAR file and has the following structure:
The ArcHeader block is followed by one or more FileHeader blocks. These blocks have the following structure:
Note that the above offsets are relative to the existence of the optional fields.
The EndBlock block will signify the end of the RAR archive. This block has the following structure:
For each FileHeader block in the RAR archive, if the Method field is not set to "Store" (0x30), then the Data field will contain the compressed file data. The method of decompression depends on the RAR version used to compress the data. The RAR version needed to extract the compressed data is recorded in the UnpVer field of the FileHeader block.
Of relevance to this report is the RAR extraction method used by RAR format version 2.9 (a.k.a. RAR4), which is used when the UnpVer field is set to 29. The data may be compressed using either the Lempel-Ziv (LZ) algorithm or Prediction by Partial Matching (PPM) compression. This report does not describe the extraction algorithm in full detail; it summarizes only the parts needed to understand the vulnerability. For a reference implementation of the extraction algorithm, see the Unpack::Unpack29() function in the UnRAR source code.
When the libarchive library attempts to extract the contents of a file from a RAR archive, if the file data is compressed (i.e. the Method field is not set to "Store"), the function read_data_compressed() will be called to extract the compressed data. The compressed data is composed of multiple blocks, each of which can be compressed using the LZ algorithm (denoted by the first bit of the block set to 0) or using PPM compression (denoted by the first bit of the block set to 1). Initially, the function parse_codes() will be called to decode the tables necessary to extract the file data. If a block of data compressed using the LZ algorithm is encountered, the expand() function will be called to decompress the data. In the expand() function, symbols are read from the compressed data by calling read_next_symbol() in a loop. In the function read_next_symbol(), the symbol will be decoded according to the Huffman table decoded in function parse_codes().
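The block-type test described above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the "first bit" of a block is the most significant bit of its first byte, which the text does not state explicitly.

```python
def block_uses_ppm(first_byte: int) -> bool:
    """Return True for a PPM block (first bit 1), False for an LZ block (first bit 0).

    Assumption: bits are consumed most-significant bit first.
    """
    return ((first_byte >> 7) & 1) == 1
```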
If the decoded symbol is 257, the function read_filter() will be called to read a RARVM filter, which has the following structure:
Note that the above offsets are relative to the existence of the optional fields.
The size of the Code field is calculated as follows: If the lowest 3 bits of the Flags field (referred to here as LENGTH) are less than 6, the code size is (LENGTH + 1). If LENGTH is set to 6, the code size is (LengthExt1 + 7). If LENGTH is set to 7, the code size is (LengthExt1 << 8) | LengthExt2. After the code length is calculated and the code itself is copied into a buffer, the code, its length, and the filter flags are passed to the parse_filter() function to parse the code section.
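The code size rules above can be sketched in Python as follows. The function and parameter names are hypothetical; only the arithmetic is taken from the description.

```python
def rarvm_code_size(flags: int, length_ext1: int = 0, length_ext2: int = 0) -> int:
    """Compute the size of the Code field of a RARVM filter.

    flags        -- the Flags byte of the filter
    length_ext1  -- the LengthExt1 byte (only used when LENGTH is 6 or 7)
    length_ext2  -- the LengthExt2 byte (only used when LENGTH is 7)
    """
    length = flags & 0x07  # LENGTH: lowest 3 bits of Flags
    if length < 6:
        return length + 1
    if length == 6:
        return length_ext1 + 7
    # LENGTH == 7: 16-bit size built from the two extension bytes
    return (length_ext1 << 8) | length_ext2
```

For example, a Flags byte whose lowest 3 bits are 7, followed by LengthExt1 = 0x01 and LengthExt2 = 0x02, yields a code size of 0x102 bytes.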
Within the code section, numbers are parsed by calling the function membr_next_rarvm_number(). This function reads 2 bits, and according to their value, determines how many bits to read to parse the value. If the first 2 bits are 0, 4 value bits will be read; if they are 1, 8 value bits will be read; if they are 2, 16 value bits will be read; and if they are 3, 32 value bits will be read.
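The number encoding described above can be sketched in Python as follows. This is a simplified model that follows only the description in this report; the real function may handle additional cases. The BitReader class, the function name, and the most-significant-bit-first bit order are assumptions made for illustration.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first (assumed order)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Number of value bits that follow each 2-bit selector (0, 1, 2, 3).
VALUE_WIDTHS = (4, 8, 16, 32)


def read_rarvm_number(reader: BitReader) -> int:
    """Read a 2-bit selector, then the number of value bits it selects."""
    selector = reader.read(2)
    return reader.read(VALUE_WIDTHS[selector])
```

Under these assumptions, the bit string 10 followed by the 16-bit value 0x0004 (bytes 0x80 0x01 0x00 after padding) decodes to 4.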
Function parse_filter() will parse the code section, which has the following structure:
Note that if the READ_REGISTERS flag is not set, the registers will be initialized, such that the 5th register is set to the block length, which is either read from the code section (if the READ_BLOCK_LENGTH flag is set), or carried over from the block length of the previous filter.
After these fields are parsed in parse_filter(), the ByteCode field and its length are sent to the function compile_program(). In this function, the first byte of the bytecode is verified to be equal to the XOR of all other bytes in the bytecode. If true, it will set the fingerprint field of the rar_program_code struct to the value of the CRC-32 algorithm run on the full bytecode, combined with the bytecode length shifted left 32 bits.
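The check and the fingerprint calculation described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that "combined" means a bitwise OR and that the CRC-32 variant is the standard one implemented by zlib.crc32(); both assumptions are consistent with the fingerprint values and CRC-32 checksums quoted later in this report.

```python
import zlib


def bytecode_xor_ok(bytecode: bytes) -> bool:
    """Return True if the first byte equals the XOR of all other bytes."""
    if not bytecode:
        return False
    xor = 0
    for value in bytecode[1:]:
        xor ^= value
    return xor == bytecode[0]


def program_fingerprint(bytecode: bytes) -> int:
    """CRC-32 of the full bytecode in the low 32 bits, length in the bits above."""
    crc = zlib.crc32(bytecode) & 0xFFFFFFFF
    return (len(bytecode) << 32) | crc
```

Read this way, a fingerprint such as 0x35AD576887 splits into a bytecode length of 0x35 and a CRC-32 value of 0xAD576887.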
Back in the function parse_filter(), after all fields are calculated for the filter, the rar_filter struct will be initialized by calling create_filter() with the rar_program_code struct containing the fingerprint field and the calculated register values. These values will be set to the prog and initialregisters fields of the rar_filter struct, respectively.
Once processing of the filter is done, function run_filters() is called to run the parsed filter. This function initializes the vm field of the rar_filters struct with a structure of type rar_virtual_machine. This structure contains a registers field, which is an array of 8 integers, and a memory field of size 0x40004. Then, each filter is executed by calling execute_filter(). If the fingerprint field of the rar_program_code struct associated with the executed filter is equal to either 0x35AD576887 or 0x393CD7E57E, the execute_filter_e8() function is called. This function reads the block length from the 5th field of the initialregisters array. Then, a loop is run for replacing instances of 0xE8 and/or 0xE9 within the VM memory, with the block length used as the loop exit condition.
The vulnerability arises from insufficient bounds checks on this block length. Specifically, if the archive contains a RARVM filter whose fingerprint field is calculated as either 0x35AD576887 or 0x393CD7E57E, it will be executed by calling execute_filter_e8(). If the 5th register of the filter is set to a block length of 4, the loop bound in this function, which is set to the block length minus 5, will wrap around to 0xFFFFFFFF. Since the VM memory has a size of 0x40004, this will result in memory accesses that are out of the bounds of the heap-based buffer representing the VM memory.
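The arithmetic behind this can be illustrated with a short Python sketch that models 32-bit unsigned subtraction. The names are hypothetical, and the sketch models only the "block length minus 5" calculation, not any other checks that the function may perform.

```python
VM_MEMORY_SIZE = 0x40004   # size of the VM memory buffer, as stated above
UINT32_MASK = 0xFFFFFFFF   # models a 32-bit unsigned integer


def e8_loop_bound(block_length: int) -> int:
    """Model 'block length minus 5' computed in 32-bit unsigned arithmetic."""
    return (block_length - 5) & UINT32_MASK


def bound_exceeds_vm_memory(block_length: int) -> bool:
    """Return True if the loop bound reaches past the end of the VM memory."""
    return e8_loop_bound(block_length) >= VM_MEMORY_SIZE
```

With a block length of 4, the subtraction yields -1, which is 0xFFFFFFFF as a 32-bit unsigned value, so the loop index can run far beyond the 0x40004 bytes of VM memory.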
A remote attacker could exploit this vulnerability by enticing a target user into extracting a crafted RAR archive, containing a RARVM filter that has its 5th register set to 4. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library.
Notes:
• All multi-byte integers are in little-endian byte order.
• All offsets and sizes are in bytes unless otherwise specified.
• Since there is no official documentation of the RAR4 format, the description is based on the UnRAR and libarchive source code. Field names are either copied from source code or given based on functionality.
Detection Guidance
To detect an attack exploiting this vulnerability, the detection device must monitor and parse traffic on the common protocols over which a RAR archive might be sent, such as FTP, HTTP, SMTP, IMAP, SMB, and POP3.
The detection device must look for the transfer of RAR files and be able to parse the RAR file format. Currently, there is no official documentation of the RAR file format. This detection guidance is based on the source code for extracting RAR archives provided by the UnRAR program and the libarchive library.
The common structure of a RAR block header is detailed above. The detection device must first look for a RarBlock Marker, which is the first block of a RAR archive and serves as the signature of a RAR formatted file:
The detection device can identify this block by looking for the following byte sequence:
0x52 0x61 0x72 0x21 0x1A 0x07 0x00 ("Rar!\x1A\x07\x00")
If found, the device must then identify the ArcHeader, which is the second block in a RAR file and is detailed above. The ArcHeader block is followed by one or more FileHeader blocks, whose structure is also detailed above. Note that the above offsets are relative to the existence of the optional fields.
The detection device must parse each FileHeader block and inspect its Method field. If the value of the Method field is greater than 0x30, the detection device must inspect the Data field of the FileHeader block, containing the compressed file data. The compressed data may be compressed either using the Lempel-Ziv (LZ) algorithm or using Prediction by Partial Matching (PPM) compression. This detection guidance will not describe in full detail the extraction algorithm. For a reference implementation of the extraction algorithm, see the Unpack::Unpack29() function in the UnRAR source code.
The compressed data is composed of multiple blocks, each of which can be compressed using the LZ algorithm (denoted by the first bit of the block set to 0) or using PPM compression (denoted by the first bit of the block set to 1). The detection device must extract each block according to the algorithm used to compress it. If a block compressed using the LZ algorithm is encountered, the detection device must decode the Huffman tables from the beginning of the compressed data. The detection device must then iterate over the remaining compressed data and decode each symbol based on the generated Huffman tables. If the symbol 257 is encountered, the following data must be parsed as a RARVM filter, which has the following structure:
Note that the above offsets are relative to the existence of the optional fields.
The detection device must then calculate the size of the Code field as follows: If the lowest 3 bits of the Flags field (referred to here as LENGTH) are less than 6, the code size is (LENGTH + 1). If LENGTH is set to 6, the code size is (LengthExt1 + 7). If LENGTH is set to 7, the code size is (LengthExt1 << 8) | LengthExt2. After the size of the Code field is calculated, the Code field must be parsed according to the following structure:
All numerical fields within this structure (FilterNum, BlockStart, BlockLength, register values, and ByteCodeLen) must be read according to the algorithm implemented in the RarVM::ReadData() function of the UnRAR source code. The algorithm reads 2 bits of data, signifying the number of bits of data containing the numerical value. Note that some of the fields in this structure are optional and depend on flags set in the Flags field of the RARVM filter structure.
After extracting all necessary fields, the detection device must check for the following conditions:
• The CRC-32 checksum of the ByteCode field is 0xAD576887 and the ByteCodeLen field is 0x35, OR the CRC-32 checksum of the ByteCode field is 0x3CD7E57E and the ByteCodeLen field is 0x39.
• The READ_REGISTERS flag is set and the value of the 5th register of the Registers field is set to 4, OR the READ_BLOCK_LENGTH flag is set and the value of the BlockLength field is set to 4.
If both these conditions are met, the traffic should be considered suspicious. An attack exploiting this vulnerability is likely underway.
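The two conditions above can be combined into a single check, sketched here in Python. All names are hypothetical. The sketch assumes the caller has already parsed the filter and passes None for registers or block_length when the corresponding flag (READ_REGISTERS or READ_BLOCK_LENGTH) is not set.

```python
from typing import Optional, Sequence

# (ByteCodeLen, CRC-32 of ByteCode) pairs listed in the first condition.
E8_PROGRAMS = {(0x35, 0xAD576887), (0x39, 0x3CD7E57E)}
TRIGGER_LENGTH = 4


def is_suspicious_filter(
    bytecode_crc: int,
    bytecode_len: int,
    registers: Optional[Sequence[int]] = None,
    block_length: Optional[int] = None,
) -> bool:
    """Return True if both detection conditions hold for a parsed filter."""
    # Condition 1: the bytecode matches one of the two listed programs.
    if (bytecode_len, bytecode_crc) not in E8_PROGRAMS:
        return False
    # Condition 2: the 5th register (index 4) or the BlockLength field is 4.
    register_hit = (
        registers is not None
        and len(registers) > 4
        and registers[4] == TRIGGER_LENGTH
    )
    length_hit = block_length is not None and block_length == TRIGGER_LENGTH
    return register_hit or length_hit
```

Keeping the CRC-32 value as an input rather than computing it inside the function lets the same check be reused regardless of how the detection device extracts and hashes the ByteCode field.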
Notes:
• All multi-byte integers are in little-endian byte order.
• All offsets and sizes are in bytes unless otherwise specified.
Conclusion
Microsoft patched this vulnerability in January 2024 and assigned it CVE-2024-20697. While they did not recommend any mitigating factors, there are some additional measures you can take to help protect against exploitation of this bug. These include not extracting RAR archive files from untrusted sources and filtering traffic using the guidance provided in the “Detection Guidance” section of this blog. Still, it is recommended to apply the vendor patch to completely address this issue.
Special thanks to Guy Lederfein and Jason McFadyen of the Trend Micro Research Team for providing such a thorough analysis of this vulnerability. For an overview of Trend Micro Research services please visit http://go.trendmicro.com/tis/.
The threat research team will be back with other great vulnerability analysis reports in the future. Until then, follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.