Malware, like many complex software systems, relies on the concept of software configuration. Configurations establish guidelines for malware behavior and they are a common feature among the various malware families we examine. The configuration data embedded within malware can offer invaluable insights into the intentions of cybercriminals. However, due to its significance, malware authors deliberately make configuration data challenging to parse statically from the file.
Over the past few years, we have developed a system to extract internal malware configurations. We will share code from our extractors for multiple malware families with the research community. These extractors, written in Python, are designed to scan and extract configuration data from memory dumps associated with specific malware samples.
We will also introduce selected configuration protection techniques employed by two malware families: GuLoader and RedLine Stealer. For those interested in more details, please look into the whitepaper, slides or video we presented at Virus Bulletin 2023 in London.
Palo Alto Networks customers are better protected from these threats through our Next-Generation Firewall with cloud-delivered security services including WildFire. If you think you might have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team.
Related Unit 42 Topics | Malware, Memory Detection, RedLine Stealer, GuLoader |
Technical Analysis of GuLoader
Ciphertext Splitting
Control Flow Obfuscation
Technical Analysis of RedLine Stealer
Conclusion
Indicators of Compromise
SHA256 Hash of the GuLoader Sample Analyzed in This Article
SHA256 Hash of the RedLine Stealer Sample Analyzed in This Article
The GuLoader authors went to great lengths to obfuscate their C2 configuration. Figure 1 provides a timeline illustrating the evolution of GuLoader obfuscation techniques.
This evolution has defeated our previous approach to extracting GuLoader malware configuration. The GuLoader authors’ newer techniques include ciphertext splitting and control flow obfuscation.
We have labeled GuLoader’s previous method of storing encrypted configuration data (ciphertext) in the top section of Figure 2 as the “old method.” In this old method, the ciphertext was stored as a continuous sequence of bytes.
In the lower section of Figure 2 above, we have labeled the new approach as GuLoader’s “new method,” where the ciphertext is computed from a function. In this function, the ciphertext is first divided into a 4-byte DWORD. Each DWORD is individually encrypted using randomized mathematical operations.
For example, to retrieve the first DWORD of the ciphertext from GuLoader’s new method, we must perform the mathematical operations illustrated below in Figure 3.
To acquire the complete ciphertext from this new method, we perform a series of operations similar to the method shown in Figure 3 above for each individual DWORD. Subsequently, we concatenate these DWORD values together, resulting in the complete ciphertext.
In early 2023, we encountered a GuLoader sample that originally had zero VirusTotal (VT) detections. Using Hex-Rays IDA Pro to disassemble and analyze this malware sample, we found instructions that attempted to prevent further analysis. These anti-analysis instructions were designed to cause EXCEPTION_BREAKPOINT, EXCEPTION_ACCESS_VIOLATION and EXCEPTION_SINGLE_STEP violations.
Figure 4 illustrates how GuLoader implemented all these instructions for anti-analysis.
The anti-analysis instructions noted in Figure 4 above rendered our previous solution of writing an IDA processor module extension ineffective. Due to the variable nature of the length of Intel x86 CPU instructions, we could not detect the huge combination of instructions that triggered EXCEPTION_ACCESS_VIOLATION and EXCEPTION_SINGLE_STEP exceptions.
Since our previous solution was no longer effective, we had to manually analyze the code to find these anti-analysis instructions and bypass them to extract the configuration. We explained in detail how we extracted the configuration in our whitepaper for Virus Bulletin.
The SHA256 hash for the RedLine Stealer sample used in this analysis is a4cf69f849e9ea0ab4eba1cdc1ef2a973591bc7bb55901fdbceb412fb1147ef9. Using an MSIL decompiler called dnSpy, we quickly identified the configuration data as shown below in Figure 5.
We implemented a decryption routine in Python as shown below in Figure 6. We invite readers to manually grab example ciphertexts and keys to test whether the script from Figure 6 decrypts correctly.
Next, we located the configuration (shown in Figures 7 and 8) and prepared the decrypt function in Python. However, before decrypting the data, we had to manually grab the ciphertext and key from the decompiled result generated by dnSpy.
When writing C code, we directly access system memory, so we sometimes call the executables compiled from C code as native executables. However, in .NET MSIL, everything is managed. A pointer leads to the character array stored somewhere in the binary in native C code, but all we see in the compiled MSIL are the tokens.
When accessing these tokens, the runtime library (CLR) parses where the ciphertext is actually stored, which is one less thing for an analyst to worry about. For example, in Figure 7 comments generated by dnSpy show that the string IP is a token number 0x04000013.
Next, we open the RedLine Stealer sample in IDA Pro and navigate to the same function. Figure 8 shows that the ldstr commands push object reference for the metadata strings located at seg000:29F1, seg000:29FB, seg000:2A05 and seg000:2A0F. The object references are enclosed in black boxes in Figure 8.
These metadata strings are set by instructions located at seg000:29F6, seg000:2A00, seg000:2A0A and seg000:2A14 respectively. The stsfld instructions replace the value of a static field with a value from the evaluation stack. The values on the evaluation stack for each field are enclosed in red boxes.
The IP field from Figure 7 is not enough to statically extract the configuration. The source of the string that was pushed onto the stack for the IP field has not yet been identified. The operand type of the instruction ldstr shown in Figure 8 is, according to Microsoft, a string token, and string tokens are stored in the #US (User-Stream) table.
To find the string token, we used an open-source library called dnfile, which is like a .NET version of PEfile. Dnfile allows us to easily access the #US tokens by just giving the .NET runtime identifier (RID). Dnfile also provides the interface to access the user streams and a lot more.
The Python implementation shown in Figure 9 is an example of how we accessed user streams by offset. We passed the user string into the decryption routine shown in Figure 9 once we got the user stream by the token. This should return the decrypted configuration.
By delving into the methods used for GuLoader and RedLine Stealer, we shed light on the process of locating and extracting C2 configurations from various malware families.
Leveraging our insights gained from analyzing these malware configurations, we can enhance our ability to detect, analyze and develop effective countermeasures against malicious software. Through continuous collaboration and knowledge sharing, we can collectively stay ahead of cybercriminals to help safeguard our digital systems and networks.
Palo Alto Networks customers are better protected from the threats discussed in this article through the following products:
If you think you might have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:
Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.
Sign up to receive the latest news, cyber threat intelligence and research from us