In the first part I promised to demonstrate that piracy is good! (sometimes)
I kinda lied back there, but I am not going to lie today: I will tell you all about it in part 3.
Forensic data hoarding has a lot of benefits. It helps to solve many common yet often difficult problems (I will cover one of them later in this post), and it also has a nice side effect – it makes us more aware of the available forensic artifacts, and of the fact that everyone in this field has, or at least should have, a very basic need to collect data.
For example, I keep reminding everyone who wants to listen that there are many localized versions of Windows, and there are lots of architectural quirks around OS folders as well. Yes, it means that your c:\Program Files folder name, same as many others, can be localized and often is. This doesn’t stop people from continuing to write English-centric detections, but at least my conscience is clean…
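To make the point a bit more concrete, here is a minimal Python sketch of asking the system where its Program Files folders are instead of hardcoding the English name. The function name program_files_dirs is mine, and the localized path in the usage note is only an illustration; ProgramFiles, ProgramFiles(x86) and ProgramW6432 are the standard Windows environment variables, but treat this as a sketch rather than a complete answer to localization.

```python
import os

def program_files_dirs(env=os.environ):
    """Return Program Files locations as reported by the environment.

    Nothing here assumes the folder is literally called 'Program Files'.
    On a non-Windows system the variables are absent and the result is [].
    """
    names = ("ProgramFiles", "ProgramFiles(x86)", "ProgramW6432")
    found = []
    for name in names:
        path = env.get(name)
        # Skip missing variables and avoid listing the same folder twice.
        if path and path not in found:
            found.append(path)
    return found

print(program_files_dirs())
```

Fed with a hypothetical localized environment such as {"ProgramFiles": "C:\\Programme"}, the function returns that path as-is, which is exactly what an English-centric string match would miss.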
I mentioned these common yet often difficult problems… Let’s focus on one of them for a moment.
When you analyze malware you often come across code that focuses on terminating processes and/or stopping and/or removing services (service processes). The easy ones do it by the book – they use direct string comparisons and Windows APIs, and the list of targets is often present inside the malware in the form of a string list. The more advanced ones use various hashing algorithms for comparison, and instead of actual strings they store hashes identifying the targets inside the malware samples.
As analysts looking at such hash lists we face an obvious challenge – given a list of hashes, how can we reconstruct a list of strings that these hashes were generated from?
This may sound easy, but it is not. We can brute-force all combinations, but this can often turn out to be very costly, plus a brute-force attack may end up with a list of random process names whose calculated hash happens to be identical to one on the target list, but which are not the correct ones (a so-called hash collision). A more promising approach relies on a dictionary attack, where we compute hashes for all known process names and then compare them against the targets, but it’s not easy either…
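The dictionary attack itself is only a few lines of code. The sketch below is in Python and uses CRC32 of the lowercased name purely as a stand-in: real samples use whatever algorithm their author picked, so name_hash has to be replaced with the routine recovered from the malware. The function names, the tiny wordlist and the target values are all made up for illustration.

```python
import zlib

def name_hash(name: str) -> int:
    """Stand-in hash: CRC32 of the lowercased name.

    This is an assumption for the demo; swap in the algorithm
    reversed from the actual sample.
    """
    return zlib.crc32(name.lower().encode("utf-8")) & 0xFFFFFFFF

def crack(targets, wordlist):
    """Map each target hash to every dictionary name that produces it."""
    wanted = set(targets)
    hits = {}
    for name in wordlist:
        h = name_hash(name)
        if h in wanted:
            # Keep a set per hash: more than one name may collide.
            hits.setdefault(h, set()).add(name.lower())
    return hits

# Hypothetical inputs: a tiny wordlist and three "extracted" hashes.
wordlist = ["winword.exe", "sqlservr.exe", "outlook.exe", "notepad.exe"]
targets = [name_hash("sqlservr.exe"), name_hash("outlook.exe"), 0xDEADBEEF]

hits = crack(targets, wordlist)
for h in targets:
    print(f"{h:08x}", sorted(hits.get(h, [])) or "unresolved")
```

Keeping a set of candidates per hash instead of a single name means collisions stay visible, and any hash that comes back unresolved simply tells you the dictionary is not big enough yet.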
Why?
For the latter to be successful, one needs a large list of legitimate process names in the first place. Googling around and GitHub searches may give you a head start, but it’s often not enough. Yes, many of these process names are related to security software, so an extensive list of process names used by antivirus, EDR, firewall, etc. software may help, but that alone rarely suffices. Nowadays, the target lists are often far wider than that – e.g. ransomware often kills many other programs as well: multiple variants of Office software, database software, various backup services, email clients, and so on and so forth.
It’s time for a recipe.
If you were to collect the largest list of process names, how would you do it?
The list below is not exhaustive, but may help you out:
Chances are that a set of these will lead you to many interesting process lists.
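Whatever the sources, the collected lists eventually need to be merged into one. A minimal Python sketch, assuming each source is just an iterable of raw lines (an open text file works too) and that lowercasing is an acceptable normalization; the function name merge_process_lists and the sample data are made up, and if a given sample hashes names case-sensitively you would want to keep the original casing as well.

```python
def merge_process_lists(*sources):
    """Merge iterables of raw lines into one sorted, de-duplicated list.

    Names are stripped and lowercased; blank lines and lines starting
    with '#' (treated here as comments) are dropped.
    """
    merged = set()
    for lines in sources:
        for line in lines:
            name = line.strip().lower()
            if name and not name.startswith("#"):
                merged.add(name)
    return sorted(merged)

# Hypothetical sample data standing in for two collected lists.
list_a = ["WINWORD.EXE", "sqlservr.exe ", ""]
list_b = ["sqlservr.exe", "# comment", "Outlook.exe"]

print(merge_process_lists(list_a, list_b))
# ['outlook.exe', 'sqlservr.exe', 'winword.exe']
```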
And now you have your base. It is probably around 1% of all the process names that you want though…
So… we dig deeper.
My personal process list is 1.7M items long. I used it to crack quite a few malware families’ target lists. Yet it still fails me sometimes. Yes, the hoarding never stops.