If you follow this series you should know by now that I am obsessing here not about the benefits of piracy, but about a new, powerful forensic capability: a truly actionable summary (extracted from the ever-growing evidence…).
In my previous posts in this series I covered a number of different approaches one may take to analyze forensic and telemetry data obtained from an individual system or a cluster of endpoints belonging to a specific org, but here’s one more approach: Wikipedia’s categorization feature.
You may or may not be aware that there are Wikipedia tools available online that allow us to extract a subset of the Wikipedia database that meets certain criteria: for example, one can select all Wikipedia pages that are tagged with a certain category and export them to a file. And lo and behold, for our software categorization purposes there is a really interesting category we should look at, Lists_of_software:
When we click ‘Add’ we will immediately populate the list of pages:
And when we click ‘Export’ we will get a relatively small XML file listing all the pages of interest…
Now, that list of pages is interesting on its own, because we get a really long list of nice categories: 484 (242 unique) entries as of today. See wiki_pages.txt (based on the page list) and wiki_pages_unique.txt (based on the <title> entries from the exported XML file).
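For those who want to reproduce the unique list, here is a minimal Python sketch of pulling the <title> entries out of an export. It assumes the usual MediaWiki export layout (a <mediawiki> root with <page><title> children); the sample document and the helper names (local_name, unique_titles) are made up for illustration, and XML namespaces are simply ignored so that the exact export schema version does not matter.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports namespaced tags as "{namespace}title";
    # keep only the part after the closing brace.
    return tag.rsplit("}", 1)[-1]


def unique_titles(xml_text):
    # Collect the text of every <title> element, dropping duplicates
    # while preserving the order of first appearance.
    root = ET.fromstring(xml_text)
    seen = set()
    titles = []
    for elem in root.iter():
        if local_name(elem.tag) == "title" and elem.text:
            title = elem.text.strip()
            if title not in seen:
                seen.add(title)
                titles.append(title)
    return titles


# Hypothetical, heavily trimmed stand-in for the exported file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>List of text editors</title></page>
  <page><title>List of web browsers</title></page>
  <page><title>List of text editors</title></page>
</mediawiki>"""

if __name__ == "__main__":
    for title in unique_titles(SAMPLE):
        print(title)
```

For the real file, read it from disk and pass its content to unique_titles(); for very large exports, streaming with ET.iterparse would be a more memory-friendly choice.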
Secondly, when you parse this exported XML file, you will end up with a list of software names, vendor names, and domains that can now be used to… yes… categorize software we find during forensic exams! Luckily for us, most of these legitimate software packages listed on Wikipedia follow some sort of naming convention, which allows us to recognize them, especially when they are installed to their own, preprogrammed paths.
Thirdly, when you review that exported XML file you will quickly realize how many of these thousands of software packages you have never heard of. This is a humbling lesson for any Detection Engineering adept out there: we can’t pretend anymore that we are on top of things. Every single piece of software is a potential source of a supply-chain attack. Every single piece of software may be introducing Local Privilege Escalation bugs. Every single piece of software may include new lolbins. Every single piece of software may offer functionality that can and will be abused by attackers. And this is just the software listed on Wikipedia. There are gazillions of other software installs out there that have never been looked at, never been scrutinized, never been assessed from a security standpoint.
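To illustrate the idea of recognizing software by its install path, here is a rough Python sketch. Everything in it is an assumption made for the example: the KNOWN lookup table, its two entries and their category labels, and the normalize and categorize_path helpers. In practice the table would be built from the names parsed out of the export. The matching is deliberately naive: each path component is lowercased, stripped of punctuation, and looked up verbatim.

```python
import re
from pathlib import PureWindowsPath


def normalize(name):
    # Lowercase and collapse every run of non-alphanumeric characters
    # into a single space, so "Notepad++" and "notepad" compare equal.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def categorize_path(path, known):
    # Walk the path components left to right and return the first
    # (normalized name, category) pair found in the lookup table.
    for part in PureWindowsPath(path).parts:
        key = normalize(part)
        if key in known:
            return key, known[key]
    return None


# Hypothetical lookup table; names and categories are illustrative only.
KNOWN = {
    normalize("Notepad++"): "text editor",
    normalize("Mozilla Firefox"): "web browser",
}

if __name__ == "__main__":
    for p in (
        r"C:\Program Files\Notepad++\notepad++.exe",
        r"C:\Program Files\Mozilla Firefox\firefox.exe",
        r"C:\Users\bob\AppData\Local\Temp\x.exe",
    ):
        print(p, "->", categorize_path(p, KNOWN))
```

Exact matching on a folder name will miss versioned or vendor-prefixed directories, so a real implementation would likely need fuzzier matching and some handling of name collisions.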
I will be exploring many of them in my future posts in this series.