Today I am happy to announce a big improvement to the Cyber Threats Observatory (available here). The main improvement is the introduction of clustering stereotypes for each tracked malware family across three behaviors: Domains, Files and Processes.
Every malware performs specific actions in the domain, file and process realms: every sample contacts several domain names, spawns specific processes and eventually saves files on the hard disk (file-less malware is a separate topic here). Collecting everything that comes from their execution and clustering it by string similarity highlights several stereotypes that would be interesting for further studies or for similarity-based blocking lists. The following image shows the current deployment state.
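To make the idea of a stereotype more concrete, here is a minimal sketch of clustering by string similarity. It is not the Observatory backend: the greedy strategy, the 0.8 threshold, the function names and the sample list are all assumptions chosen for illustration, using only Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_strings(items, threshold=0.8):
    """Greedy single-pass clustering (an assumed strategy).

    Each string joins the first cluster whose representative (its first
    member) is similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of lists; clusters[i][0] is the representative
    for item in items:
        for cluster in clusters:
            if similarity(item, cluster[0]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def stereotypes(items, threshold=0.8):
    """Map each cluster representative to the size of its cluster."""
    return {c[0]: len(c) for c in cluster_strings(items, threshold)}


# Illustrative observations only, not real dashboard data.
domains = ["broken1.cf", "broken2.cf", "broken3.cf", "frenchman.icu"]
print(stereotypes(domains))  # {'broken1.cf': 3, 'frenchman.icu': 1}
```

The representative of each cluster plays the role of the stereotype, and the cluster size plays the role of its score. A production system would likely need a faster and more robust clustering approach than this quadratic one.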
What you find
According to the shared information, the Cyber Threats Observatory Dashboard is composed of the following sections:
- Malware Families Trends. Detection distribution over time. In other words, the time frames in which specific families are most active with respect to others.
- Malware Families. Automatic Yara rules classify samples into families. Many samples are not classified into a family; this happens when no signature matches the sample or when multiple family signatures match the same sample. In both cases I cannot be sure which family the sample belongs to, so it is classified as “unknown” and not visualized on this graph. The missing slice of the pie is attributed to “unknown”.
- Distribution Types. Based on the file magic bytes, this graph tracks the percentages of file types that malware used as a carrier.
- Threat Level Distribution. Levels go from 0 to 3, with higher values being more dangerous. It would be interesting to understand the threat level of unknown families as well, in order to see whether real malware or false positives hide among them. For this reason a dedicated graph named Unknown Families Threat Level Distribution has been created.
- Stereotypes. Studying stereotypes is useful to analyze similarities in clusters. In other words, it is nice to see which patterns malware uses in domain names, file names and process names. This matters for detection and even for preemptive blocking. Due to the vast amount of data, only the 10000 most recent entries are included.
- TOP domains, TOP processes and TOP File Names. With a sliding window over the last 300 analyzed samples, the backend extracts the TOP (in terms of frequency) contacted domains, spawned processes and utilized file names. Again, there is no filter and no post-processing analysis on these fields, meaning you could well find “google.com” or “microsoft update” as a TOP domain, which is fine: if the sample queried them before performing its malicious intent, it is simply recorded and brought to your attention. The same goes for processes and file names. Indeed, those fields include the term “involved” in their title: if something is involved it does not mean that it is malicious, but that it is accounted to be part of a malicious chain.
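The sliding-window TOP extraction described in the last item could be sketched roughly as follows. The class name, its methods and the choice to count each artifact once per sample are assumptions made for illustration; the post does not describe how the backend actually implements it.

```python
from collections import Counter, deque


class SlidingTop:
    """Rank artifacts by frequency over the last `window` samples."""

    def __init__(self, window=300):
        # One entry per analyzed sample: the set of artifacts it involved.
        self.samples = deque(maxlen=window)

    def add_sample(self, artifacts):
        # A deque with maxlen drops the oldest sample once it is full.
        # Assumption: an artifact counts once per sample, hence the set.
        self.samples.append(set(artifacts))

    def top(self, n=10):
        counts = Counter()
        for artifacts in self.samples:
            counts.update(artifacts)
        return counts.most_common(n)


# Tiny window so that the eviction is visible; data is illustrative only.
tracker = SlidingTop(window=3)
tracker.add_sample(["google.com", "broken1.cf"])
tracker.add_sample(["google.com"])
tracker.add_sample(["google.com", "frenchman.icu"])
tracker.add_sample(["broken2.cf"])  # evicts the first sample
print(tracker.top(1))  # [('google.com', 2)]
```

Note that a benign domain such as google.com ends up on top here, which mirrors the "involved, not necessarily malicious" caveat above.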
A simple example
Let’s assume we want to investigate LokiBot. According to any.run: Lokibot, also known as Loki-bot or Loki bot, is an information stealer malware that collects data from most widely used web browsers, FTP, email clients and over a hundred software tools installed on the infected machine.
But let’s start digging a little bit into the Cyber Threats Dashboard and see what we can find. First of all, from the Malware Families section we see the overall detection rate. Today, we might easily say that LokiBot has a low detection percentage (0.32388) compared to other families such as GandCrab, Emotet or TrickBot.
From the Family Distribution Over Time section (the following image) we can appreciate the detection distribution rate. By deselecting the unwanted malware families it is possible to track the distribution of the desired one (in our case LokiBot) over time. In the following case all families except LokiBot have been disabled (by clicking on the malware name directly in the graph legend). We can appreciate a considerable increase of LokiBot detections on 2020-04-28 and from 2020-04-30 to 2020-05-02. It looks to be the most active observed period for this well-documented family during 2020. This observation fits the public mainstream information, with many security magazines and vendors observing such an increase as well, mostly spread through COVID-19 malspam, for example: SecurityAffairs, BankInfoSecurity, ThreatPOST, FortiNet.
Digging a little bit into the specific case, we might observe the domain stereotypes. It’s nice to see that many domain stereotypes (in other words, the representatives of a wide set of similar domains) have .cf (Central African Republic) as their top-level domain, and some of them are quite similar: broken1.cf, broken2.cf, and so on and so forth. These are not very original names, and a pattern such as broken<number>.cf could be used to block them.
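As a hedged illustration of how a broken<number>.cf style pattern might be turned into a blocking rule, the sketch below generalizes any run of digits in a stereotype into a digit wildcard. The helper name `numbered_pattern` is hypothetical, and the approach is only one possible reading of "similarity blocking", not the Observatory's own method.

```python
import re


def numbered_pattern(domain: str) -> re.Pattern:
    """Build a regex that matches `domain` with every digit run generalized."""
    # Splitting on a capturing group keeps the digit runs at odd indexes.
    parts = re.split(r"([0-9]+)", domain)
    regex = "".join(
        "[0-9]+" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$", re.IGNORECASE)


blocker = numbered_pattern("broken1.cf")
print(blocker.pattern)                      # ^broken[0-9]+\.cf$
print(bool(blocker.match("broken42.cf")))   # True
print(bool(blocker.match("broken.cf")))     # False
```

Anchoring the expression with `^` and `$` keeps it from matching longer host names that merely contain the stereotype.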
Following the diagram we might observe one more domain stereotype with the TLD .ICU, in particular frenchman.icu (a generic TLD targeting entrepreneurs and business owners), and along the same path one more domain stereotype with .co.ke (referring to Kenya). Now let’s try to focus a little bit on “Files” and check if there are some patterns in the “File section”. So let’s check the following diagram.
The linearity of the composition (every stereotype gets the same score, in this case 3) suggests that the malware uses the different groups of files equally, meaning that when it starts on a victim machine it reads/creates/writes every single file at least once per run. We might appreciate a nice pattern in the temporary file names, but it won’t help us in detection since it is the default Windows temporary file pattern. However, we might associate the presence of such temporary files with the direct usage of spoolsv.exe, mrsys.exe and even explorer.exe. Even if many false positives could be triggered, it would be nice to give it a try and see where it takes us!
Most interesting would be the presence of a specific file, ([a..z][0.9]).lck, whose presence would be a nice key point to check (by using file detection).
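A file-presence check along these lines might look like the sketch below. It assumes the stereotype means "a name made of lowercase letters and digits, followed by .lck"; that reading, the `LCK_NAME` expression and the `find_lck_files` helper are illustrative guesses rather than a confirmed LokiBot indicator.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the stereotype is read as "lowercase letters and digits + .lck".
LCK_NAME = re.compile(r"[a-z0-9]+\.lck")


def find_lck_files(root):
    """Return sorted paths of files under `root` whose name fits LCK_NAME."""
    return sorted(
        path
        for path in Path(root).rglob("*.lck")
        if path.is_file() and LCK_NAME.fullmatch(path.name)
    )


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a1b2.lck", "report.txt", "not a match.lck"):
        (Path(tmp) / name).touch()
    found_names = [path.name for path in find_lck_files(tmp)]

print(found_names)  # ['a1b2.lck']
```

On a real host such a check would be expected to raise false positives, since .lck files are also created by legitimate software, so it is better treated as a hint than as a verdict.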
Conclusion
In this post I’ve introduced a big improvement of the Cyber Threat Observatory, showing a quick and dirty analysis of LokiBot through stereotypes. The aim of this project is not to give detailed analyses of malware but rather to focus on general patterns and macro stereotypes in order to perform massive data analysis.
I hope you find it useful; if so, please share it with your fellows.