Following our first article, which explained our detection approach and its associated challenges, and the second, which detailed the regular and automated actions implemented through our CI/CD pipelines, we now conclude this series by presenting the continuous improvement loop that allows us to achieve our long-term objectives, incorporating the principles of detection engineering.
The approach outlined in this series of articles has helped us mitigate the impact of common issues, such as detection rules failing unexpectedly after a rule modification or a parser change. However, some challenges persist, and it remains essential to monitor Key Performance Indicators (KPIs) regularly.
In detection engineering, the most critical KPIs are often the false positive and false negative rates of detection rules. While identifying false negatives is nearly impossible, tracking false positives, although challenging, is feasible and provides valuable insights into rule effectiveness. This challenge is particularly pronounced for companies like ours, operating purely as a software vendor without offering managed services. As a result, the only metrics we receive come from our partners and customers. Specifically, SOC analysts can choose between two alert statuses, closed or rejected, which can be interpreted respectively as true positive and false positive:
In such cases, several questions arise. For example, how should a “blocked outgoing HTTP request to a malicious domain” be categorised? Is it a false positive, a true positive, or a true positive but benign alert? This scenario highlights the importance of carefully defining alert statuses. However, even with a well-thought-out classification system, human interpretation remains a significant factor.
From our perspective, this situation is always a true positive—it cannot be considered benign, even if the request is blocked, because the request itself indicates an active process within the network attempting to access the malicious domain. However, others might classify it as benign or even as a false positive for many valid reasons. A similar debate can occur in cases where an antivirus quarantines a threat. These differing interpretations underscore the complexity of KPI assessment in detection engineering.
In the end, while our approach can contribute to building a more robust and scalable detection engineering pipeline, continuously monitoring detection rule KPIs and conducting daily reviews of detection rules remain essential practices that should not be neglected.
Detection engineers always need more telemetry to improve their detection capabilities. Over the lifecycle of a detection rule, many changes are required to avoid false positives and extend its coverage. Having the data is a first step, but it also needs to be processed efficiently. Our approach was to give detection engineers both a proactive and a reactive way to analyse that data.
For the proactive approach, we use the open-source tool Grafana and have built dashboards where the analyst will find:
The “alert rejection rate by rule” dashboard produces results that must be interpreted cautiously, given the SOC analyst bias discussed at the beginning of this article. Nevertheless, it helps the detection engineer pivot easily to the detailed results for each rule and quickly review the related alerts and events in depth. Of course, some of these rejection rates need to be put into perspective, as a 100% rejection rate over a small number of alerts may not be representative.
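To illustrate the kind of computation behind such a dashboard, here is a minimal Python sketch (in our case this is a Grafana panel querying our alert data; the alert records and the MIN_ALERTS threshold below are purely hypothetical):

```python
from collections import Counter

# Hypothetical alert records: (rule_name, status), where "closed" is read as
# true positive and "rejected" as false positive.
alerts = [
    ("SOCKS Tunneling Tool", "closed"),
    ("SOCKS Tunneling Tool", "rejected"),
    ("Suspicious PowerShell Download", "rejected"),
]

MIN_ALERTS = 20  # below this volume a rejection rate is not considered representative

totals, rejected = Counter(), Counter()
for rule, status in alerts:
    totals[rule] += 1
    if status == "rejected":
        rejected[rule] += 1

for rule, total in sorted(totals.items(), key=lambda item: -item[1]):
    rate = rejected[rule] / total
    note = "" if total >= MIN_ALERTS else " (low volume, interpret with caution)"
    print(f"{rule}: {rate:.0%} rejection rate over {total} alert(s){note}")
```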
The “temporal comparison” dashboard provides valuable insights into trends following modifications of a specific rule and, more broadly, facilitates the monitoring of observed threats and their evolution over time. This can help identify cases where further research may be necessary for a given TTP. Additionally, it is important to detect any sudden decrease in the number of alerts generated by a rule, as this may be a weak signal of potential detection failures related to issues such as formatting, data collection, or parsing.
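As an illustration of that last point, a naive sketch of such a drop check could look like the following (the window and threshold values are assumptions, not our production settings):

```python
def sudden_drop(daily_counts: list[int], window: int = 7, threshold: float = 0.5) -> bool:
    """Flag a rule whose recent alert volume fell well below its own baseline.

    daily_counts holds the number of alerts per day for one rule, oldest first.
    A recent average below `threshold` times the previous window's average is
    treated as a weak signal of a possible collection, parsing or formatting issue.
    """
    if len(daily_counts) < 2 * window:
        return False  # not enough history to compare
    baseline = sum(daily_counts[-2 * window:-window]) / window
    recent = sum(daily_counts[-window:]) / window
    return baseline > 0 and recent < threshold * baseline
```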
For the reactive approach, we chose to provide a push notification mechanism (through our internal instant messaging app) that allows detection engineers to subscribe to a given rule and receive a daily digest of its sightings. Since the Grafana dashboards already provide a good overview, we do not need notifications for every rule: that would only spam us, and the notifications would end up being ignored. However, subscribing to selected rules is important, as it allows a detection engineer to easily check a rule’s quality when the rule is created or updated. As with the Grafana dashboards mentioned above, the results come with the appropriate links to pivot to the alerts and then to the events, so the analysis can start immediately and help decide and prioritise whether any change is needed.
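To make the idea concrete, here is a minimal, hypothetical sketch of such a digest, assuming subscriptions are kept in a simple mapping and notifications are posted to a chat webhook (our actual implementation relies on our internal instant messaging app):

```python
import requests  # assumption: notifications are delivered through a chat webhook

# Hypothetical subscriptions: detection engineer -> rules they follow.
SUBSCRIPTIONS = {
    "alice": {"SOCKS Tunneling Tool"},
    "bob": {"PowerShell Malicious Nishang PowerShell Commandlets"},
}
WEBHOOK_URL = "https://chat.example.internal/hooks/detection-digest"  # placeholder

def send_daily_digest(sightings: dict[str, int], alert_links: dict[str, str]) -> None:
    """Send each subscriber a digest of yesterday's sightings for the rules they follow."""
    for engineer, rules in SUBSCRIPTIONS.items():
        lines = [
            f"{rule}: {sightings.get(rule, 0)} alert(s) - {alert_links.get(rule, 'n/a')}"
            for rule in sorted(rules)
        ]
        requests.post(WEBHOOK_URL, json={"to": engineer, "text": "\n".join(lines)}, timeout=10)
```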
Similar to the Grafana dashboards, we also send daily push notifications for the top 10 rules by alert ratio and by difference in alert numbers, as shown in the following screenshot:
This covers the same information as the Grafana dashboards, but only for the most conspicuous detection rules, which allows us to focus on the main issues without being spammed.
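A simplified sketch of how such a daily ranking could be computed is shown below (the ranking criterion, an absolute day-over-day change in alert volume, is an assumption for illustration):

```python
def top_changes(today: dict[str, int], yesterday: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """Return the n rules with the largest absolute change in alert volume between two days."""
    rules = set(today) | set(yesterday)
    diffs = {rule: today.get(rule, 0) - yesterday.get(rule, 0) for rule in rules}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:n]
```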
Focusing on the issues surfaced this way often means trying to find out what went wrong with the rule, which usually requires testing the rule again in a detection lab by replaying the attack / TTP / tool.
Going back to the beginning of this series, a detection engineer has to impersonate the attacker in order to catch the real attack. As explained in the first article, we need to cover a very large technology environment, meaning that these environments must be available as quickly and simply as possible for each analyst. The time and energy required to build and maintain a detection lab could otherwise be an obstacle to working on a thousand rules!
Several solutions are possible, but we decided to provide all detection engineers with an on-demand lab for Windows or Linux environments, with endpoint and network event collection set up automatically. Within a couple of minutes, this provides a way to either interact directly with the host and run an attack, or upload and execute a provided file. Furthermore, for labs that require several hosts, we tend to automate their creation using common DevOps best practices (Terraform and Ansible).
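As a rough illustration of the multi-host case, a thin wrapper such as the following can chain Terraform and Ansible (the directory, inventory and playbook paths are hypothetical):

```python
import subprocess

def provision_lab(terraform_dir: str = "labs/multi-host",
                  playbook: str = "playbooks/setup_collection.yml") -> None:
    """Spin up a multi-host lab and configure event collection on every host.

    Assumes a Terraform configuration and an Ansible playbook already exist for
    the lab; all paths here are hypothetical.
    """
    # Create the infrastructure (hosts, network) declaratively.
    subprocess.run(["terraform", f"-chdir={terraform_dir}", "init"], check=True)
    subprocess.run(["terraform", f"-chdir={terraform_dir}", "apply", "-auto-approve"], check=True)
    # Configure the hosts: agents, log forwarding, attack tooling.
    subprocess.run(["ansible-playbook", "-i", f"{terraform_dir}/inventory.ini", playbook], check=True)
```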
For cloud environments and SaaS products, we mainly rely on our integration partners to provide a test tenant or, when that is not possible, the test events / APIs allowing us to reproduce the attack behaviour.
Our focus is on minimising friction in the creation and update process for detection rules, letting our detection engineers concentrate on detection techniques. The next step is to prioritise the detection rule changes that need to be made.
After following this methodology for a few months (with all the previously mentioned cornerstones in place), we observed some global results that motivate us to keep moving forward with its continuous improvement.
For instance, while creating 25 and updating 83 detection rules, our measured rejection rate decreased from 58% to 38% on a given customer perimeter. Of course, a single period of time is difficult to interpret on its own, but we observed the same result across different time slots.
Digging into more detailed examples, the everyday work mentioned above (monitoring detection rules) proves really useful for finding, among our 1,000 detection rules, either the real quick wins where some rules have to be fixed, or peaks of attacker activity that need more investigation to enrich our threat intelligence.
For the detection rule “PowerShell Malicious Nishang PowerShell Commandlets”, a daily notification pinging an analyst was the following:
Hunting through the related events, we found that some matching patterns were obvious false positives, which allowed us to identify a filter to add and some pattern keywords to remove.
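As an illustration of that hunting step, one could attribute rejected alerts to the specific keywords that triggered them, for example with a sketch like this (the keywords and events below are made up; the real rule contains a much longer pattern list):

```python
import re
from collections import Counter

# Hypothetical keyword list and a sample of events that matched the rule.
KEYWORDS = ["Invoke-PowerShellTcp", "Invoke-Mimikatz", "Get-Information"]
matched_events = [
    {"command_line": "powershell.exe Get-Information -Inventory", "alert_status": "rejected"},
    {"command_line": "powershell.exe Invoke-PowerShellTcp -Reverse", "alert_status": "closed"},
]

# Count, per keyword, how many of its matches ended up rejected by SOC analysts:
# keywords that only ever produce rejected alerts are candidates for removal or filtering.
totals, rejected = Counter(), Counter()
for event in matched_events:
    for keyword in KEYWORDS:
        if re.search(re.escape(keyword), event["command_line"], re.IGNORECASE):
            totals[keyword] += 1
            if event["alert_status"] == "rejected":
                rejected[keyword] += 1

for keyword, total in totals.items():
    print(f"{keyword}: {rejected[keyword]}/{total} matching alert(s) rejected")
```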
Another simple example is a surge in the use of an attacker tool (Chisel) at one customer, detected through an alert peak on our “SOCKS Tunneling Tool” rule:
After the analysis, the detection engineer confirmed this was a true positive, and also that the similarity strategy (how we group events into the same alert) should be fixed so that SOC analysts get the relevant events in one alert, grouped by hostname and username in this specific case.
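For illustration, a similarity strategy of this kind boils down to grouping events by a tuple of fields, as in this minimal sketch (the field names are assumptions; our platform implements the grouping natively):

```python
from collections import defaultdict

def group_events(events: list[dict], keys: tuple[str, ...] = ("hostname", "username")) -> dict:
    """Group matching events into alert buckets according to a similarity strategy.

    Here the strategy is simply the tuple of field values named in `keys`
    (hostname and username in this specific case).
    """
    alerts = defaultdict(list)
    for event in events:
        alerts[tuple(event.get(key) for key in keys)].append(event)
    return dict(alerts)
```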
Throughout this three-part article series, we have shared our ideas and the operational methodology we apply to handle our detection rules at scale.
In the first article, we described the challenges detection engineers face, from managing a high volume and diversity of logs to leveraging normalisation for detection rules.
In the second article, we mentioned how implementing detection engineering at scale while adopting best practices from software development serves as a cornerstone for enhancing the efficiency and reliability of detection efforts.
Finally, in this last article, we presented how we enhance the process from the second article to make it efficient and realistic for detection engineering teams. We discussed how a process that appears robust in theory often has limitations in practice, and how these can be mitigated by detection engineers taking both a proactive and a reactive approach to looking at KPIs.
While not a perfect solution, we believe this approach is entirely usable by other detection engineering teams, or can at least trigger some discussion around it.
Thank you for reading this blog post. If you have any questions or feedback, please don’t hesitate to contact us at tdr[at]sekoia.io for further discussions; it is always good to have feedback from peers.