Welcome to the first installment of our blog series dedicated to shedding light on the intricacies of bot mitigation. In this series, we will explore the fundamental systems and methodologies crucial for discerning between bots and humans. Whether you’re already utilizing a bot mitigation solution or in the process of evaluating one, understanding these components is paramount.
The bot mitigation industry is at a major inflection point: the market is shifting from the early entrants to a new group of emerging technologies. Bots have gained the upper hand in the battle with the established solutions and, as a result, the companies that invested in those solutions have started voting with their dollars to replace their legacy systems.
Throughout this series, we will cover the wide range of components required to build and maintain a bot detection classification system. These systems are complex, as they must:

- collect data without degrading the human user experience,
- classify traffic accurately against adversaries who actively evade detection, and
- take action without revealing to the bot that it has been detected.
This blog details the core components of a bot mitigation solution. Understanding what is under the hood is important, even though many vendors present opaque systems to their customers.
Broadly speaking, all bot mitigation solutions do three things:
collect data → classify data → take an action
In this sense, the bot mitigation industry is primarily a data game, although much surrounding infrastructure is required to enable this game.
This translates to the three primary components of a solution:

- Data collection – gathering signals from browsers and mobile apps
- Classification – deciding whether those signals came from a human or a bot
- Mitigation – taking an action based on the verdict
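To make the pipeline concrete, here is a minimal sketch of those three stages as interfaces. All names here are illustrative, not any vendor's actual API:

```typescript
// A minimal sketch of the collect -> classify -> act pipeline.
// All names here are illustrative, not any vendor's actual API.

interface Telemetry {
  sessionId: string;
  signals: Record<string, unknown>; // signals gathered in the browser or app
  collectedAt: number;              // epoch ms, used for real-time validation
}

interface Verdict {
  botScore: number;  // 0 = confidently human, 1 = confidently bot
  reasons: string[]; // why the score landed where it did
}

type Action = "allow" | "challenge" | "block" | "tarpit";

interface BotMitigationPipeline {
  collect(): Promise<Telemetry>;            // runs client-side, via an SDK
  classify(t: Telemetry): Promise<Verdict>; // runs server-side
  act(v: Verdict): Action;                  // enforced at the edge or origin
}
```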
Across a wide variety of approaches, these three things remain largely the same. Whilst it seems that everyone offers a bot solution these days, you can broadly generalise the offerings into two camps:

- CAPTCHA-based solutions, which ask the user to solve a puzzle while data is collected, and
- CAPTCHA-less (invisible) solutions, which collect and validate data without interrupting the user.
Data collection forms the lifeblood of any classification system. It involves gathering information from various sources, such as browsers or mobile apps, to discern between legitimate human users and bots. However, striking a balance between robustness against reverse engineering attacks and maintaining optimal user experience poses significant challenges.
Vendors need to invest significant engineering resources in building systems that satisfy two equally important – yet often competing – priorities. The client-side data collection process must be both:

- hardened against reverse engineering and telemetry manipulation, and
- lightweight and unobtrusive enough to preserve the human user experience.
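To give a feel for what client-side collection involves, here is an illustrative sketch of the kind of low-cost browser signals a collector might gather. Real collectors capture far more than this, and go to great lengths to obfuscate how they do it:

```typescript
// Illustrative only: a handful of low-cost browser signals a collector
// might gather. Real collectors capture far more, and obfuscate how.

function collectSignals(): Record<string, unknown> {
  return {
    userAgent: navigator.userAgent,
    language: navigator.language,
    hardwareConcurrency: navigator.hardwareConcurrency,
    screen: { width: screen.width, height: screen.height, depth: screen.colorDepth },
    timezoneOffset: new Date().getTimezoneOffset(),
    // Automation frameworks that follow the WebDriver spec set this flag,
    // which makes it an easy signal - and an easy one for bots to spoof.
    webdriver: navigator.webdriver,
  };
}
```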
The process and purpose of data collection are similar between the different types of anti-bot provider models. Both CAPTCHA and CAPTCHA-less solutions collect data, which is then submitted to an API endpoint for classification.
As will be discussed in more detail in a subsequent blog, the need to prove that data was collected in real time is an element of data collection that divides the industry. This is the fundamental purpose of the CAPTCHA – the act of solving the puzzle serves as a signal that a human is interacting with the device. Kasada introduced the first invisible real-time validation solution in 2022. The requirement for real-time validation of data collection plays a critical role in shaping the toolkits used by bot developers.
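One way to provide that validation – a sketch of the general pattern, not Kasada's actual mechanism – is to bind collection to a short-lived, server-issued nonce, so that a captured payload cannot simply be replayed later. The endpoint names below are hypothetical, and collectSignals() is the collection sketch shown earlier:

```typescript
// Sketch: bind collection to a short-lived, server-issued nonce so that a
// captured payload cannot be replayed later. Endpoints are hypothetical.

async function collectAndSubmit(apiBase: string): Promise<unknown> {
  // 1. Fetch a nonce the server will only honour for a few seconds.
  const { nonce } = await (await fetch(`${apiBase}/challenge`)).json();

  // 2. Gather telemetry while the nonce is still live.
  const signals = collectSignals();

  // 3. Submit both together; the server rejects stale or reused nonces.
  const res = await fetch(`${apiBase}/classify`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ nonce, signals, collectedAt: Date.now() }),
  });
  return res.json();
}
```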
Most vendors provide SDKs to facilitate the data collection process. The SDK is designed to bring order to the unruly world of the web browser and communicate with the APIs that deliver detection logic and receive telemetry data. The integration components of a solution are critical to its ongoing success. The world is complex, with a wide variety of devices, browsers, and operating environments. Vendors need to accommodate this by designing lightweight, performant solutions.
The engineering challenges of building a bot mitigation solution involve balancing several conflicting requirements. The greatest of these challenges is the objective of building a system that delivers the optimum human experience while also being resistant to adversarial reverse engineering. The ability to force a bot developer into a browser and spot attempts to serve fake data is what truly differentiates the market in 2024.
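As an illustration of spotting fake data, one simple technique is to check whether key browser APIs are still native code, since bots that monkey-patch APIs to serve fabricated telemetry often leave traces behind. This check is itself spoofable, which is why it would only ever be one signal among many. A minimal sketch:

```typescript
// Sketch: check whether key browser APIs are still native code. Bots that
// monkey-patch APIs to serve fabricated telemetry often leave non-native
// toString() output behind. This check is itself spoofable, which is why
// it is one signal among many, never a verdict on its own.

function looksNative(fn: Function): boolean {
  return Function.prototype.toString.call(fn).includes("[native code]");
}

const integritySignals = {
  canvasGetContext: looksNative(HTMLCanvasElement.prototype.getContext),
  fetch: looksNative(window.fetch),
  // If toString itself has been patched, none of the above can be trusted.
  functionToString: looksNative(Function.prototype.toString),
};
```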
Finding the right balance in this system is critical. In some cases, the answer depends on the sophistication of your adversaries. There is no free lunch in bot detection.
Not all vendors are the same when it comes to data collection. Whilst everyone claims to have the industry’s best detection, data collection is where a vendor can genuinely differentiate in the market.
Classification systems employ various techniques, including AI/ML and static detection logic, to differentiate between human and bot traffic. While static detection logic offers simplicity and performance, adversaries continually evolve their tactics, necessitating robust mechanisms to combat evasion techniques such as adversarial inputs and data poisoning.
Whilst artificial intelligence (AI) and machine learning (ML) dominate vendor marketing, the reality is that most classification systems rely on a large amount of static detection logic.
Static detection logic:

- is simple to write and maintain,
- executes quickly, with minimal performance overhead, and
- produces deterministic results that are easy to explain.
There is a lot to like about static detection logic – until you consider the crafty minds of your adversaries. Bot detection is a non-stationary problem. The world is full of threat actors who view your latest static detection logic only as a temporary barrier.
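To make this concrete, here is static detection logic at its simplest – a hypothetical user-agent rule that is cheap, fast, and explainable, and that a bot developer defeats with a single line of code:

```typescript
// Sketch: static detection logic at its simplest. Cheap, fast, and
// explainable - and trivially evaded once discovered. Pattern is illustrative.

const BOT_UA = /curl|python-requests|headlesschrome/i;

function staticUserAgentCheck(userAgent: string): boolean {
  // Defeating this rule takes one line of bot code: send a browser user agent.
  return BOT_UA.test(userAgent);
}
```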
The key strategies used by bot developers to evade detection include:

- telemetry manipulation – crafting or replaying payloads so that the collected data appears human,
- adversarial inputs designed to probe the classifier and slip past it, and
- data poisoning – feeding the system misleading traffic to degrade its detection models.
There is an art to crafting telemetry payloads to ensure the bot evades detection. Therefore, it is essential to develop hardened data collection components that make telemetry manipulation difficult for attackers. Key components of this module include:

- obfuscation of the collection code to slow down reverse engineering,
- integrity protection of the telemetry payload (sketched below), and
- real-time validation that the data was freshly collected.
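The integrity-protection idea can be sketched as follows: sign the telemetry payload with a per-session key delivered by the server, so that naive edits to the payload invalidate the signature. This is an illustrative pattern, not any vendor's implementation – a determined attacker can extract the key from the client, so it raises the cost of tampering rather than preventing it outright:

```typescript
// Sketch: sign the telemetry payload with a per-session key delivered by
// the server, so naive edits to the payload invalidate the signature. A
// determined attacker can still extract the key from the client, so this
// raises the cost of tampering rather than preventing it outright.

async function signPayload(payload: object, rawKey: ArrayBuffer): Promise<string> {
  const key = await crypto.subtle.importKey(
    "raw", rawKey, { name: "HMAC", hash: "SHA-256" }, false, ["sign"],
  );
  const bytes = new TextEncoder().encode(JSON.stringify(payload));
  const signature = await crypto.subtle.sign("HMAC", key, bytes);
  return btoa(String.fromCharCode(...new Uint8Array(signature)));
}
```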
The classification of bots is ultimately a game of anomaly detection that lends itself to something beyond static detection.
Using a classifier to block attacks – whilst maintaining security and usability – is challenging because you need a mechanism to handle mistakes and deal with uncertainty.
The primary challenges include:

- false positives – blocking or challenging legitimate human users,
- false negatives – allowing bots through undetected, and
- handling uncertain, borderline classifications safely.
Most AI classifiers assign a score – based on the information provided and other signals – that represents the likelihood that a request was sent by a bot.
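A minimal sketch of such a scorer – with made-up signal names and weights, since real systems combine hundreds of signals, many of them learned rather than hand-tuned – might look like this:

```typescript
// Sketch: a simple score-based classifier. Signal names and weights are
// made up for illustration; real systems combine hundreds of signals,
// many of them learned rather than hand-tuned.

interface ScoredSignal { name: string; value: number; weight: number; }

function botScore(signals: ScoredSignal[]): number {
  const raw = signals.reduce((sum, s) => sum + s.value * s.weight, 0);
  return 1 / (1 + Math.exp(-raw)); // squash to (0, 1): higher = more bot-like
}

const score = botScore([
  { name: "webdriverFlag", value: 1, weight: 4.0 },    // strong bot indicator
  { name: "nativeFetch", value: 1, weight: -1.5 },     // leans human
  { name: "headlessViewport", value: 0, weight: 2.0 }, // absent in this request
]);
```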
Balancing your classifier’s error rates deeply impacts the security and usability of the overall system. This is most often achieved by adjusting the classifier’s sensitivity and specificity. This process results in a preference for caution (favouring reducing false positives) or optimism (favouring reducing false negatives).
It is important to note that the trade-off between false positives and false negatives is not linear: the further you push one rate down at the expense of the other, the higher your overall error rate becomes.
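You can see this trade-off by sweeping the decision threshold over labelled traffic. The sketch below uses a tiny, hypothetical sample; real evaluations use far larger labelled datasets:

```typescript
// Sketch: sweep the decision threshold over labelled traffic to observe the
// non-linear trade-off between false positives (blocked humans) and false
// negatives (admitted bots). The sample data below is hypothetical.

interface Sample { score: number; isBot: boolean; }

const labelledTraffic: Sample[] = [
  { score: 0.95, isBot: true },  { score: 0.62, isBot: true },
  { score: 0.41, isBot: false }, { score: 0.08, isBot: false },
  { score: 0.71, isBot: false }, // the hard case: a human who looks bot-like
];

function errorRates(samples: Sample[], threshold: number) {
  const humans = samples.filter(s => !s.isBot);
  const bots = samples.filter(s => s.isBot);
  return {
    falsePositiveRate: humans.filter(s => s.score >= threshold).length / humans.length,
    falseNegativeRate: bots.filter(s => s.score < threshold).length / bots.length,
  };
}

for (const t of [0.3, 0.5, 0.7, 0.9]) {
  // Raising the threshold lowers false positives but raises false negatives.
  console.log(t, errorRates(labelledTraffic, t));
}
```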
In bot mitigation, the margin for error is nonexistent and both error types are costly. As is often the case, though, some errors are more costly than others, and weighing them requires a customer impact assessment.
At Kasada, we apply a range of mechanisms when assessing our machine learning (ML) classifier, weighing its error rates against their real-world customer impact.
Being able to explain why something was classified is an equally important part of the process.
Key mechanisms for explaining classification results include attributing the final score to the individual signals that contributed to it, so that an analyst can see exactly why a request was flagged.
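A minimal sketch of that idea, reusing the additive scorer from earlier (SHAP-style methods generalise the same idea to more complex models):

```typescript
// Sketch: for an additive scorer, each signal's contribution to the final
// score can be reported directly - a simple form of feature attribution.

interface ScoredSignal { name: string; value: number; weight: number; }

function explain(signals: ScoredSignal[]): { name: string; contribution: number }[] {
  return signals
    .map(s => ({ name: s.name, contribution: s.value * s.weight }))
    .sort((a, b) => Math.abs(b.contribution) - Math.abs(a.contribution));
}

// On the earlier example, explain() ranks webdriverFlag first - which is
// exactly the answer an analyst needs: "blocked because webdriver was set".
```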
The primary goal is to build an intentionally balanced model that handles errors in a safe and explainable way.
We will cover false positives / false negatives in greater detail in a subsequent blog.
The bot mitigation industry is truly divided on how to validate its data collection: one camp relies on CAPTCHA puzzles, whilst the other performs the validation invisibly.
Ultimately, the primary objective of both groups is to validate the integrity of the data that feeds the classifier.
A CAPTCHA puzzle requires human interaction to be completed whilst the data is collected. The alternative model uses other mechanisms, beyond the scope of this blog, to achieve the same real-time validation.
Real-time validation of data collection is necessary to prevent telemetry manipulation attacks.
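Server-side, that validation can be as simple in principle as rejecting any payload whose nonce is stale, unknown, or already consumed. The sketch below pairs with the client-side nonce example earlier; the in-memory storage and TTL value are illustrative:

```typescript
// Sketch: the server-side half of real-time validation. Reject telemetry
// whose nonce is stale, unknown, or already consumed. In-memory storage
// and the TTL value are illustrative; production systems use shared state.

const NONCE_TTL_MS = 10_000;
const issuedNonces = new Map<string, number>(); // nonce -> issue timestamp

function validateNonce(nonce: string): boolean {
  const issuedAt = issuedNonces.get(nonce);
  if (issuedAt === undefined) return false;     // never issued, or forged
  issuedNonces.delete(nonce);                   // single use: blocks replay
  return Date.now() - issuedAt <= NONCE_TTL_MS; // reject stale collections
}
```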
Mitigation ultimately requires that a bot is blocked from achieving its objective. The challenge with mitigation is that bot developers are experts at knowing when they have been detected. Most bots are built with kill switches that take the operation offline as soon as they receive any indicator of detection – and that is as true for a hard block as it is for a tarpit or a delayed response.
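The sketch below contrasts these response options. The shape and durations are illustrative only – and, as noted above, a sophisticated bot can learn to recognise any of them as a detection indicator and trip its kill switch:

```typescript
// Sketch: three common mitigation responses. Durations are illustrative.
// A sophisticated bot can learn to recognise any of these as an indicator
// of detection and trip its kill switch.

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function mitigate(action: "block" | "tarpit" | "delay"): Promise<Response> {
  switch (action) {
    case "block":
      return new Response("Forbidden", { status: 403 }); // explicit denial
    case "tarpit":
      await sleep(30_000); // hold the connection open to waste the bot's time
      return new Response("OK", { status: 200 });
    case "delay":
      await sleep(2_000 + Math.random() * 3_000); // respond like a slow origin
      return new Response("OK", { status: 200 });
  }
}
```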
As a general rule, most sophisticated bots exist to make money. Ticket bots, eCommerce bots, credential stuffing bots and scraping bots can all be mapped to some form of downstream monetisation. As a result, a successfully mitigated bot will rapidly be replaced with a new and improved version. It’s often said to be a game of cat and mouse, but in reality, it’s more like a game of three-dimensional chess.
It’s not uncommon for sophisticated bot operations to manage multiple variants of their bot simultaneously to prevent disruption. In the context of a hype drop in the eCommerce space, it’s also very common to see bots update their code multiple times in the hours leading up to the event. So, rather than focus on a bot mitigation provider’s mitigation options, you should focus on their ability to pivot and deal with rapidly evolving adversaries.
As you navigate the landscape of bot detection and mitigation solutions, consider the following questions:

- How is the client-side data collection hardened against reverse engineering and telemetry manipulation?
- How does the solution prove that telemetry was collected in real time, with or without a CAPTCHA?
- How does the vendor balance false positives against false negatives, and can it explain its classifications?
- How quickly can the vendor pivot when a mitigated bot returns as a new and improved version?
Stay tuned for the next blog in our series, where we delve into the importance of classification accuracy.
If you have questions in the meantime, feel free to reach out to me personally on LinkedIn, get a personalised snapshot for your organisation, or request a demo with our team of bot experts.