I have found at least one common theme while working with different WAF solutions over the past 17 years. The first expectation of any WAF is not to block attacks. It is to not break the application. If a WAF impedes application functionality and negatively impacts the revenue the application is generating then the WAF’s primary functions don’t mean anything.
I’ve never heard of an administrator getting dressed down for allowing a few requests containing attack patterns through the WAF. But if an application generating revenue is down for just a few minutes, or even seconds, then Slack messages and PagerDuty alerts start flying, and someone will have to explain what happened.
The Hippocrates quote “first, do no harm” (or “primum non nocere,” the poor Latin translation from the original Greek ὠφελέειν ή μὴ βλάπτειν, which means to help, or at least, to do no harm) is the unofficial and implicit WAF oath. Yes, WAFs must block attacks and abuse. That’s the reason they exist. But if they break the application, what is the point? Nothing pulls the emergency brake of the WAF rollout and trust train faster than false-positive blocking.
But let’s agree on one basic fact. WAFs will always generate false positives unless their sensitivity is set so low as to be completely ineffective. This is why WAF tuning is, and always has been, a major point of friction when using WAFs.
As a Solutions Engineer who has worked with WAF customers for 17 years, I know first hand how painful WAF tuning can be, how critical it is for customers, and how it’s evolved over time. Let’s dive into the history of WAF tuning and where I see it going.
When I first got started in the WAF space, WAFs could be set with different rule sensitivity levels (e.g., ModSecurity paranoia levels 1-4). Regardless of the sensitivity level, if a request triggered one of the rules, the request would be blocked (if you were in blocking mode, of course). The tradeoff was that more aggressive detections would block more attacks, but were also more likely to block legitimate requests. If you wanted more aggressive detections, you had to go through an ongoing exercise of creating exceptions for blocked false positives, such as ignoring detection patterns or customizing the RegEx pattern for the detection. This was an excruciating exercise, and it was often more feasible to dial down the sensitivity level so that the WAF didn’t need constant tuning, at the cost of being less effective at blocking attack traffic.
Over time, attack-pattern detection methods more accurate than RegEx (e.g., libinjection) came to market, and some WAFs also introduced functionality that waited until a threshold of attacks was detected from a source before the attacks were blocked. The theory behind this functionality was that if the WAF already had a low rate of false-positive detections, and it blocked only after an attack threshold was crossed, it was much less likely to block legitimate traffic. Yes, this means that a single legitimate SQLi attack could get through, but remember that “The first expectation of any WAF is not to block attacks. It is to not break the application.”
This combination of more accurate detections plus threshold blocking of attacks changed the tradeoff equation (and how WAFs were viewed). The sensitivity of the WAF detections could be more aggressive while keeping the number of false positives that result in blocking much lower. The threshold compromise was the ability to thwart egregious attackers with sensitive-attack detections knowing that some legitimate attacks would be detected and not blocked to minimize the chance of breaking an application.
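To make the threshold idea concrete, here is a minimal sketch of per-source threshold blocking. It is an illustration, not how any particular WAF implements it: the class name `ThresholdBlocker`, the sliding time window, and the default values are all assumptions. The threshold is treated as the number of flagged requests a source may send before blocking kicks in, so a threshold of zero means immediate blocking.

```python
from collections import defaultdict, deque


class ThresholdBlocker:
    """Hypothetical sketch: block a source's flagged requests only after the
    source has exceeded `threshold` detections within the last `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold  # flagged requests allowed through per source
        self.window_s = window_s    # assumed sliding-window length in seconds
        self._hits = defaultdict(deque)  # source IP -> timestamps of detections

    def should_block(self, source_ip: str, attack_detected: bool, now: float) -> bool:
        hits = self._hits[source_ip]
        # Forget detections that have aged out of the sliding window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        if attack_detected:
            hits.append(now)
        # Only flagged requests are ever blocked, and only once the source
        # has used up its allowance of `threshold` detections.
        return attack_detected and len(hits) > self.threshold
```

With `threshold=2`, the first two flagged requests from a source pass and the third is blocked; requests with no detection are never blocked in this sketch.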
However, a new factor was introduced that needed to be tuned. A decision had to be made on how many potential legitimate attacks could be let through before blocking kicked in.
This threshold had to be relatively high in the beginning. To lower it, the WAF would have to be monitored for false positives. You couldn’t just lower the threshold without first knowing whether doing so was going to break the normal use of the application. Again, monitoring for and defining false positives was a time-consuming manual task requiring constant attention as the application changed over time. It was generally accepted that, after initial WAF tuning with some attack exceptions defined, ongoing false positives would be dealt with only when something was reported as broken and traced back to the WAF. Constant monitoring for false positives when there wasn’t a reported incident wasn’t feasible.
The obvious next step to maximize attack-detection sensitivity and minimize blocking of legitimate traffic, without constant manual monitoring and tuning, is to automate the detection of false positives. This is not something WAFs are naturally focused on. They are focused on detecting attacks and abuse, not on detecting when their own detections are wrong. But that is exactly what is needed.
Attack detections should be self-policing by taking the manual false positive analysis and automating as much as possible. The solution must also be continuous and not require manually enabling it when changes in the application occur.
A common method for surfacing false positives is to look at attack detection logs, web server logs, and/or application logs for patterns that fit the points below:
There is a significant likelihood that an attack is a false positive if it meets the points above. WAFs should be able to perform this ongoing analysis and allow adjustments to the analysis steps per application-specific requirements. Third-generation WAFs should have the ability to self-police their own detections based on empirical evidence.
Of course, a distributed botnet may have found a very specific part of the application that is vulnerable and is exploiting it. That is where the last step of analyzing the attack payloads comes in. Identifying false positives is still a manual process that usually takes a human eye (perhaps soon to be an AI eye).
The identification often follows the Potter Stewart “I know it when I see it” method. Below is an obvious SQLi false positive by one of the most widely used WAFs:
1 message about Antler hardcase travel luggage, about 30″/30 inches (2 wheels are spoilt)
While this is obvious, it would typically get missed in the tsunami of valid attack detections for many public sites. This is especially true if the WAF is thought to already be tuned, but a change in the application introduced new false-positive detections. No one is actively looking for them anymore. That is until calls start coming in from customers or executives asking why the application is broken. But if a WAF can police its own detections and recognize and surface the probable false positives, you’ll have many more needles than hay in your stack to analyze.
Most web applications are now a front-end for an underlying API. Without having the ability to understand the documented and observed API specifications, which are very often not equal, the described self-policing is very difficult.
Consider a false-positive XSS detection for the API below for sending messages:
POST /message/{accountUuid}
{"subject":"Review Please","body":"<h1>Review Required</h1><p>Please review these records and get back to me.</p>"}
Each request would have a specific {accountUuid} value in the path. To automatically learn that this is a likely false positive, the WAF has to understand that the {accountUuid} part of the path is a parameter. Otherwise, it sees each message path as being unique.
Example WAF observed request paths:
/message/b1c6ee76-80a9-4e59-8d89-2b4b09d89285
/message/f41faa4c-d20a-4997-b8a9-5a04f07d38b6
/message/d772d624-6b02-4af3-b0c3-7e7d69f9a3ab
The first point in detecting false positives would never be met at the scale needed for self-policing:
The attack type is found in the same place in many requests (e.g., same hostname, path, and same query parameter name).
A WAF that understands the API specification would know the literal paths in each request, but also know there is a parameterized path in common for all three examples:
/message/{accountUuid}
/message/{accountUuid}
/message/{accountUuid}
The first point in detecting false positives would be met and allow for self-policing to occur.
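As a rough sketch of that mapping, the function below matches a literal request path against a list of documented path templates. The function name `to_template` and the idea of passing templates as plain strings are assumptions for illustration. A real WAF would draw the templates from the documented or observed API specification.

```python
def to_template(path: str, templates: list[str]) -> str:
    """Hypothetical helper: return the first documented template that matches
    `path`, or the literal path if no template matches."""
    segments = path.strip("/").split("/")
    for template in templates:
        t_segments = template.strip("/").split("/")
        if len(t_segments) != len(segments):
            continue
        # A template segment such as "{accountUuid}" matches any value;
        # every other segment must match literally.
        if all(
            (t.startswith("{") and t.endswith("}")) or t == s
            for t, s in zip(t_segments, segments)
        ):
            return template
    return path


observed = [
    "/message/b1c6ee76-80a9-4e59-8d89-2b4b09d89285",
    "/message/f41faa4c-d20a-4997-b8a9-5a04f07d38b6",
    "/message/d772d624-6b02-4af3-b0c3-7e7d69f9a3ab",
]
# All three literal paths collapse to the single parameterized path.
normalized = {to_template(p, ["/message/{accountUuid}"]) for p in observed}
```

Grouping detections by the normalized path rather than the literal one is what lets three apparently unique locations count as the same place.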
While it may be possible to create first and second-generation WAF rules that can understand parameterized paths, it is a manual process that requires knowing which paths have parameters. Requiring manual steps like this precludes automatic self-policing. Most WAF administrators don’t have enough application knowledge to know which paths contain parameters. If new parameterized paths are introduced, they will likely not know about them either.
To be effective at self-policing, understanding the documented API specification(s) and learning observed API specification changes in real time is essential.
Understanding parameterized API paths is also required for detecting enumeration attacks and abuse, but that is a topic for another day.
Let’s apply automated false-positive detection, combined with API specification awareness, to the previous example.
203.0.113.12 -> POST /message/b1c6ee76-80a9-4e59-8d89-2b4b09d89285
{"subject":"Review Please","body":"<h1>Review Required</h1><p>Please review these records and get back to me.</p>"}
198.51.100.27 -> POST /message/f41faa4c-d20a-4997-b8a9-5a04f07d38b6
{"subject":"Help Needed","body":"<h1>Application broken</h1><p>When I try to change my settings, I am logged out.</p>"}
192.0.2.56 -> POST /message/{accountUuid
{"subject":"Thank you!","body":"<h1>Appreciate the help</h1><p>I appreciate your help with solving my issue.</p>"}
……similar requests from 10 more unique IP addresses in 10 minutes
I should get an automatic suggestion for a false positive exception if everything is working as expected.
Method: POST
Path: /message/{accountUuid}
Ignore if XSS detected in POST parameter: body
This continuous self-policing makes it possible to keep the WAF “tuned” without lengthy regular manual reviews and analysis. The WAF can alert me when something needs review, and I can choose to take or not take the suggestion.
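A minimal sketch of how such a suggestion might be derived is shown here. It covers only the one criterion spelled out in this article (the same attack type found in the same place across many requests) and uses the count of distinct source IPs in a time window as the trigger. The `Detection` record, the function name, and the `min_sources` and `window_s` values are all assumptions, and path templates are assumed to be normalized already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    source_ip: str
    method: str
    path_template: str  # assumed already normalized, e.g. "/message/{accountUuid}"
    attack_type: str    # e.g. "XSS"
    parameter: str      # e.g. "body"
    timestamp: float    # seconds


def suggest_exceptions(detections, min_sources=10, window_s=600.0):
    """Hypothetical sketch: suggest an exception for every location where the
    same attack type was flagged from many distinct sources in a recent window."""
    by_location = defaultdict(list)
    for d in detections:
        key = (d.method, d.path_template, d.attack_type, d.parameter)
        by_location[key].append(d)

    suggestions = []
    for location, hits in by_location.items():
        latest = max(h.timestamp for h in hits)
        # Count distinct sources seen within `window_s` of the latest detection.
        sources = {h.source_ip for h in hits if latest - h.timestamp <= window_s}
        if len(sources) >= min_sources:
            suggestions.append(location)
    return suggestions
```

The output is only a suggestion for review. As noted earlier, a distributed botnet exploiting one specific endpoint could produce the same pattern, so the payloads still need a look before an exception is accepted.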
Recall the threshold compromise:
The threshold compromise was the ability to thwart egregious attackers with sensitive-attack detections knowing that some legitimate attacks would be detected and not blocked to minimize the chance of breaking an application.
API context can also change the parameters of the compromise. An API-context aware WAF that is self-policing should be able to quickly reach an equilibrium. Through automation, creating all the needed false-positive attack exceptions for the documented API should be possible in a short period of time. At that point, it’s possible to drop the attack-detection threshold to zero for immediate blocking for the documented and reviewed APIs. Undocumented/shadow APIs would still have threshold-attack blocking enabled to account for ongoing changes to the application where self-policing analysis has not yet been completed.
Documented and reviewed APIs: Immediate attack blocking
Undocumented/shadow and unreviewed APIs: Threshold attack blocking
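The split above can be expressed as a small policy function. This is a sketch under assumptions: the `Endpoint` record, its `documented` and `reviewed` flags, and the default threshold of 5 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    method: str
    path_template: str
    documented: bool  # present in the documented API specification
    reviewed: bool    # self-policing analysis completed and suggestions reviewed


def blocking_threshold(endpoint: Endpoint, default_threshold: int = 5) -> int:
    """Hypothetical policy: how many detections a source may accumulate
    on this endpoint before blocking starts."""
    if endpoint.documented and endpoint.reviewed:
        return 0  # immediate attack blocking
    return default_threshold  # threshold attack blocking


reviewed_api = Endpoint("POST", "/message/{accountUuid}", documented=True, reviewed=True)
shadow_api = Endpoint("POST", "/internal/debug", documented=False, reviewed=False)
```

As endpoints move from the second category into the first, their threshold drops to zero, which is the continuous cycle toward zero-attack tolerance.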
The fact that WAFs will always have false positives hasn’t changed. But automating the discovery of the false positives and differentiating between APIs that have been self-policed and those that have not allows for both immediate and threshold-attack blocking to exist with a continuous cycle in place to move everything to zero-attack tolerance. Without API-context awareness this wouldn’t be possible.
Over the years I’ve had the great opportunity to work very closely with many WAF customers and many technologies.
As the technology evolved it plugged gaps that frustrated me:
The latest evolution of WAF has been exciting for one primary reason. It allows me to solve attack and abuse use cases that I’ve been battling for years, but that the tools available to me were limited in detecting and thwarting.
I’m now in the mode of revisiting all my own notes where I documented my frustrations and WAF-gap analyses. Now that better and more focused tools are available, I’m excited to start building again.
Stay tuned…more to come.