Regex: Simplicity, Security, and Power

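This article explores the importance of regular expressions (regex) in handling user input and shows, through examples, how to use regex to filter out malicious input. The author emphasizes the concept of defense in depth and encourages penetration testers and developers to master regex to improve application security. 2025-3-5 15:11:0 Author: redsiege.com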

by Douglas Berdeaux, Senior Security Consultant

I have a question for web application penetration testers:

How do you provide remediation advice to clients for user input handling flaws in their applications?

Do your clients rely heavily on frameworks, web application firewalls (WAF), and third-party security solutions? If so, then hopefully those security solutions were disabled, or your proxy IP address was allow-listed, before you began your web application penetration tests. This is because our goal is to peek under the hood of the web application to identify flaws that could lead to a devastating impact. Left in place, those solutions will inhibit your hacking adventure and may end up wasting your client’s time and money by preventing a thorough test of their applications.

I talk about defense in depth a lot with my clients, and the topic has guided a lot of insightful conversations over the years. If our client’s WAF has a newly discovered bypass flaw, and the user input handling mechanisms of the web application were never tested for code injection flaws, then our client’s customer data will be at risk!

When I was a professional developer, my favorite coding practice was using regular expressions (regex for short) to validate anything, including untrusted user input. I would code a simple application component with individual methods, each of which ensured that a specific user input matched what the application had requested from the user. I’d then share the component across all of the applications I was developing. The component methods obviously did not contain hard-coded values but merely relied on pattern-matching logic.
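
A minimal sketch of what such a shared component might look like, in Python. The class name InputValidator, its method names, and the ZIP code and username patterns are assumptions made for illustration, not the author's original code; the email pattern is the deliberately simple one developed later in this article.

```python
import re


class InputValidator:
    """Hypothetical shared component: one method per kind of expected input."""

    # Patterns describe the shape of acceptable input; no values are hard-coded.
    # re.ASCII keeps [a-z] with IGNORECASE limited to the 52 ASCII letters.
    _EMAIL = re.compile(r"[a-z]+@[a-z]+\.[a-z]+", re.IGNORECASE | re.ASCII)
    _US_ZIP = re.compile(r"[0-9]{5}(-[0-9]{4})?")
    _USERNAME = re.compile(r"[a-z0-9_]{3,32}", re.IGNORECASE | re.ASCII)

    @staticmethod
    def _matches(pattern, value):
        # fullmatch() succeeds only if the entire string fits the pattern,
        # so nothing can be prepended or appended to a valid value.
        return isinstance(value, str) and pattern.fullmatch(value) is not None

    @classmethod
    def is_email(cls, value):
        return cls._matches(cls._EMAIL, value)

    @classmethod
    def is_us_zip(cls, value):
        return cls._matches(cls._US_ZIP, value)

    @classmethod
    def is_username(cls, value):
        return cls._matches(cls._USERNAME, value)
```

An application would call, for example, InputValidator.is_email(form_value) and reject the request whenever the check returns False, regardless of what a WAF in front of the application has already filtered.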

Let’s Learn Some Regex

Let’s take a look at a simple exercise to begin to understand why regex is important to developers, defense in depth, and penetration testers. Let’s say we have the following seven-line emails input file for Wfuzz. The lines are referred to below by number, one through seven, but the line numbers are not part of the file:

[email protected]
' or 1=1 -- - [email protected]
[email protected]' or 1=1 #
<h1>It works!</h1>[email protected]
<script src="https://malicious.domain/xss.js"></script>[email protected]
invalid email @ email . domain
@.

We will use this file to fuzz for user input flaws against a single input parameter that is expected to be a valid email address. Now, how can the developer reject all of the invalid email address lines, including the malicious ones, using pattern matching so that only line number one is allowed? Let’s begin with a simple regex of:

.*@.*\..*

This regex has a lot of wild cards! Without executing this against the emails file, or reading on, what lines would you think that this pattern would filter out? Well, it actually doesn’t filter out any of them! The regex period “.” metacharacter, or operator, literally means “any character” (even white space). The backslash, “\”, preceding the period after the second asterisk tells the regex interpreter to treat that period literally rather than as a metacharacter. The asterisk, “*”, is a regex quantifier that means “any amount of the previous pattern or character, even zero.” Do you see the flaw now? The only things this pattern actually requires are an “@” and, somewhere after it, a literal period; everything else can be anything or nothing at all. Let’s keep going using something called regex “character classes,” denoted by character ranges within square brackets, [ and ], e.g., A-Z to mean all capital letters from A to Z, or 0-9 to mean all digits from 0 to 9.

[a-z]*@[a-z]*\.[a-z]*

How many lines from the emails file do you think this will filter out? Sadly, only one, line 6. But at least we are making progress! So, let’s keep going. Let’s now anchor our regex using the caret, “^”, and dollar sign, “$”, metacharacters, which mean “beginning” and “ending” of the line respectively. This “anchors” our pattern and should ensure that any input with prepended or appended code is denied.

^[a-z]*@[a-z]*\.[a-z]*$

Now, guess how many lines are filtered out using this regex? If you guessed “all of them except the valid email address on line 1 and the invalid email address on line number seven,” then you’d be correct. The pesky little input on line number seven sneaks through because of the asterisk quantifier metacharacters that we never fixed from the first example. Luckily, we have another quantifier metacharacter that means “at least one of the previous expression or character,” denoted by the plus symbol, “+”, that we can use instead of the asterisks like so:

^[a-z]+@[a-z]+\.[a-z]+$

This new regex filters out all of the lines in the emails file except for the valid email address on line number one. So, we did it. This simple exercise is by no means perfect. We don’t account for email addresses with hyphens, subdomains, etc., but it is enough for a jump start into regex for newbies.
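
The four steps above can be replayed in a few lines of Python. This is only a sketch: the addresses in EMAIL_LINES are hypothetical stand-ins for the lines of the emails file, and the names EMAIL_LINES, PATTERNS, rejected_lines, and RESULTS are introduced here for illustration. re.search is used because, like grep, it looks for the pattern anywhere in the line unless the pattern is anchored.

```python
import re

# Hypothetical stand-ins for the seven lines of the emails file.
EMAIL_LINES = [
    "valid@email.domain",
    "' or 1=1 -- - valid@email.domain",
    "valid@email.domain' or 1=1 #",
    "<h1>It works!</h1>valid@email.domain",
    '<script src="https://malicious.domain/xss.js"></script>valid@email.domain',
    "invalid email @ email . domain",
    "@.",
]

# The four patterns from the exercise, in the order they were introduced.
PATTERNS = [
    r".*@.*\..*",
    r"[a-z]*@[a-z]*\.[a-z]*",
    r"^[a-z]*@[a-z]*\.[a-z]*$",
    r"^[a-z]+@[a-z]+\.[a-z]+$",
]


def rejected_lines(pattern, lines):
    """Return the 1-based numbers of the lines the pattern does NOT match."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if regex.search(line) is None]


RESULTS = {pattern: rejected_lines(pattern, EMAIL_LINES) for pattern in PATTERNS}

for pattern, rejected in RESULTS.items():
    print(f"{pattern!r:<30} filters out line numbers {rejected}")
```

With these stand-in lines, the first pattern filters out nothing, the second only line 6, the third lines 2 through 6, and the last everything except line 1.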

Test Yourself

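If you would like to repeat this exercise on your own, you can use any string filtering software that supports regular expressions. I like to use Egrep myself. The -i option makes the match case-insensitive, so the [a-z] classes also accept capital letters. Because egrep already implies -E (extended regular expressions), and recent GNU grep releases warn that the egrep name is obsolescent, the example below calls grep -E directly:

grep -Ei '^[a-z]+@[a-z]+\.[a-z]+$' emails.txt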
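
For readers who would rather stay inside application code, roughly the same filter can be sketched in Python. The names EMAIL_RE and filter_valid_emails are assumptions for illustration. re.IGNORECASE plays the role of the -i option, and re.fullmatch replaces the ^ and $ anchors because, in Python, $ also matches just before a trailing newline.

```python
import re

# Same pattern as the command above; fullmatch() supplies the anchoring.
# re.ASCII keeps [a-z] with IGNORECASE limited to the 52 ASCII letters.
EMAIL_RE = re.compile(r"[a-z]+@[a-z]+\.[a-z]+", re.IGNORECASE | re.ASCII)


def filter_valid_emails(lines):
    """Yield only the lines that are, in their entirety, a simple email address."""
    for line in lines:
        # Drop the line terminator, as grep does, before matching.
        candidate = line.rstrip("\n")
        if EMAIL_RE.fullmatch(candidate):
            yield candidate


# Hypothetical sample input; an open file handle can be passed the same way.
sample = [
    "valid@email.domain\n",
    "VALID@EMAIL.DOMAIN\n",
    "@.\n",
    "valid@email.domain' or 1=1 #\n",
]
for address in filter_valid_emails(sample):
    print(address)
```

The newline detail matters when validating a raw string rather than a line read from a file: re.search with the anchored pattern accepts "valid@email.domain\n", while fullmatch rejects it.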

Regular expressions are not only useful to developers. We, as penetration testers, Linux users, and even weekend warriors, will without a doubt benefit from the knowledge. In fact, we should be knowledgeable in this fascinating and fun subject and offer this advice to our web application developer clients as a remediation recommendation for any user input flaw we uncover in our testing. A great resource for learning regex is RegexOne.

About Douglas Berdeaux, Senior Security Consultant

Douglas was a manager of a Red Team for a consulting company and has performed penetration testing for clients with high security maturity on internal networks, internal and externally facing web applications, and desktop applications. Douglas was a full-stack enterprise-class web developer for close to a decade building, securing, and testing the security of business-critical web and mobile applications. He has taught cybersecurity as an adjunct professor for Duquesne University and has published multiple books and articles about penetration testing, hardware hacking, and programming.

Certifications:

OSWP, OSCP


Source: https://redsiege.com/blog/2025/03/regex-simplicity-security-and-power/