Inside the 160-Comment Fight to Fix SnakeYAML’s RCE Default

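Summary: While analyzing CVE-2022-1471, a security researcher found that the Java library SnakeYAML had carried a deserialization vulnerability for five years. The flaw lets an attacker execute remote code through YAML input. Although the issue had been disclosed as early as 2016, SnakeYAML's maintainer initially declined to fix it, regarding it as a "feature" rather than a "vulnerability". After repeated discussion and a set of working attack payloads, the researcher persuaded the maintainer to change the default behavior in SnakeYAML 2.0 to make it safer.
2025-6-5 14:40:15 Author: infosecwriteups.com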

How a 5-year-old deserialization flaw, a vacation phone call, and some persistence led to a safer Java ecosystem

This whole story starts, as any good story starts, with reading CVEs on GitHub’s advisory feed. A few years back I was reading CVE-2022-1471, a vulnerability reported by Google’s security team. It caught my eye immediately, not because the attack was novel, but because it absolutely wasn’t. The vulnerability allowed remote code execution (RCE) in Java applications using SnakeYAML via unsafe deserialization of YAML input. Sound familiar?

It should. I’d seen this exact vector before — five years earlier — in Moritz Bechler’s now-classic paper, Java Unmarshaller Security: Turning Your Data Into Code Execution. Bechler’s paper had already documented how SnakeYAML, like many Java unmarshalling libraries, allowed attackers to instantiate arbitrary classes just by feeding them crafted YAML documents. SnakeYAML wasn’t just mentioned — it was one of the very first examples in the paper.

So I wasn’t surprised by the CVE itself — I was actually quite grateful the Google security team had gone back and assigned a CVE to this ancient vulnerability. What surprised me was the reaction from the project maintainer, Andrey Somov, who initially marked the issue as “Won’t Fix.” To him, it wasn’t a bug; it was a “feature” that had been documented for years. If users didn’t want arbitrary classes instantiated from YAML, they should have known to use SafeConstructor—but most didn’t. And they had no reason to.

Here’s the insecure, default usage found in nearly every tutorial:

// Insecure by default (prior to SnakeYAML 2.0)
import org.yaml.snakeyaml.Yaml;

Yaml yaml = new Yaml();
Object obj = yaml.load(input);

And here’s what they would have had to do to stay safe:

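// Safer alternative using SafeConstructor (1.x signature; in 2.x it takes a LoaderOptions argument)
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.SafeConstructor;

Yaml yaml = new Yaml(new SafeConstructor());
Object obj = yaml.load(input);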

Unless a developer already understood the security implications, they’d never know to opt into safety — because the dangerous behavior came baked in, silently, by default.

But here’s the problem: secure defaults matter. And worse, there were already dozens of CVEs in the wild — real, exploitable security bugs — caused by this exact design flaw. I had also seen them show up in GitHub’s CodeQL scans of the open-source ecosystem, which I’d been helping analyze and triage.

In fact, at the time I was personally working to improve CodeQL’s UnsafeDeserialization query. I was going through the documented deserialization classes in that query and adding information on whether each one was secure by default or not. That work sparked some spirited debate in GitHub’s PR #11700, where I argued that CodeQL’s tooling should not just flag unsafe APIs but also help developers understand what constitutes secure usage, and why that distinction matters.

So when I joined the Bitbucket issue thread for SnakeYAML, I brought that same mindset. This wasn’t just about tooling or compliance — it was about helping maintainers see how real-world developers were misusing their API, and why the burden of secure configuration shouldn’t fall solely on the end user.

One of the things I’ve come to appreciate — grudgingly — as a security researcher is just how skewed the incentives are when it comes to fixing vulnerabilities at the root.

Vendors building static analysis tools — CodeQL, Fortify, Checkmarx, SonarQube — play an incredibly important role in securing the software ecosystem. But from an economic standpoint, their incentives are often misaligned with what actually makes software safer in the long term. Adding a new rule or query that lights up thousands of dashboards? That’s measurable progress. That’s sales collateral. But working with upstream maintainers to fix a bad default that’s causing the vulnerability in the first place? That’s hard. That’s time-consuming. And that doesn’t directly translate into more tool adoption.

It’s a kind of perverse incentive system. The focus ends up being on detecting badness, not eliminating it at the source.

And to be clear: I’ve been on both sides of that divide. I’ve helped write CodeQL queries that flag unsafe vulnerability patterns. I’ve watched those alerts show up across thousands of open-source repositories. I’ve also received bounties for several of the queries I contributed as part of the now-defunct GitHub Security Lab CodeQL Bug Bounty Program.

But I’ve also seen the futility of closing the same vulnerability over and over again in downstream projects, when the root cause — a dangerous default, a misused API, a permissive parser — remains unaddressed upstream.

So when I saw SnakeYAML’s maintainer dismiss the vulnerability as a non-issue — effectively saying, “This isn’t a problem with the library, just with how people use it” — I couldn’t let that go unchallenged. Because I knew from real-world data that people were using it this way. And they were getting burned.

At that point, I knew what I had to do. If a theoretical argument wasn’t going to move the needle, I’d make it practical.

I went back to Moritz Bechler’s marshalsec paper and payload generator and pulled out actual deserialization payloads he’d documented five years earlier — payloads that still worked against SnakeYAML’s defaults. These weren’t obscure edge cases. These were minimal, weaponizable YAML snippets that could instantiate dangerous classes already sitting on many applications’ classpaths.

After running the payload generator, I dropped all nine of them directly into the Bitbucket thread.

Aside: It turns out Medium’s WAF doesn’t like it if you inline this payload directly into posts, so I had to use a GitHub Gist 😂

That first example alone can lead to remote code execution if the right classes are in the classpath — something that’s surprisingly easy even in plain Java environments, without needing Spring, JDBC drivers, or JNDI.

I also pointed to other documented gadgets: JdbcRowSetImpl, TemplatesImpl, even Spring's own JNDI-backed bean factories. Every one of them had shown up in either published CVEs or in my own CodeQL scans of the ecosystem.

This wasn’t hypothetical. It was happening, in production, across hundreds of projects.

And yet, the official guidance from the maintainer boiled down to: “Don’t deserialize untrusted YAML,” and “You should have known to use SafeConstructor.”

That argument misses the forest for the trees. Yes, a knowledgeable engineer could choose SafeConstructor. But most developers aren’t parsing YAML because they want deserialization—they just want a config file. And they’re using libraries the way the README tells them to:

new Yaml().load(input);

That line of code — the one in nearly every tutorial and StackOverflow post — was unsafe by default. And real people were paying the price.

Somewhere around the 80th comment in the thread, it became clear we weren’t going to resolve this through Bitbucket alone. So I did something I don’t usually do: I offered to hop on a call.

At the time the offer for a call was accepted, I was on Christmas vacation in Punta Cana, in the Dominican Republic. I took the call from the balcony of my hotel room, listening to waves crash in the background while I talked deserialization edge cases and YAML parsing internals for nearly an hour.

I’ll admit — I wasn’t optimistic. Up to that point, most of our exchanges had been tense. The maintainer, Andrey, didn’t see this as a vulnerability. I and much of the community of users did. We were stuck.

But that hour-long conversation ended up being a turning point — not just for the issue, but for me. We dug deep into the YAML spec, particularly how tags work. For anyone who hasn’t spent their vacation reading YAML specs: a tag in YAML is a way to annotate a node with type information. SnakeYAML had been parsing these tags — especially what the spec calls “global tags” — and treating them as Java class names to instantiate.

That was the problem. That’s how !!javax.script.ScriptEngineManager became a security issue.
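
To make the mechanism concrete, here is a minimal, stdlib-only sketch of what "treating a global tag as a class name" amounts to. It is not SnakeYAML's code: the class GlobalTagSketch and its methods are hypothetical names invented for illustration, and a harmless java.util.ArrayList stands in for a dangerous class. The one detail taken from YAML itself is that the `!!` shorthand expands to the `tag:yaml.org,2002:` prefix.

```java
import java.lang.reflect.Constructor;

// Hypothetical illustration only; this is not SnakeYAML's implementation.
public class GlobalTagSketch {
    // In YAML 1.1 the "!!" shorthand expands to this prefix.
    public static final String PREFIX = "tag:yaml.org,2002:";

    // Expand a shorthand such as "!!java.util.ArrayList" to its full tag.
    public static String expandTag(String shorthand) {
        if (!shorthand.startsWith("!!")) {
            throw new IllegalArgumentException("not a !! shorthand: " + shorthand);
        }
        return PREFIX + shorthand.substring(2);
    }

    // Naive loader step: whatever follows the prefix is taken to be a class
    // name and is instantiated through its no-argument constructor.
    public static Object instantiateFromTag(String fullTag) {
        if (!fullTag.startsWith(PREFIX)) {
            throw new IllegalArgumentException("unexpected tag: " + fullTag);
        }
        String className = fullTag.substring(PREFIX.length());
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String tag = expandTag("!!java.util.ArrayList");
        Object created = instantiateFromTag(tag);
        System.out.println(tag + " -> " + created.getClass().getName());
    }
}
```

The point of the sketch is who chooses the class: the author of the YAML document does, not the application. Nothing in instantiateFromTag checks the name against anything the application expected to load.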

After a long back and forth, where I was honestly about ready to give up, I asked a question — I can’t remember exactly what it was — but it reframed the conversation. Suddenly, we weren’t arguing over whether this was “intended behavior.” We were talking about what SnakeYAML should actually do by default.

And then something clicked. I remember Andrey saying something to the effect of, “Yeah, I could probably do that.” And with that, the wheels were in motion for this old vulnerability to finally get fixed.

That’s what led to this post in that way-too-long Bitbucket thread:

To that end, I proposed that the change we make is to modify SnakeYaml so it doesn’t process class names in, what in the YAML spec are called ‘global tags’. By not reading class names and attempting to instantiate classes from these global tags — at least not by default — I believe we will remove the vulnerability from SnakeYaml, allowing it to safely operate on untrusted data without leading to this remote code execution vulnerability.

That proposal became the foundation for what would eventually ship in SnakeYAML 2.0.

A few months later, SnakeYAML 2.0 was released. The changelog was modest. But the change itself was anything but.

By default, SnakeYAML would no longer instantiate Java classes from global YAML tags. The exact same payloads that once led to remote code execution would now safely fail unless a developer explicitly configured the library to allow them. The risk was no longer hidden in plain sight — it was opt-in.
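
The shape of that default can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the library's implementation: TagGateSketch, CORE_TYPES, and com.example.AppConfig are hypothetical names, the list of core types is only a subset of the standard YAML tags, and the sketch only considers tags under the `tag:yaml.org,2002:` prefix. The real configuration hook is not reproduced here; see the SnakeYAML 2.x documentation for it.

```java
import java.util.Set;

// Hypothetical illustration of "deny class-name tags unless explicitly allowed".
public class TagGateSketch {
    public static final String PREFIX = "tag:yaml.org,2002:";

    // A subset of the standard YAML tags that map to plain data.
    static final Set<String> CORE_TYPES =
            Set.of("str", "int", "float", "bool", "null", "seq", "map");

    private final Set<String> allowedClassNames;

    // Secure default: no class names are allowed at all.
    public TagGateSketch() {
        this(Set.of());
    }

    // Opt-in: the application names exactly the classes it expects.
    public TagGateSketch(Set<String> allowedClassNames) {
        this.allowedClassNames = Set.copyOf(allowedClassNames);
    }

    public boolean isAllowed(String fullTag) {
        if (!fullTag.startsWith(PREFIX)) {
            return false;
        }
        String suffix = fullTag.substring(PREFIX.length());
        return CORE_TYPES.contains(suffix) || allowedClassNames.contains(suffix);
    }

    // Fail closed before any reflection happens.
    public void check(String fullTag) {
        if (!isAllowed(fullTag)) {
            throw new SecurityException("Global tag is not allowed: " + fullTag);
        }
    }

    public static void main(String[] args) {
        TagGateSketch strict = new TagGateSketch();
        System.out.println(strict.isAllowed(PREFIX + "str"));
        System.out.println(strict.isAllowed(PREFIX + "javax.script.ScriptEngineManager"));

        TagGateSketch optIn = new TagGateSketch(Set.of("com.example.AppConfig"));
        System.out.println(optIn.isAllowed(PREFIX + "com.example.AppConfig"));
    }
}
```

The no-argument constructor is the important part: a developer who copies the shortest possible usage gets the strict behavior, and loading a custom class requires writing its name down on purpose.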

It wasn’t a huge pull request. But it was a huge shift in philosophy: Secure by default. Unsafe only with informed consent.

Looking back, the thing I keep thinking about is how close this change came to never happening. CVE-2022-1471 was initially dismissed. It took a mammoth public thread, real-world exploit payloads, and a vacation phone call to finally align on a fix.

And even now — years after the fix shipped — I still get surprised by the impact. I’ve had random strangers walk up to me at conferences and thank me for my work on this issue. I’m always a little shocked when it happens. I wasn’t trying to be a hero. I was just doing what I thought was important: helping make open source safer. But it turns out, when you help get a security fix like this over the finish line, it echoes. Maintainers notice. They remember. And sometimes they carry those lessons into their own projects.

If you’re a security researcher: don’t stop at writing the alert. Help build the fix. If you’re a maintainer: if people are misusing your library in dangerous ways, maybe the problem isn’t the users. And if you’re building tools: detection is important — but the real win is when you make vulnerabilities impossible to write in the first place.

Secure by default isn’t a luxury. It shouldn’t be the exception. It should be the expectation.

If you found this post insightful, I’m currently looking for my next opportunity in software security, open source, or developer tooling. Always happy to hop on a call to connect.

📧 Email: [email protected]
⛓️ LinkedIn: https://www.linkedin.com/in/jonathan-leitschuh/
📅 Schedule Time: https://calendly.com/jlleitschuh

Source: https://infosecwriteups.com/%EF%B8%8F-inside-the-160-comment-fight-to-fix-snakeyamls-rce-default-1a20c5ca4d4c?source=rss----7b722bfd1b8d---4