Today we cover Devin from Cognition, the first AI Software Engineer.
We will cover Devin proof-of-concept exploits in multiple posts over the next few days. In this first post, we show how a prompt injection payload hosted on a website leads to a full compromise of Devin’s DevBox.
By planting instructions on a website or GitHub issue that Devin processes, an attacker can trick it into downloading and launching malware. This leads to full system compromise and turns Devin into a remote-controlled ZombAI. Any exposed secrets can then be leveraged for lateral movement or other post-exploitation steps.
Read on to learn all the details behind this research.
First, I had to invest $500 to get access to Devin for 30 days.
For reference, Cognition actually now offers a cheaper plan.
I also had to set up a few infrastructure components for the C2 server. Luckily, this is something I typically have handy, so it was quick to set up.
For command and control, I used Sliver from Bishop Fox. I hosted a Sliver server and used it to generate a Linux malware binary that calls back to the server and allows remote control of any host that launches it.
Next, this GitHub issue was used in the exploit demonstration:
The key trick here is to have Devin navigate off domain (e.g. away from GitHub to the attacker’s page). This is not a requirement, but it appears to be a lot more reliable and better simulates real-world risk. That said, I also observed Devin following more complex instructions directly from GitHub.
The scenario is that we have a GitHub issue that Devin is tasked to look into.
After Devin starts exploring the GitHub issue, it notices the text about the support tool. Since the text mentions that the tool could help with debugging the issue, Devin navigates off domain to the attacker-controlled website.
Here we go, now Devin has reached the attacker’s website!
By the way, this is one of the key observations I had when testing various agents: they like clicking links! And once you get them off domain to an attacker-controlled site, things get quite easy with prompt injection payloads. Attacks often work on the first try.
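To make the off-domain step concrete from the defender's side, here is a minimal sketch of a navigation check that an agent harness could run before following a link. This is not how Devin works, and the names `ALLOWED_HOSTS` and `is_navigation_allowed` are hypothetical. It only illustrates the idea of refusing (or escalating to a human) when a link leaves an allowlisted set of hosts.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts the agent may browse without human approval.
ALLOWED_HOSTS = frozenset({"github.com"})


def is_navigation_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host exactly matches the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URL: fail closed.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # hostname strips userinfo and port, so "github.com@evil.example" resolves
    # to "evil.example" here and is rejected.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts
```

A check like this would only address the off-domain hop. As noted above, Devin also followed instructions placed directly on GitHub, so an allowlist alone is not a sufficient mitigation.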
Devin, again, follows the instructions on the website and clicks the link, which initiates the file download.
Next, it switched to a Terminal to inspect and try to run the binary:
As you can see, Devin received a permission denied error. But of course that doesn’t stop Devin. It opened another Terminal, added the execute permission, and ran the malware binary again:
If you open the screenshot in full screen you can see the entire Devin session.
This now worked and we got a callback on the Sliver server!
And we can drop into a remote shell and, of course, snoop around Devin’s machine.
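The step that made the attack succeed was adding the execute permission to a freshly downloaded file and then running it. As a rough illustration of a human-in-the-loop gate for exactly that pattern, here is a sketch that flags such commands for approval. Everything in it is an assumption for illustration: `DOWNLOAD_DIR`, `needs_human_approval`, and the idea that the harness sees each shell command before it runs. It is a simple heuristic that is easy to bypass (for example by copying the file elsewhere first), not a robust control.

```python
import posixpath
import shlex

# Hypothetical directory where the agent's browser saves downloads.
DOWNLOAD_DIR = "/home/agent/Downloads"


def _inside_downloads(path: str, cwd: str) -> bool:
    """Resolve path against cwd and check whether it lies under DOWNLOAD_DIR."""
    full = posixpath.normpath(posixpath.join(cwd, path))
    return full == DOWNLOAD_DIR or full.startswith(DOWNLOAD_DIR + "/")


def needs_human_approval(command: str, cwd: str) -> bool:
    """Flag commands that chmod or directly execute a file in the download directory."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable command line: fail closed and ask a human.
        return True
    if not tokens:
        return False
    program, args = tokens[0], tokens[1:]
    # A program named with a slash is run from that path, not looked up on PATH.
    if "/" in program and _inside_downloads(program, cwd):
        return True
    # For chmod, treat every argument after the first as a potential target.
    if posixpath.basename(program) == "chmod":
        return any(_inside_downloads(arg, cwd) for arg in args[1:])
    return False
```

With a gate like this, both the `chmod` step and the direct launch from the download directory would pause for a human decision instead of running automatically.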
More about exfiltrating secrets and information in the next post. So, stay tuned because there are more security issues lurking in Devin.
Check out this demo video for more details, a couple of variations of the attack chain, and learnings.
Note: I will package all three Devin posts into a single video that I’ll release with the third post about Devin (planned for Friday this week), then I’ll update this section here.
If you find the video interesting, please subscribe and share!
One observation I had in early exploits was that Devin would disconnect and cancel running the binary pretty quickly.
To work around that and maintain persistence, I set up a reaction event for session-connected, which sends a few commands down to the ZombAI right away.
This demonstrates that even if Devin cancels the command, secrets can be exfiltrated within a few milliseconds and an attacker can lay down additional persistence to maintain access.
Since many readers may not have a red teaming background, I figured I’d throw in some interesting behind-the-scenes info as well.
It’s also possible to interact with Devin directly from Slack, which makes it an entirely unsupervised interaction. Here is a very similar attack, where one user tasked Devin to investigate and research a website, but while doing that, Devin is compromised.
This shows the danger of unsupervised AI agents with unrestricted access to a large number of tools.
This vulnerability was reported to Cognition on April 6th, 2025 and acknowledged a few days later. Follow-up queries about fix timelines, status, and coordinated disclosure remain unanswered after 120+ days.
The creation of powerful agents is pretty straightforward, and the value is unlocked by giving an agent access to data and tools. However, some systems are not designed with security in mind at all. Especially novel threats like indirect prompt injection are either not understood or ignored by a few vendors.
Hence, this information is now released publicly according to responsible disclosure best practices, so that users can protect themselves.
The following recommendations were provided to Cognition as part of disclosing the vulnerabilities in April.
During a prompt injection attack from untrusted data, an attacker can gain access to all secrets and keys on the devin-box, and then perform lateral movement to gain access to other hosts in an organization or gain access to cloud infrastructure, etc.
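One way to shrink the set of secrets a hijacked process can reach is to avoid handing it the full environment in the first place. The sketch below is my own illustration, not part of what Cognition ships: `SAFE_ENV_VARS`, `minimal_env`, and `run_untrusted` are hypothetical names, and the approach only covers environment variables. Keys stored in files on the box would still be readable.

```python
import os
import subprocess

# Hypothetical allowlist of variables a tool legitimately needs.
SAFE_ENV_VARS = ("PATH", "HOME", "LANG", "TERM")


def minimal_env(parent_env=None) -> dict:
    """Copy only allowlisted variables, dropping API keys and tokens."""
    source = os.environ if parent_env is None else parent_env
    return {name: source[name] for name in SAFE_ENV_VARS if name in source}


def run_untrusted(argv, parent_env=None) -> subprocess.CompletedProcess:
    """Run a command with the reduced environment instead of inheriting everything."""
    return subprocess.run(
        argv,
        env=minimal_env(parent_env),
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
```

The design choice here is an allowlist rather than a blocklist: any new secret added to the parent environment stays out of child processes by default.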
There seem to be two approaches when building coding agents:
The first is that some companies start with core pieces like reviewing code and writing functions, then slowly expand capabilities, all based on the principle of human in the loop. A great example of this approach is Anthropic’s Claude Code.
The second approach is to just give an agent access to everything possible and hope for the best.
The correct approach certainly appears to be to start small and get the core pieces right first.
Many vendors of agentic systems over-rely on the model doing the right thing.
This post showed how modern agentic systems commonly have fundamental design weaknesses that can easily lead to full system compromise, which makes them unsuitable for enterprise adoption.
Specifically, agents are naive, and untrusted data, like text on a website, can lead to remote code execution, including compromise of all compute resources, sensitive information, and API keys on the host.