AI Injection Attacks
文章探讨了AI代理被滥用的风险,特别是间接提示注入攻击的威胁。作者指出AI可能误将网络内容视为用户指令,导致执行恶意操作。文中回顾了注入攻击的历史,并提出限制访问不良网站、限制代理能力等防范措施。同时提到AI产品开发中的“军备竞赛”现象可能加剧安全隐患。 2025-9-5 19:26:50 Author: textslashplain.com(查看原文) 阅读量:17 收藏

A hot infosec topic these days is “How can we prevent abuse of AI agents?

While AI introduces awesome new capabilities, it also entails an enormous set of risks from the obvious and mundane to the esoteric and elaborate.

As a browser security person, I’m most often asked about indirect prompt injection attacks, whereby a client’s AI (e.g. in-browser or on device) is tasked with interacting with content from the Internet. The threat here is that the AI Agent might mistakenly treat the web content it interacts with as instructions from the Agent’s user, and so hypnotized, fall under the control of the author of that web content. Malicious web content could then direct the Agent (now a confused deputy) to undertake unsafe actions like sharing private data about the user, performing transactions using that user’s wallet, etc.

Nothing New Under the Sun

Injection attacks can be found all over the cybersecurity landscape.

The most obvious example is found in memory safety vulnerabilities, whereby an attacker overflows a content data buffer and that data is incorrectly treated as code. That vulnerability roots back to a fundamental design choice in common computing architectures: the “Von Neumann Architecture,” whereby code and data are comingled in the memory of the system. While convenient for many reasons, it gave rise to an entire class of attacks that would’ve been prevented by the “Harvard Architecture” whereby the data and instructions would be plainly distinct. One of the major developments of 20 years ago– Data Execution Prevention / No eXecute (DEP/NX) was a processor feature that would more clearly delineate data and code in an attempt to prevent this mistake. And the list of “alphabet soup” mitigations has only grown over the years.

Well beyond low-level processor architecture, this class of attack is seen all over, including the Web Platform, which adopted a Von Neumann-style design in which the static text of web pages is comingled with inline scripting code, giving rise to the ever-present threat of Cross-Site Scripting. And here again, we ended up with protection features like the XSS Filter (IE), XSS Auditor (Chrome) and opt-in features to put the genie back in the bottle (e.g. Content Security Policy) by preventing content and script from mingling in dangerous ways.

I’ll confess that I don’t understand nearly enough about how LLM AIs operate to understand whether the “Harvard Architecture” is even possible for an LLM, but from the questions I’m getting, it clearly is not the common architecture.

What Can Be Done?

In a world where AI is subject to injection attacks, what can we do about it?

One approach would be to ensure that the Agent cannot load “unsafe” web content. Since I work on SmartScreen, a reputation service for blocking access to known-unsafe sites, I’m often asked whether we could just block Agent from accessing bad sites just as we would for a regular human browser user. And yes, we should and do, but this is wildly insufficient: SmartScreen blocks sites found to be phishing, distributing malware, or conducting tech scams, but the set of bad sites grows by the second, and it’s very unlikely that a site conducting a prompt injection attack would even be recognized today.

If blocking bad sites doesn’t work, maybe we could allow only “known good” sites? This too is problematic. There’s no concept of a “trustworthy sites list” per-se. The closest SmartScreen has is a “Top traffic” list, but that just reflects a list of high-traffic sites that are considered to be unlikely sources of the specific types of malicious threats SmartScreen addresses (e.g. phishing, malware, tech scams). And it’s worse than that — many “known good” sites contain untrusted content like user-generated comments/posts, ads, snippets of text from other websites, etc. A “known good” site that allows untrusted 3rd-party content would represent a potential source of a prompt injection attack.

Finally, another risk-limiting design might be to limit the Agent’s capabilities, either requiring constant approval from a supervising human, or by employing heavy sandboxing whereby the Agent operates from an isolated VM that does not have access to any user-information or ambient authority. So neutered, a hypnotised Agent could not cause much damage.

Unfortunately, any Agent that’s running in a sandbox doesn’t have access to resources (e.g. the user’s data or credentials) that are critical for achieving compelling scenarios (“Book a table for two at a nice restaurant, order flowers, and email an calendar reminder to my wife“), such that a sandboxed Agent may be much less compelling to an everyday human.

Aside: Game Theory

Despite the mean security risks introduced by Agentic AI, product teams are racing ahead to integrate more and more capable Agent functionality into their products.

AI companies are racing toward ever-more-empowered Agents because everyone is scared that one of the other AI companies is gonna come out with some less cautious product and that more powerful, less restricted product is gonna win the market. So we end up in the situation with the US at the end of the 1950s, whereby the Russians had 4 working ICBMs but the United States had convinced ourselves they had thousand. So the US built a thousand ICBMs, so the Russians then built a thousand ICBMs, and so on, until we both basically bankrupted the world over the next few decades.


文章来源: https://textslashplain.com/2025/09/05/ai-injection-attacks/
如有侵权请联系:admin#unsafe.sh