From Agent2Agent Prompt Injection to Runtime Self-Defense: How Wallarm Redefines Agentic AI Security

From Agent2Agent Prompt Injection to Runtime Self-Defense: How Wallarm Redefines Agentic AI Security
嗯，用户让我总结一篇文章，控制在100字以内，而且不需要特定的开头。首先，我得仔细阅读文章内容，抓住主要点。文章主要讲的是AI之间的攻击场景，不再是科幻小说里的东西了。研究人员发现AI可以互相注入恶意指令，利用A2A协议的特性。传统安全措施不管用了，所以他们提出了A2AS框架来保护AI的安全。接下来，我需要把这些要点浓缩成100字以内的中文。要确保涵盖AI攻击、A2A协议、传统安全不足以及新框架A2AS的作用。可能会这样组织：首先说明AI攻击的可能性，然后提到A2A协议的问题，接着引出A2AS框架及其功能。最后强调安全的重要性。检查一下字数，确保不超过限制。可能需要调整句子结构，去掉不必要的细节。最终的总结应该简洁明了，涵盖所有关键点。 </think> 研究人员发现AI之间可互相注入恶意指令，利用Agent2Agent（A2A）协议特性绕过传统安全措施。为应对这一威胁，提出A2AS框架，在行为认证、模型自我防御和提示级控制等方面提供新安全机制。该框架旨在确保AI仅执行授权操作，并在运行时实时阻止恶意行为。 2025-12-23 07:55:49 Author: securityboulevard.com(查看原文) 阅读量:0 收藏

Is an AI-to-AI attack scenario a science fiction possibility only for blockbusters like the Terminator series of movies?

Well, maybe not!

Researchers recently discovered that one AI agent can “inject malicious instructions into a conversation, hiding them among otherwise benign client requests and server responses.” While known AI threats involve tricking an agent with malicious data, this new threat exploits a property of the Agent2Agent (A2A) protocol to remember recent interactions and maintain coherent conversations.

AI agents interact with each other, use internal APIs, and operate with privileges. Since traditional AI guardrails and legacy API security no longer cut it, there’s a need for a new approach to security.

Agent2Agent Prompt Injection and Emerging Threats

AI agents can communicate, issue instructions, and bypass human oversight, which makes them both valuable and dangerous.

Pro m pt injection used to involve a user writing a malicious prompt. Now, agents can write malicious prompts that target other agents. This fact has expanded threat vectors and transformed the risk model. Internal API misuse, lateral movement, and chain-of-agent compromise threats are now more acute than ever.

According to security researchers, an AI agent can cause another agent to operate in unintended ways, potentially forcing, for example, data disclosure or unauthorized tool use. It achieves this by delivering multi-stage prompt injections leveraging stateful cross-agent communication behavior of the Agent2Agent (A2A) communication protocol.

The A2A protocol is an open standard that facilitates interoperable communication among AI agents, regardless of vendor, architecture or underlying technology. Its core objective is to enable agents to discover, understand and coordinate with one another to solve complex, distributed tasks while preserving autonomy and privacy. The protocol is similar to Model Context Protocol (MCP). However, MCP focuses on execution through tool integration, which A2A’s goal is agent orchestration.

This risk, dubbed as agent session smuggling attack, is crucial to security leaders for several reasons:

The attack surface now encompasses new threats and attack vectors, including agent APIs, internal tool access, inter-agent messaging, and privileged actions.
Traditional guardrails, such as filtering outputs, may no longer suffice. Attackers can create inputs at the agent-to-agent level, where filtering might not exist.

While previously theorized, this emerging threat highlights the diversity of possible attack scenarios available and waiting to be launched in real environments. Security teams must face up to this new reality. They need to establish a new layer of defense.

Introducing A2AS: A New Standard for Agentic AI Security

The A2AS framework is that new defense layer.

A2AS secures AI agents and LLM-powered applications, just as HTTPS secures HTTP. Researchers built it to address agentic AI security risks. Those risks include prompt injection, tool misuse, and agent compromise. It centers around three breakthrough capabilities:

Behavior Certificates: Declaring and enforcing what agents can and can’t do.
Model Self-Defence Reasoning: Embed security awareness in the model’s context window. This ensures the model rejects malicious or untrusted instructions in real time
Prompt-Level Security Controls: Authenticated prompts, sandboxing, policy-as-code, verify every interaction.

A2AS is important because it represents a shifting approach to security. Agents have privileges and tool access, so monitoring and filtering alone isn’t enough. Security models must now secure the runtime, not just the input. That means integrating runtime self-defense, certification, and enforcement.

Wallarm, together with researchers from AWS, Bytedance, Cisco, Elastic, Google, JPMorganChase, Meta, and Salesforce, played an instrumental role in developing A2AS and is spearheading its adoption.

What A2AS Brings to Agentic AI Security

The A2AS framework aims to ensure AI agents can only do what they’re explicitly allowed to do, and every instruction they see must be authenticated, isolated, and verified.

To make that possible, A2AS uses a five-part standard called the BASIC model.

Behavior certificates define the exact capabilities an agent is permitted to use. That includes tools, files, functions, or system operations. If it’s not certified, it doesn’t happen. These certificates are how A2AS prevents an infected agent from escalating privileges.

Authenticated prompts verify the integrity of every instruction before it enters the context windows. They stop tampered, spoofed, or injected messages from influencing agent reasoning.

Security Boundaries isolate untrusted content from trusted system instructions. They do this by tagging and segmenting everything that enters the model, eliminating the ambiguity that makes prompt injection possible.

In-context defenses embed security reasoning directly inside the model’s context window. They guide the agent to distrust external input, ignore unsafe commands, and actively neutralize malicious patterns during execution.

Codified policies enforce business rules at runtime. That means they block sensitive data, requiring approvals for high-risk actions, and ensure compliance without manual oversight.

Together, these controls create a self-defending agent that can resist user-to-agent attacks, prevent tool misuse, and stop agent-to-agent prompt infections before they spread.

Why A2AS Will Matter for Enterprises and SOCs

As agentic AI adoption grows, the need for standardized security is becoming increasingly urgent. Autonomous agents now manage operations, access sensitive data, and interact with internal systems. That explodes the attack surface.

Moreover, agents communicate, call privileged tools, and operate via APIs never designed for autonomous decision-making. A2AS provides a unifying framework to secure this complexity – similar to how NIST frameworks shaped traditional cybersecurity.

Attackers are shifting focus from traditional breaches to manipulating agent behavior. A compromised agent can trigger unauthorized transactions, leak regulated data, or propagate malicious instructions. SOCs will need A2AS-aligned controls to detect, contain, and attribute these attacks.

Regulations are also evolving. Failing to secure AI agents could soon mean non-compliance.

How, then, can organizations prepare for A2AS?

Here’s a short checklist for locking down your agentic AI systems and preparing for the A2AS framework:

Action	Description
Inventory agentic systems	Identify autonomous agents, their identities, inter-agent communication paths, exposed APIs, and execution privileges. Establish clear ownership and trust boundaries for each agent. Wallarm can help you discover all your ecosystem.
Map agent behavior and exposure	Document what actions each agent is allowed to perform, which tools and data sources it can access, and which prompts or instructions it can receive or generate. This forms the basis for behavior certification.
Enforce runtime protection	Apply real-time controls to inspect and block malicious prompts, unauthorized tool calls, and abnormal agent behavior across APIs and agent interactions. Security must operate at runtime—not only at design time.
Implement behavior certification & policy enforcement	Define and enforce agent behavior certificates, authenticated prompts, and policy-as-code controls to ensure agents act only within approved intent, scope, and authority, in line with A2AS principles.
Monitor, detect, and attribute agent activity	Continuously monitor agent-to-agent interactions, prompt flows, outputs, and tool usage. Enable SOC teams to detect manipulation attempts, attribute actions to specific agents, and contain compromised behavior.
Adopt and align with agentic AI standards	Align internal security controls with emerging frameworks like A2AS to ensure consistency, interoperability, and readiness for future regulatory and industry requirements.

The time to act is now – waiting until there’s a breach is too late.

Secure the Future of Agentic AI

Hardening agent-to-agent communications and agentic orchestration represents a new frontier of cybersecurity strategy. A2AS offers a framework for protecting against many use cases from agent misbehavior and prompt injections to insecure AI supply chains. Wallarm provides the practical implementation. As enterprises embrace agentic AI, security can’t be an afterthought. They must bake it in at runtime, at the API/agent boundary.

To learn more about the A2AS framework, you can visit their website. To find out how Wallarm can help you prepare, schedule a demo today.

The post From Agent2Agent Prompt Injection to Runtime Self-Defense: How Wallarm Redefines Agentic AI Security appeared first on Wallarm.

*** This is a Security Bloggers Network syndicated blog from Wallarm authored by Tim Erlin. Read the original post at: https://lab.wallarm.com/how-wallarm-redefines-agentic-ai-security/

文章来源: https://securityboulevard.com/2025/12/from-agent2agent-prompt-injection-to-runtime-self-defense-how-wallarm-redefines-agentic-ai-security/
如有侵权请联系:admin#unsafe.sh