Press enter or click to view image in full size
This article concludes a three-part series explaining the Microsoft Research paper Securing AI Agents with Information-Flow Control (written by Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin).
In Part I, we looked at why tool-calling agents are dangerous by default. In Part II, we opened the agent and examined the planner: the place where decisions, memory, and labels meet.
In this final part, we answer the most important question: What security guarantees do we actually get once all of this machinery is in place?
This is where the paper moves from mechanisms to guarantees.
Once we have a labeled planner and a taint-tracking planning loop, enforcing security reduces to a single question:
Should this tool call be allowed to proceed?
In the taint-tracking planner (Section 5.2, Part II), this question is answered by a policy check performed before any tool executes.
Policies are expressed purely in terms of labels:
A policy succeeds if, and only if, those labels are no more permissive than what the policy allows. That is, policies are local (checked at each tool call) but give rise to global guarantees about the agent’s behavior.
The paper focuses on two fundamental policies that are sufficient to express most real-world requirements.
The first policy is Trusted Action (P-T).
This policy is designed to protect consequential actions: operations whose mere execution is dangerous, regardless of what data they handle. Examples include sending an email, creating a user, executing a transaction, modifying infrastructure, etc.
P-T requires that a tool call be generated exclusively from trusted data.
Formally, the integrity component of the tool call’s label must be Trusted (T). What this means operationally is simple but powerful: if any untrusted input influenced the decision to call the tool, the call is blocked.
This shuts down an entire class of prompt injection attacks. Even if an attacker manages to inject instructions into a document, email, or webpage, those instructions taint the planner’s context. Once tainted, the planner simply cannot trigger trusted tools.
This is integrity enforced in its strongest form.
The second policy is Permitted Flow (P-F).
P-F is about data egress: sending data to external recipients. Unlike P-T, it does not care whether the decision to act came from a trusted context. Instead, it asks a narrower question: Are all recipients authorized to see this data?
P-F prevents illicit data leaks, even if the action itself was triggered by untrusted input.
Formally, P-F enforces a confidentiality check on the arguments of a tool call. If data labeled as readable only by a certain set of users is about to be sent somewhere, the policy ensures that the recipients are a subset of that set.
This is a weaker guarantee than full non-interference, but it is often exactly what you want in practice.
The real power of the framework comes from combining P-T and P-F, or choosing between them deliberately.
For tools that trigger consequential actions, the paper enforces P-T. For tools that egress data, it enforces P-T, P-F, or both, depending on the desired behavior.
This yields four important regimes:
Press enter or click to view image in full size
Note that different tools demand different guarantees:
The key insight is that security is not a binary concept. The framework lets you explicitly choose which guarantees you want for each tool.
With taint tracking and policies correctly applied, the planner provides two critical assurances:
These guarantees hold regardless of the model’s internal behavior. They do not depend on prompt engineering, alignment, or the model “doing the right thing”. They are enforced structurally, by construction.
The basic planner with dynamic taint tracking has a fundamental limitation. Whenever a tool returns untrusted or confidential data, that data immediately taints the conversation history. As a result, subsequent planner decisions are constrained, and many otherwise legitimate tool calls become disallowed by policy.
The variable-passing planner partially mitigates this issue by storing tool results in variables rather than appending them directly to the conversation. However, this alone is not sufficient for complex agent workflows.
To address these limitations, the paper introduces FIDES: a variable-passing planner equipped with more advanced information-flow control mechanisms.
At a high level, FIDES improves expressiveness without weakening security. It does so through two key ideas:
In earlier planners, every tool result was appended to the conversation history, immediately raising the security label of the current context. In FIDES, this is no longer the default behavior.
Instead, before appending a tool result, the planner applies a function conceptually called HIDE, which examines the result structure node by node.
The logic is simple:
Press enter or click to view image in full size
3.1.1. Example: Selective Variable Introduction in Practice
Join Medium for free to get updates from this writer.
Consider an agent tasked with handling support tickets. The agent retrieves a ticket from an external system and receives the following tool result:
The description field originates from an external user and is therefore labeled untrusted, while the internal notes may be labeled confidential. Appending the entire result directly to the conversation history would raise the context label, restricting which tools the planner can call next.
With FIDES, the planner instead applies selective variable introduction:
3.1.2. Isolating Sensitive Data Without Breaking Planning
With selective variable introduction, the planner can continue issuing Query actions without raising the security label of the conversation history.
Sensitive or untrusted data is stored in variables instead of being appended directly, keeping the current context clean while preserving access to the data when needed.
This provides the same protection as fully hiding tool results, but without sacrificing planning capability. The planner can still reference stored variables in later steps, even though their contents are not exposed in the conversation.
This separation enables fine-grained policies. For example, when calling send_message(recipient, message):
Such distinctions are not possible with a basic taint-tracking planner, and are precisely what make FIDES practical for real agent workflows.
In earlier planners, inspecting a variable meant revealing its full contents to the planner. This immediately tainted the conversation history with the variable’s label, often restricting which tools could be called next.
In FIDES, inspection is no longer an all-or-nothing operation.
Instead of always expanding a variable into the conversation, the planner can perform constrained inspection, extracting only limited, structured information from a variable while preserving information-flow guarantees.
This is achieved by combining variable inspection with the Dual-LLM pattern and constrained decoding.
3.2.1. Example: Inspecting Variables with Bounded Information
Consider an agent assisting with access reviews. The agent retrieves a list of permissions from an external system and stores the result in a variable:
The planner now needs to decide whether escalation is required. It does not need the full permission list, only a simple answer to a specific question: Does this user hold any privileged roles?
Expanding the variable directly would expose untrusted data to the planner and taint the conversation history. Instead, FIDES allows the planner to query the variable using an isolated LLM with a constrained output schema.
For example, the planner issues a query such as:
The isolated LLM processes the variable contents but is restricted to producing a Boolean result. The output is stored in a new variable with a label that reflects both its origin and its bounded information capacity.
3.2.2. Limiting Information Without Losing Control
By constraining inspection outputs, FIDES limits how much information can flow into the planning context.
Low-capacity outputs (such as Booleans or small enumerations) carry provably bounded information. They are far less useful for prompt injection or data exfiltration than unconstrained strings.
As a result:
FIDES resolves a key challenge in secure agent design: “How can agents remain flexible without letting untrusted or sensitive data affect every future decision?”
By selectively hiding data, tracking labels, and limiting what inspection can reveal, FIDES allows planners to stay both capable and safe.
The result is an agent that can:
without depending on prompt engineering or model alignment.
Together, these mechanisms make secure, real-world agent behavior practical.
Across these three parts, we moved step by step:
The core takeaway is simple but profound:
Once you give agents the authority to act, security must live in the architecture — not in the prompt.
Information-flow control gives us a way to build agents that can reason freely while acting safely. Not by trusting the model, but by constraining what its decisions are allowed to affect.
If you’re building autonomous agents that interact with real systems, this line of work is worth your attention. It shows that we don’t have to choose between autonomy and security. We can engineer both!
Follow to stay updated on future deep dives into secure agent architectures.