AI has shifted from experimentation to widespread operational use, and attackers have adapted quickly.
Foreign adversaries now use tools like ChatGPT to strengthen phishing, refine malware, and support influence operations. Their focus isn’t on inventing new attack categories but on amplifying proven techniques by embedding AI into existing intrusion workflows.
Since early 2024, OpenAI has identified and disrupted more than 40 networks abusing its services. As organizations accelerate AI deployments, most security efforts still center on development controls, leaving production environments exposed.
However, once an AI system is deployed, the prompt becomes the only visible control surface for defenders. It’s treated as a natural boundary because it governs how users interact with the model. Yet the systems behind that boundary operate across real networks, real data, and real permissions. The result is a mismatch between where defenders can see risk and where risk actually forms. The prompt becomes a proxy for security even though it was never designed to shoulder that responsibility.
Prompt filtering creates a false sense of protection because it feels eerily similar to user input validation. The idea is that if the prompt can be sanitized or rejected, the AI system will behave as intended.
In practice, attackers can mutate prompts indefinitely while preserving the same underlying intent. This leads to a brittle control surface that catches surface-level patterns but misses deeper behavioral signals. It mirrors the historical failure of perimeter firewalls, which focused on inputs while ignoring what happened after access was granted.
These weaknesses grow sharper as organizations adopt agentic AI. Agents interpret prompts, decompose tasks, and invoke tools based on internal reasoning rather than a single user instruction. Small modifications in phrasing or structure can bypass filters without changing the outcome the attacker seeks. The defender is left validating text while the attacker is already planning the next step. Prompt filtering becomes an increasingly fragile first line of defense against increasingly adaptive threats.
Security risk shifts the moment AI systems begin taking actions rather than generating text. Agents now carry out multi-step tasks that span reconnaissance, access exploration, and data movement.
Anthropic’s late 2025 threat analysis described a state-linked actor using Claude Code to execute most phases of an intrusion with minimal human oversight. The earliest steps looked routine, but the sequence revealed coordinated malicious behavior. Individual actions appeared benign in isolation, and the danger emerged only through the chain of decisions.
This is the central challenge of securing agentic AI. Agents move through systems in ways that map more closely to human workflows than traditional scripts. They access resources, follow links, explore directories, and make decisions across time. When monitoring is limited to the initial prompt, defenders lose visibility into the behaviors that truly matter. The most important indicators of intent never appear at the input layer. They appear downstream as actions evolve and compound. An agent asked to “summarize recent reports” begins recursively exploring document directories, enumerating access scopes, and exporting structured data. No single action violates policy, but the sequence reveals systematic data harvesting.
Independent research on prompt injection continues to show how easily inputs can be manipulated to override intended model behavior.
For security teams, this is a structural blind spot. AI misuse increasingly materializes through behaviors that unfold across networks, not through text artifacts passed at the beginning of an interaction. Defenders need insight into how agents operate after deployment, not just whether the first prompt looked safe.
The core security failure arises when defenders rely on snapshots of activity instead of continuous behavioral observation. Modern deep learning systems are particularly well-suited to modeling these temporal behaviors because they detect deviations across long execution chains rather than single events. This mirrors earlier challenges in behavioral security, where static rules captured known patterns but missed the evolving sequences that revealed malicious intent. Agentic AI intensifies this gap because its behavior is shaped by internal reasoning, tool access, and environmental signals. Without full visibility into action sequences, defenders are left blind to the very signals that matter most.
To build real resilience, organizations must rethink how they observe and evaluate AI systems in production. Once an agent begins carrying out tasks, the question becomes how it moves through the environment, what resources it touches, and whether its actions align with its intended purpose. This requires defenders to follow the full execution pathway rather than isolated events, tracing how individual steps accumulate into meaningful patterns. Security becomes an exercise in understanding the agent’s behavior as it unfolds, not just validating the text that initiated it.
Making this shift means adopting controls that support a continuous view of the agent’s operational life rather than static checkpoints. The first part of that transition is limiting what an agent can do by default through strict least privilege, ensuring it cannot access systems or data it does not genuinely need. From there, identity and session protections become critical because they determine who or what is allowed to trigger actions, and they constrain the situations in which the agent can escalate privileges.
Comprehensive logging plays a similar role by capturing not only prompts but the downstream effects across systems, creating a timeline of behavior that reveals when intent begins to drift. Runtime guardrails such as execution constraints and rate limits help contain harm when the agent steps outside expected parameters, reducing the impact of early-stage anomalies that might otherwise compound silently.
Even with these practices in place, defenders must recognize that static rules will never fully describe how adaptive agent behavior unfolds in a live environment. The goal is not to catch a single suspicious command but to understand the story the system is telling through its choices, timing, and progression. When defenders can observe that story clearly, they gain the ability to intervene before an agent’s behavior turns into an operational threat.
AI safety becomes far more practical once defenders focus on how agents behave rather than how their prompts are filtered. When defenders can observe how an agent behaves in real time, they gain the ability to catch the early signs of intent drifting away from expected patterns. Instead of discovering risk only after damage has occurred, security teams can see misalignment forming and correct it while it is still contained. This makes escalation harder, detection faster, and oversight far more reliable.
Security has evolved this way before. When traditional systems grew too complex for static rules, teams turned to behavioral visibility to understand what those systems were actually doing. AI now demands the same shift. Organizations that adopt this mindset are better equipped to deploy agents at scale while maintaining the trust, accountability, and operational control needed for safe use.
Recent Articles By Author