Monthly threat intelligence from production AI agent deployments: 91,284 interactions across 47 deployments, 35,711 threats detected, data through Feb 23.
ATTACK TECHNIQUES (by frequency)
| Rank | Technique | % of threats | Confidence | Risk |
|------|--------------------------|-------|------------|----------|
| 1 | Tool Chain Escalation | 11.7% | 89.3% | CRITICAL |
| 2 | Instruction Override | 9.0% | 96.1% | HIGH |
| 3 | RAG Poisoning | 8.7% | 94.0% | HIGH |
| 4 | System Prompt Extraction | 7.7% | 96.9% | HIGH |
| 5 | Indirect Injection | 7.4% | 95.2% | HIGH |
| 6 | Agent Goal Injection | 6.7% | 97.4% | CRITICAL |
| 7 | Role/Persona Manipulation| 6.1% | 91.3% | MEDIUM |
| 8 | Encoding/Obfuscation | 5.9% | 94.2% | HIGH |
| 9 | Poisoned Tool Output | 5.2% | 97.8% | CRITICAL |
Tool chain escalation is new at #1. Pattern: use read access to enumerate the available tools, then chain into write/execute calls. Its lower confidence (89.3%) reflects the difficulty of distinguishing a legitimate read-then-write sequence from a malicious one.
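A minimal sketch of an enumerate-then-escalate check, assuming each tool can be tagged with a capability class. The `TOOL_CAPABILITY` map, the tool names, and `find_escalations` are hypothetical illustrations, not part of RAXE. A plain read followed by a write is deliberately not flagged; only write/execute calls that follow tool enumeration are, which is the same ambiguity that keeps confidence lower for this technique.

```python
from dataclasses import dataclass

# Hypothetical capability tags; a real deployment would derive these from its tool registry.
TOOL_CAPABILITY = {
    "list_tools": "enumerate",
    "read_file": "read",
    "search_docs": "read",
    "write_file": "write",
    "run_shell": "execute",
}
ESCALATED = {"write", "execute"}


@dataclass
class Finding:
    index: int   # position of the escalating call in the session
    tool: str
    reason: str


def find_escalations(calls: list[str], min_enumerations: int = 1) -> list[Finding]:
    """Flag write/execute calls made after the session has enumerated tools."""
    enumerations = 0
    findings = []
    for i, tool in enumerate(calls):
        capability = TOOL_CAPABILITY.get(tool, "unknown")  # unknown tools are not flagged here
        if capability == "enumerate":
            enumerations += 1
        elif capability in ESCALATED and enumerations >= min_enumerations:
            findings.append(Finding(i, tool, f"{capability} after {enumerations} enumeration call(s)"))
    return findings
```

For example, `find_escalations(["list_tools", "read_file", "run_shell"])` flags the `run_shell` call, while `["read_file", "write_file"]` passes. A production detector would likely score these sequences rather than make a binary call.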
NOVEL VECTORS
Planning-phase goal injection (6.7%, 97.4% confidence). Targets the reasoning layer of autonomous agents. Manipulates the agent's objective graph during multi-step planning. Detection requires monitoring goal state across iterations, not just scanning I/O.
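A sketch of monitoring goal state across planning iterations, under loud assumptions: the agent exposes its current goal as a string at each step, and a crude token-overlap (Jaccard) score stands in for the semantic comparison a real detector would more plausibly use (an embedding model, for instance). `goal_drift`, `jaccard`, and the 0.5 threshold are hypothetical and only illustrate the "compare goal state over time" idea.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a deliberately crude stand-in for semantics.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def goal_drift(original_goal: str, plan_goals: list[str], threshold: float = 0.5) -> list[tuple[int, float]]:
    """Return (iteration, similarity) for planning steps whose goal drifted from the user's goal."""
    drifted = []
    for i, goal in enumerate(plan_goals):
        similarity = jaccard(original_goal, goal)
        if similarity < threshold:
            drifted.append((i, round(similarity, 2)))
    return drifted
```

The key design point is that the check runs on the planner's internal state at every iteration, so an objective rewritten mid-plan is caught even when no single input or output looks malicious on its own.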
Poisoned tool outputs (5.2%, 97.8% confidence). Agent A returns a crafted output that triggers unintended behavior in Agent B. Supply chain attack within multi-agent systems. High confidence reflects distinctive structural signatures.
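One defensive pattern for this vector, sketched here as an assumption rather than as RAXE's method: treat Agent A's output as untrusted input to Agent B, validate it against the schema B expects, and scan string fields for instruction-like text before B consumes it. `EXPECTED_SCHEMA`, `INSTRUCTION_PATTERNS`, and `check_tool_output` are hypothetical; the patterns are illustrative, not the structural signatures the post refers to.

```python
import json
import re

# Hypothetical contract: the fields Agent B expects from Agent A's weather tool.
EXPECTED_SCHEMA = {"city": str, "temp_c": (int, float), "summary": str}

# Illustrative instruction-like phrases only; not a real signature set.
INSTRUCTION_PATTERNS = [
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.I),
    re.compile(r"\byou (must|should) now\b", re.I),
    re.compile(r"\b(call|invoke|run) the \w+ tool\b", re.I),
]


def check_tool_output(raw: str, schema: dict = EXPECTED_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the output passed both checks."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    extra = set(data) - set(schema)
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    for key, expected_type in schema.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}")
    # Free-text fields are where injected instructions usually hide.
    for key, value in data.items():
        if isinstance(value, str) and any(p.search(value) for p in INSTRUCTION_PATTERNS):
            problems.append(f"instruction-like text in {key}")
    return problems
```

Schema validation catches structural anomalies (extra or retyped fields) cheaply; the text scan covers payloads that fit the schema but smuggle instructions into a free-text field.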
Multimodal injection (2.3%, 91.7% confidence). Instructions hidden in image EXIF data, PDF annotations, steganographic text, or OCR-triggering content. Text-only detection is blind to all of these. Approach: extract text from every modality and run it through the same classification pipeline.
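A narrow sketch of "extract text from every modality, then classify it the same way", covering only one case: text stored in PNG `tEXt`/`zTXt` metadata chunks, parsed with the standard library. EXIF (JPEG APP1 or the PNG `eXIf` chunk), PDF annotations, OCR, and steganography all need other extractors. `classify` is a hypothetical stand-in for the L2 classifier, and CRC checks are skipped for brevity.

```python
import struct
import zlib
from typing import Callable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> list[str]:
    """Extract 'keyword: text' strings from PNG tEXt and zTXt chunks (CRCs not verified)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            texts.append(f"{keyword.decode('latin-1')}: {text.decode('latin-1')}")
        elif chunk_type == b"zTXt":
            keyword, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte (0 = zlib); the remainder is compressed text.
            texts.append(f"{keyword.decode('latin-1')}: {zlib.decompress(rest[1:]).decode('latin-1')}")
        elif chunk_type == b"IEND":
            break
    return texts


def scan_all_modalities(text: str, attachments: list[bytes],
                        classify: Callable[[str], bool]) -> list[tuple[str, bool]]:
    """Run the prompt text and any extracted attachment text through one classifier."""
    segments = [text]
    for blob in attachments:
        if blob.startswith(PNG_SIGNATURE):
            segments.extend(png_text_chunks(blob))
        # Other formats (JPEG EXIF, PDF, OCR output) would plug in here.
    return [(segment, classify(segment)) for segment in segments]
```

Routing every extracted segment through the same `classify` callable is the point of the design: new modalities only need an extractor, not a new detector.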
Encoding stacks (part of encoding/obfuscation, 5.9%). Multi-layer encoding (e.g. base64 inside ROT13 inside URL encoding), Unicode confusables, and homoglyph substitution. Detection requires recursive decode-and-scan.
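A minimal sketch of recursive decode-and-scan, assuming a small fixed set of decoders (base64, URL, ROT13) and a caller-supplied `is_malicious` check standing in for the real classifier; `decode_and_scan` and its helpers are hypothetical. It does a breadth-first search over decoding chains with a depth limit, normalizing each layer with NFKC. NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs (Cyrillic "о" vs Latin "o"); those need a confusables table such as the one in Unicode TS #39.

```python
import base64
import binascii
import codecs
import re
import unicodedata
from urllib.parse import unquote


def _try_base64(s: str):
    compact = s.strip()
    if len(compact) < 8 or len(compact) % 4 or not re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", compact):
        return None
    try:
        decoded = base64.b64decode(compact, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    return decoded if decoded.isprintable() else None  # drop binary garbage


def _try_url(s: str):
    decoded = unquote(s)
    return decoded if decoded != s else None


def _try_rot13(s: str):
    return codecs.decode(s, "rot13")  # always applies; the seen-set stops ROT13 ping-pong


DECODERS = {"base64": _try_base64, "url": _try_url, "rot13": _try_rot13}


def decode_and_scan(text: str, is_malicious, max_depth: int = 4):
    """Return the decoder chain that exposed a flagged payload ([] = flagged as-is), else None."""
    start = unicodedata.normalize("NFKC", text)
    frontier = [(start, [])]
    seen = {start}
    for _ in range(max_depth + 1):
        next_frontier = []
        for candidate, chain in frontier:
            if is_malicious(candidate):
                return chain
            for name, decode in DECODERS.items():
                out = decode(candidate)
                if out is None:
                    continue
                out = unicodedata.normalize("NFKC", out)
                if out not in seen:
                    seen.add(out)
                    next_frontier.append((out, chain + [name]))
        frontier = next_frontier
    return None
```

For the stack described above (payload base64-encoded, then ROT13, then URL-encoded), the search returns the chain `["url", "rot13", "base64"]`, peeling layers outermost first. The depth limit bounds cost, since ROT13 applies to any string and the branching grows with each layer.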
DETECTION ARCHITECTURE
L1: 218 pattern rules, sub-ms
L2: Gemma-based multilabel classifier with 5 heads (family, technique, harm, confidence, risk)
Confidence: 94.2%, high-threat precision: 96.8%
FP rate: 13.9% (down from 16.7%)
P95 latency: 189ms
Aligned with the OWASP Top 10 for LLM Applications 2025 and MITRE ATLAS
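One plausible way to wire the two layers, sketched as an assumption: the post does not say whether L2 runs on every input or only when L1 finds nothing. This version short-circuits, so a pattern hit returns in sub-millisecond time and the classifier is called only for inputs L1 does not match. `L1_RULES`, `detect`, `Verdict`, and the classifier's output dict are hypothetical; the two rules stand in for the 218 real ones.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    flagged: bool
    layer: str                                  # which layer produced the verdict
    labels: dict = field(default_factory=dict)


# Two illustrative rules; the real L1 rule set is not published in this post.
L1_RULES = {
    "instruction_override": re.compile(r"\bignore (all )?(previous|prior) instructions\b", re.I),
    "system_prompt_extraction": re.compile(r"\b(reveal|print|show) (your|the) system prompt\b", re.I),
}


def detect(text: str, classifier: Callable[[str], dict], threshold: float = 0.5) -> Verdict:
    """Cheap regex layer first; fall through to the (hypothetical) multilabel classifier."""
    for technique, rule in L1_RULES.items():
        if rule.search(text):
            return Verdict(True, "L1", {"technique": technique})
    labels = classifier(text)  # assumed to return head outputs, e.g. {"technique": ..., "confidence": ...}
    return Verdict(labels.get("confidence", 0.0) >= threshold, "L2", labels)
```

The trade-off of short-circuiting is that L1 hits never receive L2's richer labels (harm, risk); running both layers and merging their outputs would cost latency on every request.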
Full report: https://raxe.ai/labs/threat-intelligence/latest
Open source: github.com/raxe-ai/raxe-ce
Happy to discuss specific attack patterns or detection approaches.