Press enter or click to view image in full size
Link — https://tryhackme.com/room/aithreatmodelling
Task 1: Introduction
Artificial Intelligence has rapidly moved from experimental labs into production environments. Today, organizations rely on Large Language Models (LLMs), recommendation engines, fraud detection systems, and Retrieval-Augmented Generation (RAG) pipelines to automate critical business operations. While these systems provide significant business value, they also introduce entirely new attack surfaces that traditional security frameworks were never designed to address.
In this walkthrough, I’ll document my journey through the TryHackMe room “Threat Modelling AI Systems”, where I learned how to assess AI deployments using:
- AI-specific asset identification
- STRIDE for AI systems
- MITRE ATLAS
- OWASP LLM Top 10 (2025)
- Practical AI threat assessment methodologies
Let’s dive in.
Task 1: Understanding the Scenario
The room places us in the role of a newly hired Threat Analyst at MegaCorp. The organization has heavily adopted AI technologies across several business functions:
Customer Support Chatbot
- Powered by an LLM
- Connected to internal knowledge bases through a RAG pipeline
Recommendation Engine
- Processes sensitive customer information
- Generates personalized product recommendations
Fraud Detection Platform
- Makes real-time authorization decisions
- Continuously retrains on transaction data
The mission from the CISO is simple: Conduct a comprehensive AI threat assessment before the upcoming board meeting. This task introduces the importance of understanding that AI systems are not merely traditional applications with machine learning bolted on. They introduce new assets, new risks, and entirely different failure modes.
Task 2: AI-Specific Assets and Attack Surfaces
Traditional threat models focus on:
- Databases
- APIs
- Credentials
- Configuration files
- Source code
AI systems introduce additional assets that require protection.
Key AI Assets
Press enter or click to view image in full size
1. Training Data
The dataset used to train the model.
Risks:
- Data poisoning
- Label manipulation
- Hidden backdoors
2. Model Weights
The learned intelligence of the model.
Risks:
- Model theft
- Intellectual property loss
- Competitive espionage
3. Embedding Vectors
Used heavily within:
- RAG systems
- Recommendation engines
- Fraud detection systems
These numerical representations help models retrieve relevant information.
4. System Prompts
Instructions that define:
- Personality
- Restrictions
- Guardrails
- Business logic
Leaking system prompts can reveal security controls and bypass mechanisms.
5. Feature Stores
Repositories containing processed inputs fed into models. Tampering here changes what the model sees during inference.
6. Model Registries
Storage locations for approved model versions. Compromising the registry allows attackers to deploy malicious or backdoored models.
Key Learning
Unlike a stolen password, compromised model weights cannot simply be rotated. Once an attacker possesses your model, they possess your organization’s AI capability.
Question 1
In a RAG-based system, which AI asset type is used to retrieve relevant context at query time?
Answer: Embedding Vectors
Question 2
Which AI-specific asset is compromised when an attacker swaps a production model inside the model registry?
Answer: Model Registry / Artifacts
Task 3: The AI Data Supply Chain and STRIDE’s Limitations
One of the most valuable lessons from this room is understanding the AI Data Supply Chain.
Press enter or click to view image in full size
Stage 1: Data Collection
Data originates from:
- Public web sources
- Internal databases
- Third-party providers
- User-generated content
Attack opportunity:
- Poisoned source material
Stage 2: Cleaning and Labeling
Data gets categorized and prepared for training.
Attack opportunity:
- Incorrect labeling
- Manipulated annotations
Stage 3: Model Training
Patterns become embedded into model weights.
Attack opportunity:
- Persistent poisoning
- Backdoor implantation
Stage 4: Validation and Packaging
Models are evaluated and stored.
Attack opportunity:
- Registry compromise
- Model replacement
Stage 5: Inference
The model serves predictions to users.
Attack opportunity:
- Prompt injection
- Retrieval manipulation
- Adversarial inputs
Why STRIDE Alone Isn’t Enough
Press enter or click to view image in full size
Traditional STRIDE was not designed for:
- Training data poisoning
- Model extraction
- Adversarial examples
- Prompt injection
- Excessive AI agency
AI systems require additional context and frameworks.
Question 1
At which supply chain stage is malicious data injected to influence future model behavior?
Answer: Data Collection
Question 2
Which STRIDE category struggles to properly describe training data poisoning?
Answer: Tampering
Task 4: Adapting STRIDE for AI Systems
The room then reimagines STRIDE through an AI lens.
Spoofing → Data Source Impersonation
Attackers inject malicious content into knowledge sources.
Example: A poisoned RAG document causes a chatbot to deliver false information.
Tampering → Data Poisoning
Attackers modify:
- Training datasets
- Model weights
- Features
- Prompts
Associated MITRE ATLAS techniques:
- AML.T0020 — Data Poisoning
- AML.T0018 — Backdoor ML Model
Repudiation → Lack of Explainability
Organizations cannot always explain:
- Why a prediction occurred
- Which model version made it
- Which context influenced it
This creates audit and compliance challenges.
Information Disclosure → Model Extraction
Attackers repeatedly query APIs to reconstruct proprietary models.
Associated techniques:
- AML.T0024 — Extract ML Model
- AML.T0025 — Infer Training Data Membership
Denial of Service → Denial of Wallet
A fascinating AI-specific attack.
Rather than crashing systems, attackers generate:
- Extremely long prompts
- Expensive inference requests
- Massive token consumption
Result: Cloud bills skyrocket while systems remain technically online.
Elevation of Privilege → Jailbreaking
Attackers manipulate prompts to bypass restrictions.
Get Gajanan Tayde’s stories in your inbox
Join Medium for free to get updates from this writer.
Consequences:
- Tool abuse
- Database access
- Unauthorized actions
OWASP Mapping:
- LLM06:2025 — Excessive Agency
Question 1
Primary AI manifestation of Information Disclosure?
Answer: Model Extraction
Question 2
Which STRIDE category covers jailbreaking?
Answer: Elevation of Privilege
Question 3
Which OWASP LLM Top 10 entry addresses excessive permissions?
Answer: LLM06: 2025 — Excessive Agency
Question 4
What is the name of the attack that increases inference costs without causing downtime?
Answer: Denial of Wallet
Task 5: MITRE ATLAS
MITRE ATT&CK revolutionized traditional threat modeling.
For AI, MITRE introduced:
ATLAS
Adversarial Threat Landscape for Artificial-Intelligence Systems
ATLAS provides:
- Tactics
- Techniques
- Sub-techniques
- Mitigations
- Real-world case studies
Press enter or click to view image in full size
Important Techniques
AML.T0020 — Data Poisoning
Corrupting training data to influence future behavior.
AML.T0024 — Model Extraction
Stealing models through repeated API interaction.
AML.T0015 — Evade ML Model
Crafting inputs designed to bypass detection.
AML.T0051 — LLM Prompt Injection
Manipulating model behavior through prompts.
AML.T0018 — Backdoor ML Model
Embedding hidden triggers into training.
Why ATLAS Matters
STRIDE tells us: What category of threat exists.
ATLAS tells us: Exactly how attackers perform the attack.
Real-World Case Studies
ShadowRay (AML.CS0023)
Attackers exploited vulnerabilities in Ray AI infrastructure.
Morris II Worm (AML.CS0024)
A self-propagating prompt injection worm capable of spreading between AI agents through RAG-enabled communication channels. This demonstrated that AI malware is no longer theoretical.
Press enter or click to view image in full size
Press enter or click to view image in full size
Question 1
What does ATLAS stand for?
Answer: Adversarial Threat Landscape for Artificial-Intelligence Systems
Question 2
Which case study documented a self-replicating prompt injection worm?
Answer: Morris II
Question 3
What is the technique ID for Model Extraction?
Answer: AML.T0024
Task 6: OWASP LLM Top 10 (2025)
This section ties everything together.
The OWASP LLM Top 10 maps AI threats directly to architectural components.
Key Risks
LLM01 — Prompt Injection
Targets:
- User prompts
- RAG content
- Retrieved documents
LLM02 — Sensitive Information Disclosure
Targets:
- Training datasets
- System prompts
- Inference outputs
LLM03 — Supply Chain
Targets:
- Third-party models
- Datasets
- Dependencies
LLM04 — Data and Model Poisoning
Targets:
- Training pipelines
- Feature stores
- Registries
LLM05 — Improper Output Handling
Example:
Rendering unsanitized LLM output directly into browsers.
Potential result:
- Cross-Site Scripting (XSS)
LLM06 — Excessive Agency
Example:
An AI assistant with unrestricted access to:
- Databases
- APIs
- Email systems
LLM07 — System Prompt Leakage
Exposure of internal instructions and guardrails.
LLM08 — Vector and Embedding Weaknesses
Risks:
- Embedding poisoning
- Retrieval manipulation
LLM09 — Misinformation
Hallucinations and inaccurate responses.
LLM10 — Unbounded Consumption
Denial-of-wallet attacks and resource exhaustion.
Question 1
How many OWASP entries affect the LLM Inference Endpoint?
Answer: 6
Question 2
Unsanitized LLM output rendered in browsers maps to which OWASP category?
Answer: Improper Output Handling
Question 3
Which component requires the most protection against supply chain threats?
Answer: Training Pipeline
Task 7: Practical Exercise
The room concludes with an interactive threat modeling exercise.
The challenge requires:
- Identifying vulnerabilities
- Mapping OWASP risks
- Associating architectural components
- Justifying mitigation choices
This practical exercise reinforces the relationships between:
- STRIDE
- MITRE ATLAS
- OWASP LLM Top 10
Practical Solution:
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Press enter or click to view image in full size
Flag
THM{AI_THREAT_MODEL_COMPLETE}
Conclusion
This room provides one of the most structured introductions to AI Threat Modeling currently available on TryHackMe.
The biggest takeaway is that AI security is not simply application security with new terminology.
AI introduces:
- New assets
- New attack paths
- New supply chains
- New forms of abuse
A practical assessment workflow emerges:
Step 1: Identify AI Assets
Training data, model weights, embeddings, prompts, and registries.
Step 2: Analyze the Data Supply Chain
Understand where compromise can occur.
Step 3: Apply STRIDE-AI
Categorize threats.
Step 4: Enrich Using MITRE ATLAS
Map threats to documented adversarial techniques.
Step 5: Prioritize Using OWASP LLM Top 10
Identify where risks exist within the architecture. This layered methodology creates a repeatable framework that can be applied to virtually any AI deployment, from chatbots to autonomous agents. As organizations continue integrating AI into business-critical systems, threat modeling skills like these will become just as essential as traditional application security reviews.
Thanks for reading, and happy hacking!
Press enter or click to view image in full size