AI Threat Modelling: A Practical Walkthrough of the TryHackMe Room

Link — https://tryhackme.com/room/aithreatmodelling

Introduction

Artificial Intelligence has rapidly moved from experimental labs into production environments. Today, organizations rely on Large Language Models (LLMs), recommendation engines, fraud detection systems, and Retrieval-Augmented Generation (RAG) pipelines to automate critical business operations. While these systems provide significant business value, they also introduce entirely new attack surfaces that traditional security frameworks were never designed to address.

In this walkthrough, I’ll document my journey through the TryHackMe room “Threat Modelling AI Systems”, where I learned how to assess AI deployments using:

AI-specific asset identification
STRIDE for AI systems
MITRE ATLAS
OWASP LLM Top 10 (2025)
Practical AI threat assessment methodologies

Let’s dive in.

Task 1: Understanding the Scenario

The room places us in the role of a newly hired Threat Analyst at MegaCorp. The organization has heavily adopted AI technologies across several business functions:

Customer Support Chatbot

Powered by an LLM
Connected to internal knowledge bases through a RAG pipeline

Recommendation Engine

Processes sensitive customer information
Generates personalized product recommendations

Fraud Detection Platform

Makes real-time authorization decisions
Continuously retrains on transaction data

The mission from the CISO is simple: conduct a comprehensive AI threat assessment before the upcoming board meeting.

The key point of this task is that AI systems are not merely traditional applications with machine learning bolted on. They introduce new assets, new risks, and entirely different failure modes.

Task 2: AI-Specific Assets and Attack Surfaces

Traditional threat models focus on:

Databases
APIs
Credentials
Configuration files
Source code

AI systems introduce additional assets that require protection.

Key AI Assets

1. Training Data

The dataset used to train the model.

Risks:

Data poisoning
Label manipulation
Hidden backdoors

2. Model Weights

The learned parameters that encode everything the model picked up during training.

Risks:

Model theft
Intellectual property loss
Competitive espionage

3. Embedding Vectors

Used heavily within:

RAG systems
Recommendation engines
Fraud detection systems

These numerical representations help models retrieve relevant information.
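
To make the retrieval step concrete, here is a minimal sketch of similarity search over embeddings. The three-dimensional vectors, document names, and the `retrieve` helper are all made up for illustration; real systems use a vector database and embeddings with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, top_k=2):
    """Return the ids of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical embeddings; "poisoned_doc" was crafted to sit close to a
# legitimate document so that it is retrieved alongside it.
store = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "poisoned_doc": [0.88, 0.12, 0.05],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, store)
print(top)
```

The point of the toy data is that whoever can write vectors into the store can influence what context the model sees at query time.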

4. System Prompts

Instructions that define:

Personality
Restrictions
Guardrails
Business logic

A leaked system prompt can reveal which security controls are in place and how to bypass them.

5. Feature Stores

Repositories containing processed inputs fed into models. Tampering here changes what the model sees during inference.

6. Model Registries

Storage locations for approved model versions. Compromising the registry allows attackers to deploy malicious or backdoored models.
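
One common way to detect a swapped model is to compare the artifact's hash against the digest recorded at approval time. The sketch below assumes the approved digest is stored somewhere the attacker cannot modify; the byte strings stand in for real model files, and `verify_artifact` is a hypothetical helper name.

```python
import hashlib
import hmac

def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(artifact: bytes, approved_digest: str) -> bool:
    """True only if the artifact matches the digest recorded at approval."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_of(artifact), approved_digest)

# Placeholder bytes standing in for real model files (assumption).
approved_model = b"model-v1-weights"
approved_digest = sha256_of(approved_model)  # recorded when the model was approved

swapped_model = b"model-v1-weights-with-backdoor"

print(verify_artifact(approved_model, approved_digest))
print(verify_artifact(swapped_model, approved_digest))
```

A hash check only helps if the digest lives outside the registry itself, for example in a signed release record.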

Key Learning

Unlike a stolen password, compromised model weights cannot simply be rotated. Once an attacker possesses your model, they possess your organization’s AI capability.

Question 1

In a RAG-based system, which AI asset type is used to retrieve relevant context at query time?

Answer: Embedding Vectors

Question 2

Which AI-specific asset is compromised when an attacker swaps a production model inside the model registry?

Answer: Model Registry / Artifacts

Task 3: The AI Data Supply Chain and STRIDE’s Limitations

One of the most valuable lessons from this room is understanding the AI Data Supply Chain.

Stage 1: Data Collection

Data originates from:

Public web sources
Internal databases
Third-party providers
User-generated content

Attack opportunity:

Poisoned source material

Stage 2: Cleaning and Labeling

Data gets categorized and prepared for training.

Attack opportunity:

Incorrect labeling
Manipulated annotations

Stage 3: Model Training

Patterns become embedded into model weights.

Attack opportunity:

Persistent poisoning
Backdoor implantation

Stage 4: Validation and Packaging

Models are evaluated and stored.

Attack opportunity:

Registry compromise
Model replacement

Stage 5: Inference

The model serves predictions to users.

Attack opportunity:

Prompt injection
Retrieval manipulation
Adversarial inputs

Why STRIDE Alone Isn’t Enough

Traditional STRIDE was not designed for:

Training data poisoning
Model extraction
Adversarial examples
Prompt injection
Excessive AI agency

AI systems require additional context and frameworks.

Question 1

At which supply chain stage is malicious data injected to influence future model behavior?

Answer: Data Collection

Question 2

Which STRIDE category struggles to properly describe training data poisoning?

Answer: Tampering

Task 4: Adapting STRIDE for AI Systems

The room then reimagines STRIDE through an AI lens.

Spoofing → Data Source Impersonation

Attackers pose as a trusted data source to inject malicious content into knowledge sources.

Example: A poisoned RAG document causes a chatbot to deliver false information.
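
A simple defensive idea against impersonated sources is to check where each document came from before it is indexed. This is a minimal sketch of my own, not something the room prescribes; the host names and the `is_trusted` helper are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of hosts permitted to feed the knowledge base.
TRUSTED_HOSTS = {"kb.megacorp.example"}

def is_trusted(source_url: str) -> bool:
    """Accept a document only if its host exactly matches the allow-list."""
    host = urlparse(source_url).hostname
    return host in TRUSTED_HOSTS

docs = [
    {"source": "https://kb.megacorp.example/refunds", "text": "Refunds take 5 days."},
    # Look-alike host that merely contains the trusted name.
    {"source": "https://kb.megacorp.example.evil.test/refunds", "text": "Refunds are never given."},
]

trusted_docs = [d for d in docs if is_trusted(d["source"])]
print([d["source"] for d in trusted_docs])
```

Exact host matching matters here: a substring check would let the look-alike domain through.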

Tampering → Data Poisoning

Attackers modify:

Training datasets
Model weights
Features
Prompts

Associated MITRE ATLAS techniques:

AML.T0020 — Data Poisoning
AML.T0018 — Backdoor ML Model

Repudiation → Lack of Explainability

Organizations cannot always explain:

Why a prediction occurred
Which model version made it
Which context influenced it

This creates audit and compliance challenges.
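
One way to narrow this gap is to log, for every response, which model version answered and which context it saw. The record layout below is an assumption of mine rather than a standard; hashing the prompt and documents keeps sensitive text out of the log while still letting an auditor match records to stored content.

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def audit_record(model_version, prompt, context_docs, output):
    """Build a JSON-serializable record tying an output to its inputs."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_sha256": _digest(prompt),
        "context_sha256": [_digest(doc) for doc in context_docs],
        "output_sha256": _digest(output),
    }

# Hypothetical values for illustration.
record = audit_record(
    model_version="chatbot-2025-01",
    prompt="What is the refund policy?",
    context_docs=["Refunds take 5 days.", "Contact support for exceptions."],
    output="Refunds are processed within 5 days.",
)
print(json.dumps(record, indent=2))
```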

Information Disclosure → Model Extraction

Attackers repeatedly query APIs to reconstruct proprietary models.

Associated techniques:

AML.T0024 — Extract ML Model
AML.T0024.000 — Infer Training Data Membership
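
Because extraction relies on a high volume of queries, per-client query monitoring is a natural first detection signal. Here is a minimal sliding-window sketch; the `QueryMonitor` class, the thresholds, and the client ids are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque

class QueryMonitor:
    """Flags clients that exceed max_queries within a sliding time window."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)

    def record(self, client_id: str, now: float) -> bool:
        """Record a query at time `now` (seconds); return True if suspicious."""
        timestamps = self._history[client_id]
        timestamps.append(now)
        # Drop queries that have fallen out of the window.
        while timestamps and now - timestamps[0] > self.window_seconds:
            timestamps.popleft()
        return len(timestamps) > self.max_queries

monitor = QueryMonitor(max_queries=3, window_seconds=60)
flags = [monitor.record("client-a", t) for t in (0, 10, 20, 30, 100)]
print(flags)
```

Volume alone will not catch a patient attacker who spreads queries across accounts, so treat this as one signal among several.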

Denial of Service → Denial of Wallet

Denial of Wallet is an AI-specific twist on denial of service.

Rather than crashing systems, attackers generate:

Extremely long prompts
Expensive inference requests
Massive token consumption

Result: Cloud bills skyrocket while systems remain technically online.
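
A typical countermeasure is to cap token usage per request and per user. The sketch below is one simple way to do that; the `TokenBudget` class and its limits are illustrative assumptions, and a production system would also reset counters daily and persist them.

```python
class TokenBudget:
    """Rejects requests that exceed a per-request cap or a per-user daily cap."""

    def __init__(self, max_tokens_per_request: int, daily_token_limit: int):
        self.max_tokens_per_request = max_tokens_per_request
        self.daily_token_limit = daily_token_limit
        self._used = {}  # user_id -> tokens consumed so far today

    def allow(self, user_id: str, requested_tokens: int) -> bool:
        if requested_tokens > self.max_tokens_per_request:
            return False  # a single oversized prompt
        used = self._used.get(user_id, 0)
        if used + requested_tokens > self.daily_token_limit:
            return False  # many requests adding up
        self._used[user_id] = used + requested_tokens
        return True

# Hypothetical limits for illustration.
budget = TokenBudget(max_tokens_per_request=1000, daily_token_limit=2500)
results = [budget.allow("user-1", n) for n in (800, 5000, 900, 900)]
print(results)
```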

Elevation of Privilege → Jailbreaking

Attackers manipulate prompts to bypass restrictions.

Consequences:

Tool abuse
Database access
Unauthorized actions

OWASP Mapping:

LLM06:2025 — Excessive Agency
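
Limiting agency usually comes down to least privilege for tools. As a rough sketch, an agent's tool calls can be routed through an allow-list so that a jailbroken prompt cannot reach anything the agent was never meant to use. The agent name, tool names, and `dispatch` function here are hypothetical.

```python
def search_kb(query: str) -> str:
    return f"results for {query}"

def create_ticket(summary: str) -> str:
    return f"ticket created: {summary}"

def delete_customer(customer_id: str) -> str:
    return f"deleted {customer_id}"

# Every tool that exists in the system.
TOOLS = {
    "search_kb": search_kb,
    "create_ticket": create_ticket,
    "delete_customer": delete_customer,
}

# Hypothetical per-agent allow-list: the support bot never gets delete_customer.
ALLOWED_TOOLS = {"support_bot": {"search_kb", "create_ticket"}}

def dispatch(agent: str, tool_name: str, argument: str) -> str:
    """Run a tool only if this agent is explicitly allowed to use it."""
    if tool_name not in ALLOWED_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool_name}")
    return TOOLS[tool_name](argument)

print(dispatch("support_bot", "search_kb", "refunds"))
```

The check sits outside the model, so it holds even when the prompt-level guardrails fail.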

Question 1

What is the primary AI manifestation of Information Disclosure?

Answer: Model Extraction

Question 2

Which STRIDE category covers jailbreaking?

Answer: Elevation of Privilege

Question 3

Which OWASP LLM Top 10 entry addresses excessive permissions?

Answer: LLM06:2025 — Excessive Agency

Question 4

What is the name of the attack that increases inference costs without causing downtime?

Answer: Denial of Wallet

Task 5: MITRE ATLAS

MITRE ATT&CK revolutionized traditional threat modeling.

For AI, MITRE introduced ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems).

ATLAS provides:

Tactics
Techniques
Sub-techniques
Mitigations
Real-world case studies

Important Techniques

AML.T0020 — Data Poisoning

Corrupting training data to influence future behavior.

AML.T0024 — Model Extraction

Stealing models through repeated API interaction.

AML.T0015 — Evade ML Model

Crafting inputs designed to bypass detection.

AML.T0051 — LLM Prompt Injection

Manipulating model behavior through prompts.

AML.T0018 — Backdoor ML Model

Embedding hidden triggers into a model during training.

Why ATLAS Matters

STRIDE tells us: What category of threat exists.

ATLAS tells us: Exactly how attackers perform the attack.
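
The two frameworks can be joined in a simple lookup table. The sketch below uses only the pairings mentioned in this walkthrough, so categories for which no technique ID was listed have an empty list; it is not an official MITRE mapping, and the `STRIDE_AI` name is my own.

```python
# STRIDE category -> AI manifestation and ATLAS technique IDs,
# limited to the pairings mentioned in this walkthrough (not exhaustive).
STRIDE_AI = {
    "Spoofing": {"ai_threat": "Data Source Impersonation", "atlas": []},
    "Tampering": {"ai_threat": "Data Poisoning", "atlas": ["AML.T0020", "AML.T0018"]},
    "Repudiation": {"ai_threat": "Lack of Explainability", "atlas": []},
    "Information Disclosure": {"ai_threat": "Model Extraction", "atlas": ["AML.T0024"]},
    "Denial of Service": {"ai_threat": "Denial of Wallet", "atlas": []},
    "Elevation of Privilege": {"ai_threat": "Jailbreaking", "atlas": []},
}

def describe(category: str) -> str:
    """One-line summary: the 'what' from STRIDE plus the 'how' from ATLAS."""
    entry = STRIDE_AI[category]
    techniques = ", ".join(entry["atlas"]) or "no technique listed here"
    return f"{category}: {entry['ai_threat']} ({techniques})"

for name in STRIDE_AI:
    print(describe(name))
```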

Real-World Case Studies

ShadowRay (AML.CS0023)

Attackers exploited vulnerabilities in Ray AI infrastructure.

Morris II Worm (AML.CS0024)

A self-propagating prompt injection worm capable of spreading between AI agents through RAG-enabled communication channels. This demonstrated that AI malware is no longer theoretical.

Question 1

What does ATLAS stand for?

Answer: Adversarial Threat Landscape for Artificial-Intelligence Systems

Question 2

Which case study documented a self-replicating prompt injection worm?

Answer: Morris II

Question 3

What is the technique ID for Model Extraction?

Answer: AML.T0024

Task 6: OWASP LLM Top 10 (2025)

This section ties everything together.

The OWASP LLM Top 10 maps AI threats directly to architectural components.

Key Risks

LLM01 — Prompt Injection

Targets:

User prompts
RAG content
Retrieved documents

LLM02 — Sensitive Information Disclosure

Targets:

Training datasets
System prompts
Inference outputs

LLM03 — Supply Chain

Targets:

Third-party models
Datasets
Dependencies

LLM04 — Data and Model Poisoning

Targets:

Training pipelines
Feature stores
Registries

LLM05 — Improper Output Handling

Example:

Rendering unsanitized LLM output directly into browsers.

Potential result:

Cross-Site Scripting (XSS)
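
The usual fix is to treat model output like any other untrusted input and escape it before rendering. A minimal sketch using Python's standard `html.escape`; the `render_reply` function and the `bot-reply` class name are hypothetical.

```python
import html

def render_reply(llm_output: str) -> str:
    """Escape the model's text before placing it inside HTML."""
    return f'<div class="bot-reply">{html.escape(llm_output)}</div>'

# Output an attacker might coax out of the model via prompt injection.
malicious = "<script>alert(1)</script>"
page_fragment = render_reply(malicious)
print(page_fragment)
```

After escaping, the browser displays the tag as text instead of executing it.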

LLM06 — Excessive Agency

Example:

An AI assistant with unrestricted access to:

Databases
APIs
Email systems

LLM07 — System Prompt Leakage

Exposure of internal instructions and guardrails.

LLM08 — Vector and Embedding Weaknesses

Risks:

Embedding poisoning
Retrieval manipulation

LLM09 — Misinformation

Hallucinations and inaccurate responses.

LLM10 — Unbounded Consumption

Denial-of-wallet attacks and resource exhaustion.

Question 1

How many OWASP entries affect the LLM Inference Endpoint?

Answer: 6

Question 2

Unsanitized LLM output rendered in browsers maps to which OWASP category?

Answer: Improper Output Handling

Question 3

Which component requires the most protection against supply chain threats?

Answer: Training Pipeline

Task 7: Practical Exercise

The room concludes with an interactive threat modeling exercise.

The challenge requires:

Identifying vulnerabilities
Mapping OWASP risks
Associating architectural components
Justifying mitigation choices

This practical exercise reinforces the relationships between:

STRIDE
MITRE ATLAS
OWASP LLM Top 10

Practical Solution:

[Image: solution screenshot]

Flag

THM{AI_THREAT_MODEL_COMPLETE}

Conclusion

This room provides one of the most structured introductions to AI Threat Modeling currently available on TryHackMe.

The biggest takeaway is that AI security is not simply application security with new terminology.

AI introduces:

New assets
New attack paths
New supply chains
New forms of abuse

A practical assessment workflow emerges:

Step 1: Identify AI Assets

Training data, model weights, embeddings, system prompts, feature stores, and model registries.

Step 2: Analyze the Data Supply Chain

Understand where compromise can occur.

Step 3: Apply STRIDE-AI

Categorize threats.

Step 4: Enrich Using MITRE ATLAS

Map threats to documented adversarial techniques.

Step 5: Prioritize Using OWASP LLM Top 10

Identify where risks exist within the architecture.

This layered methodology creates a repeatable framework that can be applied to virtually any AI deployment, from chatbots to autonomous agents. As organizations continue integrating AI into business-critical systems, threat modeling skills like these will become just as essential as traditional application security reviews.
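
To make the five-step workflow concrete, here is a minimal sketch of a threat register in which each entry carries one field per step. The `Threat` class is my own structure, and the example rows are illustrative pairings drawn from the sections above rather than an official cross-framework mapping.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Threat:
    asset: str               # Step 1: AI asset
    supply_chain_stage: str  # Step 2: where compromise occurs
    stride: str              # Step 3: STRIDE category
    atlas_id: str            # Step 4: ATLAS technique
    owasp: str               # Step 5: OWASP LLM Top 10 entry

# Illustrative entries (assumptions, not the room's official answers).
register = [
    Threat("Training data", "Data Collection", "Tampering", "AML.T0020", "LLM04"),
    Threat("Model registry", "Validation and Packaging", "Tampering", "AML.T0018", "LLM04"),
    Threat("RAG content", "Inference", "Spoofing", "AML.T0051", "LLM01"),
]

# Group by OWASP entry to see which risks accumulate the most threats.
by_owasp = defaultdict(list)
for threat in register:
    by_owasp[threat.owasp].append(threat.asset)

for entry in sorted(by_owasp):
    print(entry, by_owasp[entry])
```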

Thanks for reading, and happy hacking!
