I didn’t find any interview guide for Cyber Threat Intelligence (CTI) roles that satisfied me (I’m not interested in AI slop), so I decided to write one!
I figured writing this post might help me organize my ideas, and - hopefully - help others preparing for their interviews. The goal is to refresh some topics some of you probably already know, and collect useful links, in case you want to dig deeper into a specific subject.
There are 3 sections:
At the end of each section, you’ll find some interview questions to practice what we’ve revisited.
Be sure to tailor your review based on the job description: every role may have unique requirements, and some are likely not covered in this post.
I’ll steal the definition from the book “Intelligence-Driven Incident Response” by Rebekah Brown and Scott J. Roberts: “Threat intelligence is the analysis of adversaries - their capabilities, motivations, and goals. Cyber threat intelligence is the analysis of how adversaries use the cyber domain to accomplish their goals”.
This shows where CTI fits in the bigger picture; it’s part of a hierarchy that goes from general intelligence down to the cyber domain:
Basically, data becomes intelligence only when enriched with appropriate context.
There are 3 main categories of intelligence:
Tactics, Techniques and Procedures (often abbreviated TTPs) describe the behavior of threat actors. More specifically:
N.B. One technique might fit into several tactics, and vice versa.
I’ll try to make it more practical with an example: an actor wants access to a company’s network, and sends a phishing email with a malicious attachment to employees.
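A minimal sketch of how that example could be written down as structured data. The tactic and technique names and IDs follow MITRE ATT&CK (Initial Access, TA0001; Phishing: Spearphishing Attachment, T1566.001); the `Behavior` class and the wording of the procedure are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """One observed behavior split into the three TTP levels (hypothetical structure)."""
    tactic: str      # the "why": the actor's goal at this step
    technique: str   # the "how": the general method used to reach that goal
    procedure: str   # the specifics: how this actor implemented the technique


phishing = Behavior(
    tactic="Initial Access (TA0001)",
    technique="Phishing: Spearphishing Attachment (T1566.001)",
    procedure="Email sent to employees carrying a malicious attachment",
)

print(f"{phishing.tactic} -> {phishing.technique} -> {phishing.procedure}")
```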
The Traffic Light Protocol (TLP) was designed to support the process of information sharing. It uses five color labels to show how far you can share certain information:
The Intelligence Cycle is a step-by-step approach to turn raw data into actionable insights. Here are the steps:
The Cyber Kill Chain breaks down the steps that threat actors usually take when running an attack. It can help identify potential attack vectors and develop strategies to prevent, detect and respond to threats. Its stages are:
The Diamond model is a framework to analyze network intrusion events. It works by breaking them down into 4 elements, and highlighting the relationships between them. I think of it as a way to simplify events by zooming into the 4 components of the model:
The Diamond model helps connect the dots between these parts of an attack, for instance:
Here is an ASCII-art representation of the framework:
             Adversary
                /\
               /  \
              /    \
Infrastructure ----- Capability
              \    /
               \  /
                \/
              Victim
MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) is a (huge!) database of tactics and techniques (see TTPs) that actors use or have used in the real world.
Explaining the whole MITRE ATT&CK framework is out of scope for this blog post, but I wanted to keep a section about it as a reminder to take some time to play with it if you have never used it, or if it has been a long time.
I’ll also leave here a great infographic about MITRE ATT&CK by Thomas Roccia:
The term “malware” is used for various kinds of software designed to perform undesired actions on a computer system, network or data. Here are some malware categories:
When analyzing suspicious files, there are two approaches:
Typically, the best approach is to use a combination of both, to get a full picture of what the file is actually doing.
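As a small illustration of the static side (inspecting a file without running it), here is a sketch that hashes a sample and pulls out its printable strings, similar in spirit to the `strings` utility. The function name `static_triage` and the sample bytes are made up for this example; a real analysis would go much further.

```python
import hashlib
import re


def static_triage(data: bytes, min_len: int = 6) -> dict:
    """Collect basic static facts about a sample without executing it."""
    # Runs of printable ASCII characters at least min_len long
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "strings": [m.decode("ascii") for m in pattern.findall(data)],
    }


# Made-up bytes standing in for a real sample
sample = b"MZ\x90\x00\x03\x00http://example.test/payload\x00\x01LoadLibraryA\x00"
report = static_triage(sample)
print(report["sha256"])
print(report["strings"])
```

The hash is what you would search for in a sandbox or sample database, while the strings often give a first hint about URLs or imported APIs.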
Sandboxes let an analyst safely run suspicious files or URLs in a controlled environment to observe their behavior. They’re particularly useful to understand what a malware sample does, which files it drops, and which network connections it attempts.
The DIY approach to sandboxes is to build a malware analysis lab using something like FLARE-VM. But in case you don’t want to go through all of that, you can choose from some available online sandboxes like: Any.Run, Hybrid Analysis, or Joe Sandbox. Each of them has different features (some free, some paid), but the idea is the same: run the sample in a safe place, and watch what happens.
From the project home page: “YARA is a tool aimed at (but not limited to) helping malware researchers to identify and classify malware samples”. With YARA rules, you can describe text or binary patterns and use them to scan files, which is critical for hunting malware families in a large database of samples.
Here is an example of a YARA rule, and an explanation of some of its elements:
// YARA lets you import modules to extend its functionality;
// some popular modules are: pe, cuckoo, hash
import "pe"

// The rule name identifies the YARA rule; use a meaningful
// name (not like the example below!)
rule DetectSuspiciousPE {
    // The metadata section lets you record information about
    // the rule, such as author, date or sample used for the rule
    meta:
        author = "Andrea Palmieri"
        description = "Detects suspicious PE files with magic!"

    // The strings section defines what the rule should
    // match; you can use text, hexadecimal patterns and regex
    strings:
        // Simple text string that could appear in malware
        // nocase = case insensitive, fullword = match only when
        // delimited by non-alphanumeric characters,
        // wide = 2 bytes per character, base64 = base64 encoding
        $text_string = "malicious" nocase wide

        // Hex pattern for the suspicious API LoadLibraryA
        // Wildcards = { 4C ?F 6? }, Alternatives = { 4C (6F|69) }
        // Jump = { 72 [2-4] 79 }
        $hex_string = { 4C 6F 61 64 4C 69 62 72 61 72 79 41 }

        // Regex to catch suspicious command-line flags
        $regex = /--inject|--persist/i

    // The condition defines when the rule should trigger
    condition:
        // Check if the file starts with the PE magic number "MZ"
        uint16(0) == 0x5A4D
        // Ensure it's actually a valid PE file using the PE module
        and pe.is_pe
        // And at least one of our suspicious indicators is present
        and any of them
}
And here is how to run a rule on files in the current folder:
Building YARA rules requires balance: if they are too specific, they can miss variations of the same malware; if they are too broad, they will produce a lot of false positives. A good rule should capture the malware’s unique “fingerprint” without being too restrictive.
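Two details of the example rule are easy to check by hand: why the magic number is written `0x5A4D` when the file starts with the bytes "MZ" (0x4D 0x5A), and what the `nocase wide` modifiers actually match. The sketch below is a rough Python approximation of those two checks, not how YARA implements them; the function names are made up.

```python
import struct


def starts_with_mz(data: bytes) -> bool:
    """Rough equivalent of the YARA condition uint16(0) == 0x5A4D."""
    if len(data) < 2:
        return False
    # "<H" reads an unsigned 16-bit integer in little-endian byte order,
    # so the bytes 4D 5A ("MZ") come out as the integer 0x5A4D
    (value,) = struct.unpack_from("<H", data, 0)
    return value == 0x5A4D


def contains_wide_nocase(data: bytes, text: str) -> bool:
    """Rough equivalent of an ASCII text string with the nocase and wide modifiers."""
    needle = text.lower().encode("utf-16-le")  # 2 bytes per character
    return needle in data.lower()  # bytes.lower() only changes ASCII letters


header = b"MZ\x90\x00"
print(hex(struct.unpack_from("<H", header, 0)[0]))  # 0x5a4d
print(contains_wide_nocase("MaLiCiOuS".encode("utf-16-le"), "malicious"))  # True
```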
Before discussing web-based threats, it may be useful to briefly revisit 2 important protocols:
TLS (Transport Layer Security) is a cryptographic protocol that keeps network communication secure by encrypting data, ensuring privacy, integrity, and authentication. Its key functions are data encryption (uses symmetric and asymmetric cryptography to make intercepted data unreadable), server authentication (verifies server identity through a digital certificate issued by a trusted authority) and data integrity (ensures transmitted data remains unaltered during transit). Its steps are:
DNS (Domain Name System) is used to translate domain names into IP addresses. It acts like the phonebook of the internet, turning the names you type into the addresses machines connect to. Here is the step-by-step process:
Shodan continuously scans the internet indexing exposed devices and services. It can be used to track attacker infrastructure (C2, phishing kits, etc.) and monitor the attack surface for specific devices or software.
Censys also scans the internet and collects data about exposed systems, but it is more focused on TLS certificates, protocol details, and structured search.
urlscan.io: similarly to the previously discussed sandboxes, urlscan allows you to analyze websites in a safe environment, and explore what’s under the hood: scripts, redirects, trackers, etc. It’s perfect to track phishing kits, infected pages or quickly check sketchy links.
JA4+ is a set of network fingerprints for multiple protocols, and they have a lot of very practical applications, many of which are listed here. You may have heard about the JARM fingerprint: JA4+ is basically that with superpowers. Personally, I mostly used it to track TLS servers in Censys: when the server responds to a TLS handshake, a fingerprint of the response is generated, and then used to identify clusters of servers operated by the same actor.
crt.sh is a free website for looking up TLS/SSL certificates for any domain. It pulls the information directly from the Certificate Transparency (CT) logs: a public list recording every certificate issued by trusted Certificate Authorities (CAs). CT logs make it easy to spot newly issued certificates, like the ones used in phishing campaigns and domain impersonation.
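A minimal sketch of working with crt.sh results in Python. It assumes the site returns JSON when `output=json` is added to the query string and that each entry has a `name_value` field that may hold several newline-separated names; both are assumptions to verify against the live site. The sample response is made up, and no network request is made here.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh search URL asking for JSON output ('%' acts as a wildcard)."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_response(body: str) -> list:
    """Collect the unique hostnames found in the name_value fields."""
    names = set()
    for entry in json.loads(body):
        # name_value is assumed to hold one or more names separated by newlines
        names.update(n.strip().lower() for n in entry.get("name_value", "").splitlines())
    return sorted(n for n in names if n)


# Made-up response, shaped like what the JSON output is assumed to return
sample = json.dumps([
    {"name_value": "login.example.com\nwww.example.com"},
    {"name_value": "WWW.example.com"},
])
print(crtsh_url("example.com"))
print(names_from_response(sample))
```

Deduplicating the names matters in practice because the same hostname usually appears in many certificates over time.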
Passive DNS is like the “memory” of DNS: it works by collecting and storing DNS resolution data over time, creating a database of how domain names have resolved to IP addresses. Unlike normal DNS lookups, passive DNS lets you look at past DNS activity, which is useful for tracking malicious infrastructure, identifying related domains, and spotting patterns in actors’ behavior.
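To make the idea concrete, here is a toy in-memory version of such a database. The `PassiveDNS` class, its method names, and the observations are all hypothetical; real passive DNS providers collect this data from sensors and expose it through their own APIs.

```python
from datetime import date


class PassiveDNS:
    """Toy passive DNS store: remembers which domain resolved to which IP, and when."""

    def __init__(self):
        self._seen = {}   # (domain, ip) -> (first_seen, last_seen)
        self._by_ip = {}  # ip -> set of domains

    def observe(self, domain, ip, day):
        """Record that `domain` resolved to `ip` on `day`."""
        key = (domain.lower(), ip)
        first, last = self._seen.get(key, (day, day))
        self._seen[key] = (min(first, day), max(last, day))
        self._by_ip.setdefault(ip, set()).add(domain.lower())

    def domains_on(self, ip):
        """Pivot from an IP to every domain ever seen resolving to it."""
        return sorted(self._by_ip.get(ip, set()))

    def history(self, domain):
        """Every IP the domain resolved to, with first and last seen dates."""
        return sorted(
            (ip, first, last)
            for (d, ip), (first, last) in self._seen.items()
            if d == domain.lower()
        )


pdns = PassiveDNS()
pdns.observe("login-example.test", "203.0.113.10", date(2024, 1, 5))
pdns.observe("login-example.test", "198.51.100.7", date(2024, 2, 1))
pdns.observe("portal-example.test", "203.0.113.10", date(2024, 1, 9))
pdns.observe("login-example.test", "203.0.113.10", date(2024, 1, 20))

print(pdns.domains_on("203.0.113.10"))
print(pdns.history("login-example.test"))
```

The `domains_on` pivot is the part that helps with tracking infrastructure: starting from one known-bad IP, you get the other domains that shared it.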
MISP (Malware Information Sharing Platform) is an open-source threat intelligence platform for sharing and ingesting IoCs, attack patterns, threat actor profiles, and other intelligence. MISP members rely on the community: organizations contribute indicators, which are then enriched and shared across trusted groups. It also integrates with a lot of other tools through APIs, so you can pull/push data automatically.
CyberChef is an open-source web-based tool with various functions for data analysis, transformation, and decoding, all using an intuitive drag-and-drop interface. It can handle and chain tasks like decoding Base64, extracting data from files, encrypting or decrypting text, and much more. Whenever I have something weird in front of me (an encoded string, a suspicious payload, or just some ugly logs), I throw it into CyberChef: it saves me from writing random scripts and lets me experiment quickly.
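For comparison, this is roughly what a two-step chain (decode Base64, then XOR with a single-byte key) looks like as a throwaway script. The `bake` helper and the obfuscated blob are made up for illustration and have nothing to do with CyberChef's internals.

```python
import base64
from functools import reduce


def from_base64(data: bytes) -> bytes:
    """Step: decode Base64 input."""
    return base64.b64decode(data)


def xor(key: int):
    """Return a step that XORs every byte with a single-byte key."""
    return lambda data: bytes(b ^ key for b in data)


def bake(data: bytes, *steps) -> bytes:
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Build a made-up "obfuscated" blob: XOR with 0x41, then Base64
secret = b"http://example.test/stage2"
blob = base64.b64encode(xor(0x41)(secret))

print(bake(blob, from_base64, xor(0x41)).decode())  # http://example.test/stage2
```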
Although programming is not mandatory to enter the CTI field, it is definitely a nice-to-have skill and some jobs may require it; moreover, programming can help you automate repetitive tasks, process data, or enrich indicators.
Python is often considered the best choice to learn programming, because of its huge list of libraries for data analysis and security, but other useful languages can be Go (for performance and portability) or JavaScript (if your focus is to understand and analyze web-based threats).
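As an example of the kind of repetitive task worth automating, here is a sketch that pulls IPv4 addresses out of a report, including "defanged" ones written like `203.0.113[.]10`. The function names and the sample text are made up, and the refanging only covers the two conventions shown.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def refang(text: str) -> str:
    """Undo two common defanging conventions: hxxp and [.]"""
    return text.replace("hxxp", "http").replace("[.]", ".")


def extract_ipv4(text: str) -> list:
    """Return the unique, valid IPv4 addresses in text, in order of appearance."""
    found = []
    for candidate in IPV4_RE.findall(refang(text)):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # matches the regex but is not a valid address, e.g. 999.1.1.1
        if candidate not in found:
            found.append(candidate)
    return found


report = (
    "C2 at 203.0.113[.]10 and 198.51.100.7, "
    "beacon to hxxp://203.0.113[.]10/gate; ignore 999.1.1.1"
)
print(extract_ipv4(report))  # ['203.0.113.10', '198.51.100.7']
```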
A list of questions that I was asked during interviews:
I’d like to keep this post up to date with relevant CTI topics, so be sure to reach out or comment in case you think I missed something or wrote something awfully wrong 🙃
Books, courses and links to expand on these topics: