Most engineers find documentation unappealing: it feels like a chore that takes time away from more engaging technical problem solving, and it can seem especially tedious when we are focused on delivering functionality or fixing urgent issues. Nevertheless, properly documenting our detections is essential in detection engineering, as it provides context around the purpose, detection logic, and expected behaviour of each detection rule. Documentation also helps other teams, such as SOC analysts, understand what a detection is looking for, how it was built, and how to respond when it fires.
Just as important as documenting individual detections is tracking how the overall detection library evolves. In most cases, the detection library is dynamic: new content packs and detections are added over time, and existing ones are modified. These changes affect how the environment is monitored, so it is crucial to keep other teams informed. In other words, we need a way to generate a change log for our detection library.
That said, let’s look at how we can tackle those issues in the repository we talked about in Part 2.
To automatically generate wiki pages documenting our detections, we can convert the YAML metadata files into Markdown. YAML is already easy to read, but when communicating our work to other teams we should use a more user-friendly format. Markdown is also the supported format for Azure DevOps wiki pages [1]. To perform the conversion we are going to use Jinja [2], a templating engine whose special placeholders allow writing Python-like expressions inside a template. Data is then passed to the template to render the final document.
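As a minimal sketch of how this works (not part of the repository scripts, using a hypothetical inline template and illustrative data):
from jinja2 import Template

# Hypothetical, minimal template - the real templates live under pipelines/scripts/templates/
template = Template("# {{ data.name }}\n**ID:** {{ data.id }}\n## Description\n{{ data.description }}")

# Data as it would be loaded from a detection's *_meta.yml file (values are illustrative)
meta = {"name": "Example Detection", "id": "0001", "description": "Example description."}
print(template.render(data=meta))
Python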
The Jinja template that we are going to use to convert the meta files to Markdown is the following:
# {{ data.name }}
**ID:** {{ data.id }}
**Level:** {{ data.level }}
**Version:** {{ data.version }}
## Description
{{ data.description }}
## Data Sources
{% for data_source in data.data_sources %}
### {{ data_source.vendor }} {{ data_source.product }} {{ data_source.service }}
- **Category:** {{ data_source.category }}
- **Vendor:** {{ data_source.vendor }}
- **Product:** {{ data_source.product }}
- **Service:** {{ data_source.service }}
- **Event ID:** {{ data_source.event_id }}
{% endfor %}
## References
{% for ref in data.references %}
- {{ ref }}
{% endfor %}
## Blindspots
{% for blindspot in data.blindspots %}
- {{ blindspot }}
{% endfor %}
## Known False Positives
{% for fp in data.known_false_positives %}
- {{ fp }}
{% endfor %}
## Investigation Steps
{% for step in data.investigation_steps %}
- {{ step }}
{% endfor %}
## Tags
{% for tag in data.tags %}
- {{ tag }}
{% endfor %}
Markdown
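To give an idea of the end result, a meta file rendered through this template produces a page along the following lines (the values are made up for illustration and the page is truncated):
# Azure AD Azure Discovery Using Offensive Tools
**ID:** 00000000-0000-0000-0000-000000000000
**Level:** medium
**Version:** 1
## Description
Detects enumeration of Azure AD and Azure resources using offensive tooling.
## Data Sources
### Microsoft Azure AD SignInLogs
- **Category:** Authentication
...
Markdown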
We will also convert the rule files of each platform that we support to Markdown. As an example, here is the Jinja template for Sentinel rules:
# {{ data.properties.displayName }}
**ID:** {{ data.id }}
**Severity:** {{ data.properties.severity }}
**Kind**: {{ data.kind }}
**Enabled**: {{ data.properties.enabled }}
**Query:**
```
{{data.properties.query}}
```
### General Configuration
- **Query Frequency**: {{ data.properties.queryFrequency }}
- **Query Period**: {{ data.properties.queryPeriod }}
- **Trigger Operator**: {{ data.properties.triggerOperator }}
- **Trigger Threshold**: {{ data.properties.triggerThreshold }}
- **Suppression Enabled**: {{ data.properties.suppressionEnabled }}
- **Suppression Duration**: {{ data.properties.suppressionDuration }}
- **Event Grouping Settings Aggregation Kind**: {{ data.properties.eventGroupingSettings.aggregationKind }}
### Incident Configuration
- **Create Incident**: {{ data.properties.incidentConfiguration.createIncident }}
- **Grouping Enabled**: {{ data.properties.incidentConfiguration.groupingConfiguration.enabled }}
- **Reopen Closed Incident**: {{ data.properties.incidentConfiguration.groupingConfiguration.reopenClosedIncident }}
- **Lookback Duration**: {{ data.properties.incidentConfiguration.groupingConfiguration.lookbackDuration }}
- **Matching Method**: {{ data.properties.incidentConfiguration.groupingConfiguration.matchingMethod }}
**Group By Entities**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByEntities %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByEntities %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}
**Group By Alert Details**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByAlertDetails %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByAlertDetails %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}
**Group By Custom Details**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByCustomDetails %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByCustomDetails %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}
Markdown
Processing the detections with Python and the Jinja2 library [3] is simple: for each matching file, we load the data, render it with the appropriate template (meta.jinja or sentinel.jinja), and save the output in the specified directory. This logic is implemented in the following Python script, which we call documentation.py.
import os
import yaml
import json
import argparse
from jinja2 import Environment, FileSystemLoader


def generate_docs(base_dir, output_dir):
    env = Environment(loader=FileSystemLoader("pipelines/scripts/templates"))
    for root, _, files in os.walk(base_dir):
        for file in files:
            if file.endswith("_meta.yml"):
                meta_path = os.path.join(root, file)
                try:
                    with open(meta_path, "r") as f:
                        data = yaml.safe_load(f)
                    template = env.get_template("meta.jinja")
                    md_content = template.render(data=data)
                    base_name = file.replace("_meta.yml", ".md")
                    md_path = os.path.join(output_dir, base_name)
                    with open(md_path, "w") as f:
                        f.write(md_content)
                    print(f"Generated: {md_path}")
                except Exception as e:
                    print(f"##vso[task.logissue type=error]Error processing {meta_path}: {e}")
            if file.endswith("_sentinel.json"):
                sentinel_rule_path = os.path.join(root, file)
                try:
                    with open(sentinel_rule_path, "r") as f:
                        data = json.load(f)
                    template = env.get_template("sentinel.jinja")
                    md_content = template.render(data=data)
                    base_name = file.replace("_sentinel.json", "")
                    subpage_dir_path = os.path.join(output_dir, base_name)
                    if not os.path.exists(subpage_dir_path):
                        os.mkdir(subpage_dir_path)
                    sentinel_path = os.path.join(subpage_dir_path, "sentinel.md")
                    with open(sentinel_path, "w") as f:
                        f.write(md_content)
                    print(f"Generated: {sentinel_path}")
                except Exception as e:
                    print(f"##vso[task.logissue type=error]Error processing {sentinel_rule_path}: {e}")


def main():
    parser = argparse.ArgumentParser(description="Convert detection files to markdown docs.")
    parser.add_argument("--file-dir", required=True, help="Root directory to scan for files.")
    parser.add_argument("--output-dir", required=True, help="Directory to store output md files")
    args = parser.parse_args()
    generate_docs(args.file_dir, args.output_dir)


if __name__ == "__main__":
    main()
Python
The next step is to automatically generate the MD files when a pull request is merged to the main branch. To achieve that, we create a pipeline with a trigger on the main branch for all files under the detections/ directory. We will save the generated MD files under a new directory called documentation, which must also be added to the repository structure validation that we described in Part 3. The pipeline first runs our documentation.py script and then commits all new, modified and deleted files to the main branch.
name: MD Files Generation
trigger:
  branches:
    include:
      - "main"
  paths:
    include:
      - "detections/*"
    exclude:
      - ".git*"
      - "*.md"
jobs:
  - job: MDFileGeneration
    displayName: "MD File Generation"
    condition: eq(variables['Build.SourceBranchName'], 'main')
    steps:
      - checkout: self
        persistCredentials: true
      - script: |
          python pipelines/scripts/documentation.py --file-dir detections/ --output-dir documentation/
        displayName: 'Run MD File Generation'
      - bash: |
          git config --global user.email "[email protected]"
          git config --global user.name "Azure DevOps Pipeline"
          [[ $(git status --porcelain) ]] || { echo "No changes, exiting now..."; exit 0; }
          echo "Changes detected, let's commit!"
          git pull origin $(Build.SourceBranchName)
          git checkout $(Build.SourceBranchName)
          git add documentation/
          git commit --allow-empty -m "Automated commit from pipeline - Update MD Files"
          git push origin $(Build.SourceBranchName)
        displayName: Commit MD Files to Repo
YAML
However, in order to issue a commit from the pipeline, we need to grant the "Contribute" permission to the Azure DevOps project's Build Service account. We can do that by navigating to Project Settings -> Repositories -> selecting our repository -> clicking on Security -> selecting the main branch -> selecting the DAC Build Service account and assigning the Contribute permission.
A sample output of the pipeline run is the following:
As a last step, in order to publish the MD files as a wiki in Azure DevOps, navigate to the Wiki pages from the left panel. Click on the project's Wiki dropdown box and select Publish Code as wiki.
In the popup panel on the right side, fill in the required fields and click on Publish.
We end up with a wiki for our detections that is searchable and can be shared with other teams.
The documentation is a good first step towards communicating the work done within the detection engineering team. However, in the rapidly changing field of cybersecurity, frequent updates call for a more effective method of communicating changes to the detection library. While Git provides version control and Git platforms offer ways to browse the history of changes, that history can only be viewed ad hoc, without any summarization, and is not very usable for people who are not familiar with those tools.
Luckily, since we are leveraging Git, we can iterate through the commit history ourselves and put together a Python script that generates a change log with the summarized changes over a period of time.
But first, let's go through some Git commands [4] to understand their output and how it will help us in our script.
The following Git command retrieves a list of commit hashes from the main branch, between January 1, 2025 and August 12, 2025, focusing on changes made to .json and .yml files in the detections directory as well as .json files in the content_packs directory. The output is formatted to display only the commit hashes.
git log main --since=2025-01-01 --until=2025-08-12 --pretty=format:%H -- detections/*.json content_packs/*.json detections/*.yml
Bash
Once we have the commit list, we can iterate through it and identify the added, modified, deleted, or renamed files. The following Git command lists the changes to JSON and YAML files within the detections and content_packs directories that were made in a specific commit, identified by its hash. The output of the command is a list of status and file pairs.
git diff-tree -M --no-commit-id --name-status -r 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- detections/*.json content_packs/*.json detections/*.yml
Bash
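In the output, the status is a single letter (A for added, M for modified, D for deleted) or an R followed by a similarity score for renames, separated from the file path(s) by tabs. For the commit above, the output would contain lines like the following (illustrative, based on the files inspected later in this section):
M	content_packs/cloud_o365.json
M	detections/cloud/azure/azure_ad_azure_discovery_using_offensive_tools/azure_ad_azure_discovery_using_offensive_tools_sentinel.json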
For each commit we also collect information on the date, author and commit message.
git show -s --format="Commit-Date:%ci Commit-Author:%cn Commit-Message:%s" 17ff5c0963132ea22a9fba9f8a33359cd08b097f
Bash
Finally, the following Git command displays the changes made to a specific file in a given commit, without using a pager. It also suppresses commit header information to focus solely on the file changes.
git --no-pager show --pretty="" --unified=-1 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- content_packs/cloud_o365.json
Bash
git --no-pager show --pretty="" --unified=-1 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- detections/cloud/azure/azure_ad_azure_discovery_using_offensive_tools/azure_ad_azure_discovery_using_offensive_tools_sentinel.json
Bash
Depending on the Git client version you are using, you might need to verify that the output matches what is shown in the screenshots.
The diff output can then be parsed with regular expressions [5] to identify the actual changes inside the files, rather than just the fact that a file was modified. By doing so, we end up with an intermediate change log list that includes all the information gathered by the Git commands above, as well as the actual changes in the modified files.
Parsing the changes with regular expressions is easier when the files are beautified and span multiple lines, with one property per line.
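As a simplified sketch of this idea (the full script below uses the same patterns), a severity change can be extracted from the removed (-) and added (+) lines of the diff:
import re

# Hypothetical diff fragment for a *_sentinel.json file; --unified=-1 keeps only the changed lines
diff_output = '''
-    "severity": "Medium",
+    "severity": "High",
'''

old = re.findall(r'\s*?\-\s*?"severity":\s*"?([A-Za-z]+)"?\s*', diff_output)
new = re.findall(r'\s*?\+\s*?"severity":\s*"?([A-Za-z]+)"?\s*', diff_output)
if old and new:
    print(f"Severity changed: {old[0]} -> {new[0]}")  # Severity changed: Medium -> High
Python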
We iterate through the change log list and convert it to Markdown, summarizing the changes in time buckets by file type (content pack or detection) and change type (added, deleted, modified, renamed or moved). Detection engineers can use that Markdown change log to convey their efforts to other teams. That way, changes to the detection library become much easier to digest for people outside the detection engineering team, since they do not have to manually go through the Git history and look through the changes. A sample output of our script is provided below:
The commit hashes in the output MD file are clickable and can be used to navigate to the actual changes in the repository.
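To give an idea of the layout, the generated Markdown roughly follows this structure (the dates, counts and section contents below are hypothetical):
# Table of Contents

- [2025-09-01 (2025-08-01 -> 2025-09-01)](#2025-09-01)

# 2025-09-01
Changes from **2025-08-01** to **2025-09-01**.
A total of **2** detections have been created, **1** detections have been modified.

## Detections
<details>
<summary><b>Created (2)</b></summary>
...
</details>
Markdown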
The code of the script is provided below.
import argparse
import os
import re
import subprocess
import pytz
from datetime import datetime, timedelta
from collections import defaultdict

base_paths = ["detections/*.json", "content_packs/*.json", "detections/*.yml"]


def run_command(command: list) -> str:
    """Executes a shell command and returns the output."""
    try:
        # print(f"[R] Running command: {' '.join(command)}")
        output = subprocess.check_output(command, text=True, encoding="utf-8", errors="replace").strip()
        # print(f"[O] Command output:\n{'\n'.join(['\t'+line for line in output.splitlines()])}")
        return output
    except subprocess.CalledProcessError as e:
        print(f"##vso[task.logissue type=error]Error executing command: {' '.join(command)}")
        print(f"##vso[task.logissue type=error]Error message: {str(e)}")
        return ""
    except UnicodeDecodeError as e:
        print(f"##vso[task.logissue type=error]Unicode decode error: {e}")
        return ""


def get_commit_list(start_date: str, end_date: str) -> list:
    """Retrieve a list of commit hashes within the specified date range"""
    return run_command(
        [
            "git",
            "log",
            "main",
            "--since={}".format(start_date),
            "--until={}".format(end_date),
            "--pretty=format:%H",
            "--",
        ]
        + base_paths
    ).splitlines()


def get_commit_modified_files(commit_hash: str) -> list:
    """Get a list of modified files in the commit, along with their status"""
    return run_command(
        ["git", "diff-tree", "-M", "--no-commit-id", "--name-status", "-r", commit_hash, "--"] + base_paths
    ).splitlines()


def get_commit_meta(commit_hash: str) -> str:
    """Get commit meta information formatted for easier parsing
    Date: YYYY-MM-DD Author: <Name Last Name> Message:
    """
    return run_command(
        ["git", "show", "-s", "--format=Commit-Date:%ci Commit-Author:%cn Commit-Message:%s", commit_hash]
    )


def get_commit_modified_file_diff(commit_hash: str, file: str) -> str:
    """Get the modified file's diff"""
    return run_command(["git", "--no-pager", "show", "--pretty=", "--unified=-1", commit_hash, "--", file])


def process_diff(diff_output: str, modified_file: str) -> list:
    keywords = []
    if modified_file.startswith("detections") and modified_file.endswith("_sentinel.json"):
        keywords = [
            {"regex": r'\s*?(?:\+|\-)\s*?"query":', "desc": "Query updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"(?:name|displayName)":', "desc": "Name updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"id":', "desc": "ID updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"description"', "desc": "Description updated"},
            {
                "regex": r'\s*?(?:\+|\-)\s*?"severity"',
                "desc": "Severity changed",
                "old_value_regex": r'\s*?\-\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*',
                "new_value_regex": r'\s*?\+\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*',
            },
        ]
    if modified_file.startswith("detections") and modified_file.endswith("_meta.yml"):
        keywords = [
            {"regex": r"\s*?(?:\+|\-)\s*?title\s*:", "desc": "Name updated"},
            {"regex": r"\s*?(?:\+|\-)\s*?id\s*:", "desc": "ID updated"},
            {"regex": r"\s*?(?:\+|\-)\s*?description\s*:", "desc": "Description updated"},
            {
                "regex": r"\s*?(?:\+|\-)\s*?level\s*:",
                "desc": "Severity changed",
                "old_value_regex": r'\s*?\-\s*?level:\s*"?([A-Za-z]+)"?\s*',
                "new_value_regex": r'\s*?\+\s*?level:\s*"?([A-Za-z]+)"?\s*',
            },
        ]
    if modified_file.startswith("content_packs"):
        keywords = [
            {"regex": r'\s*?(?:\+|\-)\s*?"name":', "desc": "Name updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"version":', "desc": "Version updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"description"', "desc": "Description updated"},
            {
                "regex": r'\s*\-\s*"',
                "desc": "Detection(s) removed",
                "value_regex": r'\s*\-\s*"(.*?)"',
            },
            {"regex": r'\s*\+\s*"', "desc": "Detection(s) added", "value_regex": r'\s*\+\s*"(.*?)"'},
        ]
    changes = []
    for criteria in keywords:
        if re.search(criteria["regex"], diff_output):
            desc = criteria["desc"]
            if "old_value_regex" in criteria and "new_value_regex" in criteria:
                old_value_matches = re.findall(criteria["old_value_regex"], diff_output)
                new_value_matches = re.findall(criteria["new_value_regex"], diff_output)
                old_value = ""
                new_value = ""
                if len(old_value_matches) > 0:
                    old_value = old_value_matches[0]
                if len(new_value_matches) > 0:
                    new_value = new_value_matches[0]
                desc = f"{desc}: {old_value} -> {new_value}"
            if "value_regex" in criteria:
                value_matches = re.findall(criteria["value_regex"], diff_output)
                if len(value_matches) > 0:
                    desc = f"{desc}: {', '.join(value_matches)}"
            changes.append(desc)
    if changes == []:
        changes.append("Misc metadata updated")
    # Addition and removal of a use case in the content pack for the same commit is just moving it around the list.
    if modified_file.startswith("content_packs"):
        pack_changes = []
        detections_added = []
        detections_removed = []
        for change in changes:
            if change.startswith("Detection(s) added"):
                detections_added = re.findall(r"Detection\(s\) added:\s+(.*$)", change)[0].split(",")
            elif change.startswith("Detection(s) removed"):
                detections_removed = re.findall(r"Detection\(s\) removed:\s+(.*$)", change)[0].split(",")
            else:
                pack_changes.append(change)
        # Keep only unique
        detections_added_final = list(set(detections_added) - set(detections_removed))
        detections_removed_final = list(set(detections_removed) - set(detections_added))
        # Add them in the final change list
        if len(detections_added_final) > 0:
            pack_changes.append(f"Detection(s) added: {', '.join(detections_added_final)}")
        if len(detections_removed_final) > 0:
            pack_changes.append(f"Detection(s) removed: {', '.join(detections_removed_final)}")
        changes = pack_changes
    return changes


def get_change_log(commit_list: list) -> list:
    """
    Returns a flattened change log list.
    [
        {
            "Datetime": "YYYY-MM-DD hh:mm:ss",
            "Commit": "<commit_hash>",
            "Message": "Commit Message",
            "Author": "Stamatis Chatzimangou",
            "File": "detections/cloud/azure/azure_ad_azure_discovery_using_offensive_tools",
            "Filename": "azure_ad_azure_discovery_using_offensive_tools",
            "Filepath": "detections",
            "Status": "<Created|Modified|Deleted|Renamed|Moved>",
            "Change": "Change 1 description"
        },
        ...
    ]
    """
    change_log = []
    for commit_index, commit_hash in enumerate(commit_list, start=1):
        print(f"Processing commit {commit_index}/{len(commit_list)}: {commit_hash}")
        # Retrieve the commit date, author and message
        date = ""
        author = ""
        message = ""
        commit_meta = get_commit_meta(commit_hash)
        print(f"Commit meta for commit {commit_hash}: {commit_meta}")
        matches = re.findall(
            r"Commit-Date:(.*?)\s+Commit-Author:(.*?)\s+Commit-Message:(.*)",
            commit_meta,
        )
        if matches:
            date, author, message = matches[0]
        print(f"Commit date: {date}")
        print(f"Commit author: {author}")
        print(f"Commit message: {message}")
        # Get a list of modified files in the commit, along with their status
        modified_files_output = get_commit_modified_files(commit_hash)
        print(f"Modified files: {modified_files_output}")
        for line in modified_files_output:
            parts = line.split("\t")
            status, modified_file = parts[0], parts[1]
            modified_file_name = os.path.basename(modified_file)
            modified_file_path = os.path.dirname(modified_file)
            print(f"Checking file: {modified_file} (status: {status})")
            # Translate git status code
            if status == "A":
                status_label = "Created"
            elif status == "M":
                status_label = "Modified"
            elif status == "D":
                status_label = "Deleted"
            elif status.startswith("R"):
                status_label = "Renamed"
            else:
                status_label = status
            # Track changes if modified
            diff_changes = []
            if status_label == "Modified":
                diff_output = get_commit_modified_file_diff(commit_hash, modified_file)
                diff_changes = process_diff(diff_output, modified_file)
            if status_label == "Renamed":
                renamed_file = parts[2]
                renamed_file_name = os.path.basename(renamed_file)
                renamed_file_path = os.path.dirname(renamed_file)
                if renamed_file_name == modified_file_name and renamed_file_path != modified_file_path:
                    status_label = "Moved"
                    diff_changes.append(f"Moved to {renamed_file_path} ({status.lstrip('R')} certainty)")
                else:
                    diff_changes.append(f"Renamed to {renamed_file} ({status.lstrip('R')} certainty)")
            # If no changes recorded, still append with empty string
            if diff_changes:
                for change in diff_changes:
                    change_log.append(
                        {
                            "Datetime": date,
                            "Commit": commit_hash,
                            "Message": message,
                            "Author": author,
                            "File": modified_file,
                            "Filename": modified_file_name,
                            "Filepath": modified_file_path,
                            "Status": status_label,
                            "Change": change,
                        }
                    )
            else:
                change_log.append(
                    {
                        "Datetime": date,
                        "Commit": commit_hash,
                        "Message": message,
                        "Author": author,
                        "File": modified_file,
                        "Filename": modified_file_name,
                        "Filepath": modified_file_path,
                        "Status": status_label,
                        "Change": "",
                    }
                )
    print(f"Change Log:\n {change_log}")
    return change_log


def change_log_to_md(change_log: list, repo_url: str, bucket="month") -> str:
    """
    Generate a markdown changelog grouped by day, week, month, or quarter from the change_log.
    Args:
        change_log (list): List of dicts with keys including 'Datetime', 'Filename', 'Status', 'Change'
        bucket (str): One of 'day', 'week', 'month', or 'quarter'
    Returns:
        str: Markdown-formatted changelog
    """
    md_lines = []
    dt_format = "%Y-%m-%d %H:%M:%S %z"
    target_tz = pytz.timezone("UTC")
    status_order = ["Created", "Modified", "Deleted", "Renamed", "Moved"]
    category_order = ["Detections", "Content Packs", "Other"]
    repo_url_commit = repo_url + "/commit/"
    # Parse datetime and normalize to target timezone
    for item in change_log:
        dt = datetime.strptime(item["Datetime"], dt_format)
        dt = dt.astimezone(target_tz)
        item["ParsedDatetime"] = dt
    # Group changelog into buckets
    buckets = defaultdict(list)
    for entry in change_log:
        dt = entry["ParsedDatetime"]
        if bucket == "day":
            key = dt.replace(hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "week":
            # Align to Monday (weekday()=0) – adjust if you want Sunday start
            start_of_week = dt - timedelta(days=dt.weekday())
            key = start_of_week.replace(hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "month":
            key = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "quarter":
            quarter_month = 3 * ((dt.month - 1) // 3) + 1
            key = dt.replace(month=quarter_month, day=1, hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("Bucket must be one of: 'day', 'week', 'month', or 'quarter'")
        buckets[key].append(entry)
    sorted_buckets = sorted(buckets.items(), key=lambda x: x[0], reverse=True)
    # Prepare TOC lines
    toc_lines = ["# Table of Contents", ""]
    for bucket_start, _ in sorted_buckets:
        if bucket == "day":
            bucket_end = bucket_start + timedelta(days=1)
        elif bucket == "week":
            bucket_end = bucket_start + timedelta(days=7)
        elif bucket == "month":
            if bucket_start.month == 12:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 1)
        elif bucket == "quarter":
            if bucket_start.month in [10, 11, 12]:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 3)
        else:
            raise ValueError("Bucket must be one of: 'day', 'week', 'month', or 'quarter'")
        # Format dates as YYYY-MM-DD for TOC
        start_str = bucket_start.strftime("%Y-%m-%d")
        end_str = bucket_end.strftime("%Y-%m-%d")
        header_text = f"{end_str} ({start_str} -> {end_str})"
        # TOC line as markdown link
        toc_lines.append(f"- [{header_text}](#{end_str})")
    md_lines.extend(toc_lines)
    md_lines.append("")  # blank line after TOC
    # Now add detailed changelog sections
    for bucket_start, entries in sorted_buckets:
        if bucket == "day":
            bucket_end = bucket_start + timedelta(days=1)
        elif bucket == "week":
            bucket_end = bucket_start + timedelta(days=7)
        elif bucket == "month":
            if bucket_start.month == 12:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 1)
        elif bucket == "quarter":
            if bucket_start.month in [10, 11, 12]:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 3)
        # Header for each bucket (date only)
        start_str = bucket_start.strftime("%Y-%m-%d")
        end_str = bucket_end.strftime("%Y-%m-%d")
        header_text = f"{end_str}"
        md_lines.append(f"# {header_text}")
        md_lines.append(f"Changes from **{start_str}** to **{end_str}**.")
        grouped = defaultdict(lambda: defaultdict(list))  # category -> status -> entries
        # Group by Category → Status
        for entry in entries:
            status = entry["Status"]
            path = entry["File"]
            if "detections" in path:
                category = "Detections"
            elif "content_packs" in path:
                category = "Content Packs"
            else:
                category = "Other"
            grouped[category][status].append(entry)
        for category in sorted(grouped):
            summary_message = []
            for status in sorted(grouped[category]):
                summary_message.append(
                    f"**{len(grouped[category][status])}** {category.lower()} have been {status.lower()}"
                )
            md_lines.append(f"A total of {', '.join(summary_message)}.")
            md_lines.append(" ")
        for category in category_order:
            if category not in grouped:
                continue
            md_lines.append(f"## {category}")
            for status in status_order:
                items = grouped[category].get(status, [])
                if not items:
                    continue
                files = defaultdict(list)
                for entry in items:
                    files[entry["Filename"]].append(entry)
                md_lines.append("<details>")
                md_lines.append(f"<summary><b>{status} ({len(files.keys())})</b></summary>")
                md_lines.append("")
                for filename, file_changes in files.items():
                    times = sorted(ch["ParsedDatetime"] for ch in file_changes)
                    t_range = times[0].strftime("%Y-%m-%d %H:%M:%S %z")
                    if t_range != times[-1].strftime("%Y-%m-%d %H:%M:%S %z"):
                        t_range = f"{times[0].strftime('%Y-%m-%d %H:%M:%S %z')} → {times[-1].strftime('%Y-%m-%d %H:%M:%S %z')}"
                    # authors = ','.join(list(set([ch['Author'] for ch in file_changes])))
                    change_details = []
                    for ch in file_changes:
                        commit_url = repo_url_commit + ch["Commit"] + "?path=/" + ch["File"]
                        if ch["Status"] in ["Modified", "Renamed", "Moved"]:
                            change_details.append(
                                ", ".join(
                                    [
                                        f"{ch['Change']}",
                                        f"[{ch['Commit']}]({commit_url})",
                                        f"_{ch['Author']}_: _{ch['Message']}_",
                                    ]
                                )
                            )
                        else:
                            change_details.append(
                                ", ".join([f"[{ch['Commit']}]({commit_url})", f"_{ch['Author']}_: _{ch['Message']}_"])
                            )
                    md_lines.append(f"- **{filename}** - ({t_range})")
                    for change_detail in sorted(change_details):
                        md_lines.append(f" - {change_detail}")
                md_lines.append("</details>")
                md_lines.append("")  # Blank line after category
            md_lines.append("")  # Blank line after status
    return "\n".join(md_lines)


def generate_change_log(repo_url: str, start_date: str, end_date: str, output_directory: str, bucket: str) -> None:
    # Get commit list
    commit_list = get_commit_list(start_date=start_date, end_date=end_date)
    print(f"Found {len(commit_list)} commits to process.")
    # Get change log for commits
    print("Generating change log...")
    change_log = get_change_log(commit_list=commit_list)
    import json
    print(json.dumps(change_log, indent=4))
    # Dump changelog to MD
    print("Generating MD file...")
    md_content = change_log_to_md(change_log=change_log, repo_url=repo_url, bucket=bucket)
    # Save MD file
    changelog_md_path = os.path.join(output_directory, "_changelog.md")
    with open(changelog_md_path, "w", encoding="utf-8") as f:
        f.write(md_content)


def main():
    parser = argparse.ArgumentParser(description="Process Git commits for specific changes.")
    parser.add_argument("--repo-url", type=str, required=True, help="Specify the repository url")
    parser.add_argument(
        "--output-directory",
        type=str,
        required=True,
        help="Specify the output directory.",
    )
    parser.add_argument(
        "--bucket",
        type=str,
        default="month",
        choices=["day", "week", "month", "quarter"],
        help="Specify the time bucket used to group changes.",
    )
    parser.add_argument("--start-date", type=str, default="2025-01-01", help="Start date in YYYY-MM-DD format.")
    parser.add_argument(
        "--end-date", type=str, default=datetime.today().strftime("%Y-%m-%d"), help="End date in YYYY-MM-DD format."
    )
    args = parser.parse_args()
    generate_change_log(
        repo_url=args.repo_url,
        start_date=args.start_date,
        end_date=args.end_date,
        output_directory=args.output_directory,
        bucket=args.bucket,
    )


if __name__ == "__main__":
    main()
Python
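Outside of the pipeline, the script can also be run ad hoc. Based on the arguments defined above, generating a weekly-bucketed change log for the period used in the earlier git log example would look like this (the organization name is a placeholder):
python pipelines/scripts/change_log.py --repo-url https://dev.azure.com/<ORGANIZATION_NAME>/DAC/_git/Detection-as-Code --output-directory documentation/ --start-date 2025-01-01 --end-date 2025-08-12 --bucket week
Bash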
This pipeline automates the generation of the change log. It is triggered by changes in the main branch, specifically targeting files in the detections and content_packs directories. It executes the Python script above to generate the Markdown change log and commits it to the repository. This ensures that the change log is consistently updated without manual intervention, keeping the documentation up to date.
name: Change Log Generation
trigger:
  branches:
    include:
      - "main"
  paths:
    include:
      - "detections/*"
      - "content_packs/*"
    exclude:
      - ".git*"
jobs:
  - job: ChangeLogGeneration
    displayName: "Change Log Generation"
    condition: eq(variables['Build.SourceBranchName'], 'main')
    steps:
      - checkout: self
        fetchDepth: 0
        persistCredentials: true
      - script: |
          git fetch origin main
          git checkout -b main origin/main
        displayName: "Fetch Branches Locally"
      - script: |
          python pipelines/scripts/change_log.py --repo-url https://dev.azure.com/<ORGANIZATION_NAME>/DAC/_git/Detection-as-Code --output-directory documentation/
        displayName: 'Run Change Log Generation'
      - bash: |
          git config --global user.email "[email protected]"
          git config --global user.name "Azure DevOps Pipeline"
          [[ $(git status --porcelain) ]] || { echo "No changes, exiting now..."; exit 0; }
          echo "Changes detected, let's commit!"
          git pull origin $(Build.SourceBranchName)
          git checkout $(Build.SourceBranchName)
          git add documentation/
          git commit --allow-empty -m "Automated commit from pipeline - Update Change Log"
          git push origin $(Build.SourceBranchName)
        displayName: Commit Change Log to Repo
YAML
By automating the generation of the documentation and the change log, we ensure that all team members have access to the latest information on detection library changes. We used Jinja for template-based conversion, Git for tracking changes to the repository, and regular expressions for parsing its output. We also set up pipelines that update the documentation automatically, reducing manual effort and enhancing collaboration across teams.
The next part of this blog series will cover applying versioning schemas to the detection library.
[1] https://learn.microsoft.com/en-us/azure/devops/project/wiki/publish-repo-to-wiki?view=azure-devops&tabs=browser
[2] https://jinja.palletsprojects.com/en/stable/
[3] https://pypi.org/project/Jinja2/
[4] https://git-scm.com/docs/git
[5] https://www.regular-expressions.info/quickstart.html
Stamatis Chatzimangou
Stamatis is a member of the Threat Detection Engineering team at NVISO’s CSIRT & SOC and is mainly involved in Use Case research and development.