Detection Engineering: Practicing Detection-as-Code – Documentation – Part 4
This article shows how to convert YAML and JSON files to Markdown using Jinja templates and how to automate the generation of detection documentation with an Azure DevOps pipeline. It also covers generating a change log with Git commands and a Python script and publishing it to the wiki, so that teams stay informed about changes to the detection library. 2025-08-26, Author: blog.nviso.eu


Most engineers find documentation unappealing: it often feels like a chore that takes time away from more engaging technical problem-solving, especially when we are focused on delivering functionality or fixing urgent issues. Nevertheless, sufficiently documenting our detections is essential in detection engineering, as it provides context around the purpose, detection logic, and expected behaviour of each detection rule. Documentation also helps other teams, like SOC analysts, understand what a detection is looking for, how it was built, and how to respond when it fires.

Just as important as documenting individual detections is tracking how the overall detection library evolves. In most cases the detection library is dynamic: over time new content packs and detections are added and existing ones are modified. These changes impact how the environment is monitored, so it’s crucial to keep other teams informed. In other words, we need a way to generate a change log for our detection library.

That said, let’s look at how we can tackle those issues in the repository we talked about in Part 2.

Detection Wiki

To automatically generate wiki pages documenting our detections, we can convert the YAML metadata files into Markdown format. YAML is already easy to read, but when communicating our work to other teams we should use a more user-friendly format. Markdown is also the supported format for Azure DevOps wiki pages [1]. To perform the conversion we are going to use Jinja [2], a templating engine whose special placeholders allow writing Python-like expressions inside a template. Data is then passed to the template to render the final document.
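
As a quick illustration of the mechanism (a minimal sketch with made-up field names and values, not one of our actual templates), Jinja fills the placeholders in a template with whatever data we pass to render():

from jinja2 import Template

# Minimal Jinja sketch: placeholders are replaced with the values
# passed to render(). The field names below are only illustrative.
template = Template("# {{ data.name }}\n\n**Level:** {{ data.level }}")
print(template.render(data={"name": "Suspicious Sign-in", "level": "high"}))
# Output:
# # Suspicious Sign-in
#
# **Level:** high

Python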

The Jinja template that we are going to use to convert the meta files to markdown is the following:

# {{ data.name }}

**ID:** {{ data.id }}

**Level:** {{ data.level }}

**Version:** {{ data.version }}

## Description
{{ data.description }}

## Data Sources
{% for data_source in data.data_sources %}
### {{ data_source.vendor }} {{ data_source.product }} {{ data_source.service }}
- **Category:** {{ data_source.category }}
- **Vendor:** {{ data_source.vendor }}
- **Product:** {{ data_source.product }}
- **Service:** {{ data_source.service }}
- **Event ID:** {{ data_source.event_id }}
{% endfor %}

## References
{% for ref in data.references %}
- {{ ref }}
{% endfor %}

## Blindspots
{% for blindspot in data.blindspots %}
- {{ blindspot }}
{% endfor %}

## Known False Positives
{% for fp in data.known_false_positives %}
- {{ fp }}
{% endfor %}

## Investigation Steps
{% for step in data.investigation_steps %}
- {{ step }}
{% endfor %}

## Tags
{% for tag in data.tags %}
- {{ tag }}
{% endfor %}

Markdown

We will also convert the rule file of each platform that we support to Markdown format. As an example, here is the Jinja template for Microsoft Sentinel rules.

# {{ data.properties.displayName }}

**ID:** {{ data.id }}

**Severity:** {{ data.properties.severity }}

**Kind**: {{ data.kind }}

**Enabled**: {{ data.properties.enabled }}

**Query:**
```
{{data.properties.query}}
```

### General Configuration
- **Query Frequency**: {{ data.properties.queryFrequency }}
- **Query Period**: {{ data.properties.queryPeriod }}
- **Trigger Operator**: {{ data.properties.triggerOperator }}
- **Trigger Threshold**: {{ data.properties.triggerThreshold }}
- **Suppression Enabled**: {{ data.properties.suppressionEnabled }}
- **Suppression Duration**: {{ data.properties.suppressionDuration }}
- **Event Grouping Settings Aggregation Kind**: {{ data.properties.eventGroupingSettings.aggregationKind }}

### Incident Configuration
- **Create Incident**: {{ data.properties.incidentConfiguration.createIncident }}
- **Grouping Enabled**: {{ data.properties.incidentConfiguration.groupingConfiguration.enabled }}
- **Reopen Closed Incident**: {{ data.properties.incidentConfiguration.groupingConfiguration.reopenClosedIncident }}
- **Lookback Duration**: {{ data.properties.incidentConfiguration.groupingConfiguration.lookbackDuration }}
- **Matching Method**: {{ data.properties.incidentConfiguration.groupingConfiguration.matchingMethod }}

**Group By Entities**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByEntities %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByEntities %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}

**Group By Alert Details**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByAlertDetails %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByAlertDetails %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}

**Group By Custom Details**:
{% if data.properties.incidentConfiguration.groupingConfiguration.groupByCustomDetails %}
{% for item in data.properties.incidentConfiguration.groupingConfiguration.groupByCustomDetails %}
- {{ item }}
{% endfor %}
{% else %}
- _None_
{% endif %}

Markdown

Processing the detections with Python and the Jinja2 library [3] is simple. For each matching file, we load the data, render it with the appropriate template (meta.jinja or sentinel.jinja), and save the output in the specified directory. The described logic is implemented in the following Python script, which we call documentation.py.

import os
import yaml
import json
import argparse
from jinja2 import Environment, FileSystemLoader


def generate_docs(base_dir, output_dir):
    env = Environment(loader=FileSystemLoader("pipelines/scripts/templates"))
    for root, _, files in os.walk(base_dir):
        for file in files:
            if file.endswith("_meta.yml"):
                meta_path = os.path.join(root, file)
                try:
                    with open(meta_path, "r") as f:
                        data = yaml.safe_load(f)

                    template = env.get_template("meta.jinja")
                    md_content = template.render(data=data)

                    base_name = file.replace("_meta.yml", ".md")
                    md_path = os.path.join(output_dir, base_name)
                    with open(md_path, "w") as f:
                        f.write(md_content)

                    print(f"Generated: {md_path}")
                except Exception as e:
                    print(f"##vso[task.logissue type=error]Error processing {meta_path}: {e}")

            if file.endswith("_sentinel.json"):
                sentinel_rule_path = os.path.join(root, file)
                try:
                    with open(sentinel_rule_path, "r") as f:
                        data = json.load(f)

                    template = env.get_template("sentinel.jinja")
                    md_content = template.render(data=data)

                    base_name = file.replace("_sentinel.json", "")
                    subpage_dir_path = os.path.join(output_dir, base_name)
                    if not os.path.exists(subpage_dir_path):
                        os.mkdir(subpage_dir_path)
                    sentinel_path = os.path.join(subpage_dir_path, "sentinel.md")
                    with open(sentinel_path, "w") as f:
                        f.write(md_content)

                    print(f"Generated: {md_path}")
                except Exception as e:
                    print(f"##vso[task.logissue type=error]Error processing {meta_path}: {e}")


def main():
    parser = argparse.ArgumentParser(description="Convert detection files to markdown docs.")
    parser.add_argument("--file-dir", required=True, help="Root directory to scan for files.")
    parser.add_argument("--output-dir", required=True, help="Directory to store output md files")

    args = parser.parse_args()
    generate_docs(args.file_dir, args.output_dir)


if __name__ == "__main__":
    main()

Python

The next step is to automatically generate the MD files when a pull request is merged to the main branch. To achieve that, we create a pipeline with a trigger on the main branch for all files under the detections/ directory. We will save the generated MD files under a new directory called documentation, which must also be added to the repository structure validation that we described in Part 3. The pipeline first runs our documentation.py script and then commits all new, modified and deleted files to the main branch.

name: MD Files Generation

trigger:
  branches:
    include:
    - "main"
  paths:
    include:
    - "detections/*"
    exclude:
    - ".git*"
    - "*.md"

jobs:
- job: MDFileGeneration
  displayName: "MD File Generation"
  condition: eq(variables['Build.SourceBranchName'], 'main')
  steps:
    - checkout: self
      persistCredentials: true
    - script: |
        python pipelines/scripts/documentation.py --file-dir detections/ --output-dir documentation/
      displayName: 'Run MD File Generation'
    - bash: |
        git config --global user.email "[email protected]"
        git config --global user.name "Azure DevOps Pipeline"
        [[ $(git status --porcelain) ]] || { echo "No changes, exiting now..."; exit 0; }
        echo "Changes detected, let's commit!"
        git pull origin $(Build.SourceBranchName)
        git checkout $(Build.SourceBranchName)
        git add documentation/
        git commit --allow-empty  -m "Automated commit from pipeline - Update MD Files"
        git push origin $(Build.SourceBranchName)
      displayName: Commit MD Files to Repo

YAML

However, in order to issue a commit from the pipeline, we need to grant the “Contribute” permission to the Azure DevOps project’s Build Service account. We can do that by navigating to Project Settings -> Repositories -> selecting our repository -> clicking on Security -> selecting the main branch -> selecting the DAC Build Service account and assigning the Contribute permission.

A sample output of the pipeline run is the following:

As a last step, in order to publish the MD files as a wiki in Azure DevOps, navigate to the Wiki pages from the left panel. Click on the project’s Wiki dropdown box and select Publish code as wiki.

In the panel that opens on the right, fill in the required fields and click on Publish. We end up with a Wiki for our detections that is searchable and can be shared with other teams.

Change Log

The documentation is a good first step towards communicating the work done within the detection engineering team. However, in the rapidly changing field of cybersecurity, frequent updates call for a more effective way of communicating changes to the detection library. While Git provides version control and Git platforms do offer a way to view the history of changes, that history can only be viewed ad hoc, without any summarization, and is not very usable by people who are unfamiliar with those tools.

Luckily, since we are leveraging Git, we can iterate through the commit history ourselves and put together a Python script that generates a change log summarizing the changes over a period of time.

But first we’ll go through some Git commands [4] to understand their output and how they will help us in our script.

The following Git command retrieves a list of commit hashes from the main branch, between January 1, 2025 and August 12, 2025, focusing on changes made to .json and .yml files in the detections directory as well as .json files in the content_packs directory. The output is formatted to display only the commit hashes.

git log main --since=2025-01-01 --until=2025-08-12 --pretty=format:%H -- detections/*.json content_packs/*.json detections/*.yml

Bash

Once we have the commit list, we can iterate through it and identify the added, modified, deleted, or renamed files. The following Git command lists the changes to JSON and YAML files within the detections and content_packs directories made in a specific commit, identified by its hash. The output of the command is a list of status and file pairs.

git diff-tree -M --no-commit-id --name-status -r 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- detections/*.json content_packs/*.json detections/*.yml

Bash

For each commit we also collect information on the date, author and commit message.

git show -s --format="Commit-Date:%ci Commit-Author:%cn Commit-Message:%s" 17ff5c0963132ea22a9fba9f8a33359cd08b097f

Bash

Finally, the following Git commands display the changes made to a specific file in a given commit, without using a pager. They also suppress the commit header information, to focus solely on the file changes.

git --no-pager show --pretty="" --unified=-1 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- content_packs/cloud_o365.json

Bash

git --no-pager show --pretty="" --unified=-1 17ff5c0963132ea22a9fba9f8a33359cd08b097f -- detections/cloud/azure/azure_ad_azure_discovery_using_offensive_tools/azure_ad_azure_discovery_using_offensive_tools_sentinel.json

Bash

Depending on the Git client version you are using you might need to verify that the output is as displayed in the screenshots.

The diff output can then be parsed using regular expressions [5] to identify the actual changes inside the files, rather than just the fact that a file was modified. By doing so, we end up with an intermediate change log list that includes all the information gathered by the Git commands above, as well as the actual changes in the modified files.

Parsing the changes with regular expressions is easier when the files are multiline and beautified.
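
As a minimal sketch of the idea (using a made-up diff fragment, and the same pattern style as the script further below), we can detect a severity change by matching the removed (“-”) and added (“+”) lines separately:

import re

# Hypothetical fragment of a unified diff for a *_sentinel.json file.
diff_output = '''
-    "severity": "Medium",
+    "severity": "High",
'''

# One regex captures the old value from the "-" line, another the new value from the "+" line.
old = re.findall(r'\s*?\-\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*', diff_output)
new = re.findall(r'\s*?\+\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*', diff_output)
if old and new:
    print(f"Severity changed: {old[0]} -> {new[0]}")  # Severity changed: Medium -> High

Python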

We iterate through the change log list and convert it to Markdown format, summarizing the changes in time buckets, by file type (content pack or detection) and by change type (added, deleted, modified, renamed or moved). Detection engineers can use that Markdown change log to convey their efforts to other teams. That way, changes to the detection library are much easier to digest for people outside the detection engineering team, since they do not have to manually go through the Git history and look through the changes. A sample output of our script is provided below:

The commit hashes in the output MD file are clickable and can be used to navigate to the actual changes in the repository.
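
For reference, the links are assembled by the script below from the repository URL, the commit hash and the changed file’s path (a hypothetical example, reusing the commit hash and file from the commands above):

repo_url = "https://dev.azure.com/<ORGANIZATION_NAME>/DAC/_git/Detection-as-Code"
commit = "17ff5c0963132ea22a9fba9f8a33359cd08b097f"
changed_file = "content_packs/cloud_o365.json"

# Markdown link to the commit, with the file path appended as a query
# parameter (the same format built in change_log_to_md below).
link = f"[{commit}]({repo_url}/commit/{commit}?path=/{changed_file})"
print(link)

Python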

The code of the script is provided below.

import argparse
import os
import re
import subprocess
import pytz
from datetime import datetime, timedelta
from collections import defaultdict

base_paths = ["detections/*.json", "content_packs/*.json", "detections/*.yml"]


def run_command(command: list) -> str:
    """Executes a shell command and returns the output."""
    try:
        # print(f"[R] Running command: {' '.join(command)}")
        output = subprocess.check_output(command, text=True, encoding="utf-8", errors="replace").strip()
        # print(f"[O] Command output:\n{'\n'.join(['\t'+line for line in output.splitlines()])}")
        return output
    except subprocess.CalledProcessError as e:
        print(f"##vso[task.logissue type=error]Error executing command: {' '.join(command)}")
        print(f"##vso[task.logissue type=error]Error message: {str(e)}")
        return ""
    except UnicodeDecodeError as e:
        print(f"##vso[task.logissue type=error]Unicode decode error: {e}")
        return ""


def get_commit_list(start_date: str, end_date: str) -> list:
    """Retrieve a list of commit hashes within the specified date range"""
    return run_command(
        [
            "git",
            "log",
            "main",
            "--since={}".format(start_date),
            "--until={}".format(end_date),
            "--pretty=format:%H",
            "--",
        ]
        + base_paths
    ).splitlines()


def get_commit_modified_files(commit_hash: str) -> list:
    """Get a list of modified files in the commit, along with their status"""
    return run_command(
        ["git", "diff-tree", "-M", "--no-commit-id", "--name-status", "-r", commit_hash, "--"] + base_paths
    ).splitlines()


def get_commit_meta(commit_hash: str) -> str:
    """Get commit meta information formatted easier for parsing
    Date: YYYY-MM-DD Author: <Name Last Name> Message:
    """
    return run_command(
        ["git", "show", "-s", "--format=Commit-Date:%ci Commit-Author:%cn Commit-Message:%s", commit_hash]
    )


def get_commit_modified_file_diff(commit_hash: str, file: str) -> str:
    """Get the modified file's diff"""
    return run_command(["git", "--no-pager", "show", "--pretty=" "", "--unified=-1", commit_hash, "--", file])


def process_diff(diff_output: str, modified_file: str) -> list:
    keywords = []
    if modified_file.startswith("detections") and modified_file.endswith("_sentinel.json"):
        keywords = [
            {"regex": r'\s*?(?:\+|\-)\s*?"query":', "desc": "Query updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"(?:name|displayName)":', "desc": "Name updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"id":', "desc": "ID updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"description"', "desc": "Description updated"},
            {
                "regex": r'\s*?(?:\+|\-)\s*?"severity"',
                "desc": "Severity changed",
                "old_value_regex": r'\s*?\-\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*',
                "new_value_regex": r'\s*?\+\s*?(?:"severity"):\s*"?([A-Za-z]+)"?\s*',
            },
        ]
    if modified_file.startswith("detections") and modified_file.endswith("_meta.yml"):
        keywords = [
            {"regex": r"\s*?(?:\+|\-)\s*?title\s*:", "desc": "Name updated"},
            {"regex": r"\s*?(?:\+|\-)\s*?id\s*:", "desc": "ID updated"},
            {"regex": r"\s*?(?:\+|\-)\s*?description\s*:", "desc": "Description updated"},
            {
                "regex": r"\s*?(?:\+|\-)\s*?level\s*:",
                "desc": "Severity changed",
                "old_value_regex": r'\s*?\-\s*?level:\s*"?([A-Za-z]+)"?\s*',
                "new_value_regex": r'\s*?\+\s*?level:\s*"?([A-Za-z]+)"?\s*',
            },
        ]
    if modified_file.startswith("content_packs"):
        keywords = [
            {"regex": r'\s*?(?:\+|\-)\s*?"name":', "desc": "Name updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"version":', "desc": "Version updated"},
            {"regex": r'\s*?(?:\+|\-)\s*?"description"', "desc": "Description updated"},
            {
                "regex": r'\s*\-\s*"',
                "desc": "Detection(s) removed",
                "value_regex": r'\s*\-\s*"(.*?)"',
            },
            {"regex": r'\s*\+\s*"', "desc": "Detection(s) added", "value_regex": r'\s*\+\s*"(.*?)"'},
        ]

    changes = []
    for criteria in keywords:
        if re.search(criteria["regex"], diff_output):
            desc = criteria["desc"]
            if "old_value_regex" in criteria and "new_value_regex" in criteria:
                old_value_matches = re.findall(criteria["old_value_regex"], diff_output)
                new_value_matches = re.findall(criteria["new_value_regex"], diff_output)
                old_value = ""
                new_value = ""
                if len(old_value_matches) > 0:
                    old_value = old_value_matches[0]
                if len(new_value_matches) > 0:
                    new_value = new_value_matches[0]
                desc = f"{desc}: {old_value} -> {new_value}"
            if "value_regex" in criteria:
                value_matches = re.findall(criteria["value_regex"], diff_output)
                if len(value_matches) > 0:
                    desc = f"{desc}: {", ".join(value_matches)}"
            changes.append(desc)

    if changes == []:
        changes.append("Misc metadata updated")

    # Addition and removal of a use case in the content pack for the same commit is just moving it around the list.
    if modified_file.startswith("content_packs"):
        pack_changes = []
        detections_added = []
        detections_removed = []
        for change in changes:
            if change.startswith("Detection(s) added"):
                detections_added = re.findall(
                    r"Detection\(s\) added:\s+(.*$)",
                    change,
                )[
                    0
                ].split(",")
            elif change.startswith("Detection(s) removed"):
                detections_removed = re.findall(
                    r"Detection\(s\) removed:\s+(.*$)",
                    change,
                )[
                    0
                ].split(",")
            else:
                pack_changes.append(change)

        # Keep only unique
        detections_added_final = list(set(detections_added) - set(detections_removed))
        detections_removed_final = list(set(detections_removed) - set(detections_added))

        # Add them in the final change list
        if len(detections_added_final) > 0:
            pack_changes.append(f"Detection(s) added: {', '.join(detections_added_final)}")
        if len(detections_removed_final) > 0:
            pack_changes.append(f"Detection(s) removed: {', '.join(detections_removed_final)}")

        changes = pack_changes

    return changes


def get_change_log(commit_list: list) -> list:
    """
    Returns a flattened change log list.
    [
        {
            "Datetime": "YYYY-MM-DD hh:mm:ss",
            "Commit": "<commit_hash>",
            "Message": "Commit Message",
            "Author": "Stamatis Chatzimangou",
            "File":  "detections/cloud/azure/azure_ad_azure_discovery_using_offensive_tools",
            "Filename": "azure_ad_azure_discovery_using_offensive_tools",
            "Filepath": "detections",
            "Status": "<Created|Modified|Deleted|Renamed|Moved>",
            "Change": "Change 1 description"
        },
        ...
    ]
    """
    change_log = []

    for commit_index, commit_hash in enumerate(commit_list, start=1):
        print(f"Processing commit {commit_index}/{len(commit_list)}: {commit_hash}")

        # Retrieve the commit date, author and message
        date = ""
        author = ""
        message = ""
        commit_meta = get_commit_meta(commit_hash)
        print(f"Commit meta for commit {commit_hash}: {commit_meta}")
        matches = re.findall(
            r"Commit-Date:(.*?)\s+Commit-Author:(.*?)\s+Commit-Message:(.*)",
            commit_meta,
        )
        if matches:
            date, author, message = matches[0]
        print(f"Commit date: {date}")
        print(f"Commit author: {author}")
        print(f"Commit message: {message}")

        # Get a list of modified files in the commit, along with their status
        modified_files_output = get_commit_modified_files(commit_hash)
        print(f"Modified files: {modified_files_output}")

        for line in modified_files_output:
            parts = line.split("\t")
            status, modified_file = parts[0], parts[1]
            modified_file_name = os.path.basename(modified_file)
            modified_file_path = os.path.dirname(modified_file)

            print(f"Checking file: {modified_file} (status: {status})")

            # Translate git status code
            if status == "A":
                status_label = "Created"
            elif status == "M":
                status_label = "Modified"
            elif status == "D":
                status_label = "Deleted"
            elif status.startswith("R"):
                status_label = "Renamed"
            else:
                status_label = status

            # Track changes if modified
            diff_changes = []
            if status_label == "Modified":
                diff_output = get_commit_modified_file_diff(commit_hash, modified_file)
                diff_changes = process_diff(diff_output, modified_file)

            if status_label == "Renamed":
                renamed_file = parts[2]
                renamed_file_name = os.path.basename(renamed_file)
                renamed_file_path = os.path.dirname(renamed_file)

                if renamed_file_name == modified_file_name and renamed_file_path != modified_file_path:
                    status_label = "Moved"
                    diff_changes.append(f"Moved to {renamed_file_path} ({status.lstrip('R')} certainty)")
                else:
                    diff_changes.append(f"Renamed to {renamed_file} ({status.lstrip('R')} certainty)")

            # If no changes recorded, still append with empty string
            if diff_changes:
                for change in diff_changes:
                    change_log.append(
                        {
                            "Datetime": date,
                            "Commit": commit_hash,
                            "Message": message,
                            "Author": author,
                            "File": modified_file,
                            "Filename": modified_file_name,
                            "Filepath": modified_file_path,
                            "Status": status_label,
                            "Change": change,
                        }
                    )
            else:
                change_log.append(
                    {
                        "Datetime": date,
                        "Commit": commit_hash,
                        "Message": message,
                        "Author": author,
                        "File": modified_file,
                        "Filename": modified_file_name,
                        "Filepath": modified_file_path,
                        "Status": status_label,
                        "Change": "",
                    }
                )

    print(f"Change Log:\n {change_log}")
    return change_log


def change_log_to_md(change_log: dict, repo_url: str, bucket="month") -> str:
    """
    Generate a markdown changelog grouped by day, week, month, or quarter from the change_log.

    Args:
        change_log (list): List of dicts with keys including 'Datetime', 'Filename', 'Status', 'Change'
        bucket (str): One of 'day', 'month', or 'quarter'

    Returns:
        str: Markdown-formatted changelog
    """
    md_lines = []
    dt_format = "%Y-%m-%d %H:%M:%S %z"
    target_tz = pytz.timezone("UTC")
    status_order = ["Created", "Modified", "Deleted", "Renamed", "Moved"]
    category_order = ["Detections", "Content Packs", "Other"]
    repo_url_commit = repo_url + "/commit/"

    # Parse datetime and normalize to target timezone
    for item in change_log:
        dt = datetime.strptime(item["Datetime"], dt_format)
        dt = dt.astimezone(target_tz)
        item["ParsedDatetime"] = dt

    # Group changelog into buckets
    buckets = defaultdict(list)
    for entry in change_log:
        dt = entry["ParsedDatetime"]

        if bucket == "day":
            key = dt.replace(hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "week":
            # Align to Monday (weekday()=0) – adjust if you want Sunday start
            start_of_week = dt - timedelta(days=dt.weekday())
            key = start_of_week.replace(hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "month":
            key = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        elif bucket == "quarter":
            quarter_month = 3 * ((dt.month - 1) // 3) + 1
            key = dt.replace(month=quarter_month, day=1, hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("Bucket must be one of: 'day', 'week', 'month', or 'quarter'")

        buckets[key].append(entry)

    sorted_buckets = sorted(buckets.items(), key=lambda x: x[0], reverse=True)

    # Prepare TOC lines
    toc_lines = ["# Table of Contents", ""]

    for bucket_start, _ in sorted_buckets:
        if bucket == "day":
            bucket_end = bucket_start + timedelta(days=1)
        elif bucket == "week":
            bucket_end = bucket_start + timedelta(days=7)
        elif bucket == "month":
            if bucket_start.month == 12:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 1)
        elif bucket == "quarter":
            if bucket_start.month in [10, 11, 12]:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 3)
        else:
            raise ValueError("Bucket must be one of: 'day', 'week', 'month', or 'quarter'")

        # Format dates as YYYY-MM-DD for TOC
        start_str = bucket_start.strftime("%Y-%m-%d")
        end_str = bucket_end.strftime("%Y-%m-%d")
        header_text = f"{end_str} ({start_str} -> {end_str})"

        # TOC line as markdown link
        toc_lines.append(f"- [{header_text}](#{end_str})")

    md_lines.extend(toc_lines)
    md_lines.append("")  # blank line after TOC

    # Now add detailed changelog sections
    for bucket_start, entries in sorted_buckets:
        if bucket == "day":
            bucket_end = bucket_start + timedelta(days=1)
        elif bucket == "week":
            bucket_end = bucket_start + timedelta(days=7)
        elif bucket == "month":
            if bucket_start.month == 12:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 1)
        elif bucket == "quarter":
            if bucket_start.month in [10, 11, 12]:
                bucket_end = bucket_start.replace(year=bucket_start.year + 1, month=1)
            else:
                bucket_end = bucket_start.replace(month=bucket_start.month + 3)

        # Header for each bucket (date only)
        start_str = bucket_start.strftime("%Y-%m-%d")
        end_str = bucket_end.strftime("%Y-%m-%d")
        header_text = f"{end_str}"
        md_lines.append(f"# {header_text}")
        md_lines.append(f"Changes from **{start_str}** to **{end_str}**.")

        grouped = defaultdict(lambda: defaultdict(list))  # status -> category -> entries

        # Group by Status → Category → Filename
        for entry in entries:
            status = entry["Status"]
            path = entry["File"]
            if "detections" in path:
                category = "Detections"
            elif "content_packs" in path:
                category = "Content Packs"
            else:
                category = "Other"

            grouped[category][status].append(entry)

        for category in sorted(grouped):
            summary_message = []
            for status in sorted(grouped[category]):
                summary_message.append(
                    f"**{len(grouped[category][status])}** {category.lower()} have been {status.lower()}"
                )
            md_lines.append(f"A total of {', '.join(summary_message)}.")
        md_lines.append(" ")

        for category in category_order:
            if category not in grouped:
                continue
            md_lines.append(f"## {category}")
            for status in status_order:
                items = grouped[category].get(status, [])
                if not items:
                    continue
                files = defaultdict(list)

                for entry in items:
                    files[entry["Filename"]].append(entry)

                md_lines.append("<details>")
                md_lines.append(f"<summary><b>{status} ({len(files.keys())})</b></summary>")
                md_lines.append("")
                for filename, file_changes in files.items():
                    times = sorted(ch["ParsedDatetime"] for ch in file_changes)
                    t_range = times[0].strftime("%Y-%m-%d %H:%M:%S %z")
                    if t_range != times[-1].strftime("%Y-%m-%d %H:%M:%S %z"):
                        t_range = f"{times[0].strftime('%Y-%m-%d %H:%M:%S %z')}{times[-1].strftime('%Y-%m-%d %H:%M:%S %z')}"
                    # authors = ','.join(list(set([ch['Author'] for ch in file_changes])))
                    change_details = []
                    for ch in file_changes:
                        commit_url = repo_url_commit + ch["Commit"] + "?path=/" + ch["File"]
                        if ch["Status"] in ["Modified", "Renamed", "Moved"]:
                            change_details.append(
                                ", ".join(
                                    [
                                        f"{ch['Change']}",
                                        f"[{ch['Commit']}]({commit_url})",
                                        f"_{ch['Author']}_: _{ch['Message']}_",
                                    ]
                                )
                            )
                        else:
                            change_details.append(
                                ", ".join([f"[{ch['Commit']}]({commit_url})", f"_{ch['Author']}_: _{ch['Message']}_"])
                            )

                    md_lines.append(f"- **{filename}** - ({t_range})")
                    for change_detail in sorted(change_details):
                        md_lines.append(f"      - {change_detail}")
                md_lines.append("</details>")
                md_lines.append("")  # Blank line after category
            md_lines.append("")  # Blank line after status

    return "\n".join(md_lines)


def generate_change_log(repo_url: str, start_date: str, end_date: str, output_directory: str, bucket: str) -> None:
    # Get commit list
    commit_list = get_commit_list(start_date=start_date, end_date=end_date)
    print(f"Found {len(commit_list)} commits to process.")

    # Get change log for commits
    print("Generating change log...")
    change_log = get_change_log(commit_list=commit_list)
    import json

    print(json.dumps(change_log, indent=4))

    # Dump changelog to MD
    print("Generating MD file...")
    md_content = change_log_to_md(change_log=change_log, repo_url=repo_url, bucket=bucket)

    # Save MD file
    changelog_md_path = os.path.join(output_directory, "_changelog.md")
    with open(changelog_md_path, "w", encoding="utf-8") as f:
        f.write(md_content)


def main():
    parser = argparse.ArgumentParser(description="Process Git commits for specific changes.")
    parser.add_argument("--repo-url", type=str, required=True, help="Specify the repository url")
    parser.add_argument(
        "--output-directory",
        type=str,
        required=True,
        help="Specify the output directory.",
    )
    parser.add_argument(
        "--bucket",
        type=str,
        default="month",
        choices=["day", "week", "month", "quarter"],
        help="Specify the output directory.",
    )
    parser.add_argument("--start-date", type=str, default="2025-01-01", help="Start date in YYYY-MM-DD format.")
    parser.add_argument(
        "--end-date", type=str, default=datetime.today().strftime("%Y-%m-%d"), help="End date in YYYY-MM-DD format."
    )

    args = parser.parse_args()
    generate_change_log(
        repo_url=args.repo_url,
        start_date=args.start_date,
        end_date=args.end_date,
        output_directory=args.output_directory,
        bucket=args.bucket,
    )


if __name__ == "__main__":
    main()

Python

This pipeline automates the generation of the change log. It is triggered by changes to the main branch, specifically targeting files in the detections and content_packs directories. It executes the Python script above to generate a Markdown change log file and commits it to the repository. The process ensures that the change log is consistently updated without manual intervention, keeping the documentation up to date.

name: Change Log Generation

trigger:
  branches:
    include:
    - "main"
  paths:
    include:
    - "detections/*"
    - "content_packs/*"
    exclude:
    - ".git*"

jobs:
- job: ChangeLogGeneration
  displayName: "Change Log Generation"
  condition: eq(variables['Build.SourceBranchName'], 'main')
  steps:
    - checkout: self
      fetchDepth: 0
      persistCredentials: true
    - script: |
        git fetch origin main
        git checkout -b main origin/main
      displayName: "Fetch Branches Locally"
    - script: |
        python pipelines/scripts/change_log.py --repo-url https://dev.azure.com/<ORGANIZATION_NAME>/DAC/_git/Detection-as-Code --output-directory documentation/
      displayName: 'Run Change Log Generation'
    - bash: |
        git config --global user.email "[email protected]"
        git config --global user.name "Azure DevOps Pipeline"
        [[ $(git status --porcelain) ]] || { echo "No changes, exiting now..."; exit 0; }
        echo "Changes detected, let's commit!"
        git pull origin $(Build.SourceBranchName)
        git checkout $(Build.SourceBranchName)
        git add documentation/
        git commit --allow-empty  -m "Automated commit from pipeline - Update Change Log"
        git push origin $(Build.SourceBranchName)
      displayName: Commit Change Log to Repo

YAML

Wrapping Up

By automating the generation of the documentation and the change log, we ensure that all team members have access to the latest information on the detection library and its changes. We used Jinja for template-based conversion, Git for tracking changes to the repository, and regular expressions to parse the output. We also set up pipelines that update the documentation automatically, reducing manual effort and enhancing collaboration across teams.

The next part of this blog series is going to be about applying versioning schemas in the detection library.

References

[1] https://learn.microsoft.com/en-us/azure/devops/project/wiki/publish-repo-to-wiki?view=azure-devops&tabs=browser
[2] https://jinja.palletsprojects.com/en/stable/
[3] https://pypi.org/project/Jinja2/
[4] https://git-scm.com/docs/git
[5] https://www.regular-expressions.info/quickstart.html

About the Author


Stamatis Chatzimangou

Stamatis is a member of the Threat Detection Engineering team at NVISO’s CSIRT & SOC and is mainly involved in Use Case research and development.


Article source: https://blog.nviso.eu/2025/08/26/detection-engineering-practicing-detection-as-code-documentation-part-4/