Introducing NodeZero® Advanced Data Pilfering: View Your Data Through the Eyes of an Attacker
2025-11-4 16:0:0 Author: horizon3.ai

Sifting through data. After gaining initial access, it’s where attackers spend the bulk of their time. They trawl through files and network data to discover credentials, looking for opportunities for lateral movement and privilege escalation. Simultaneously, they hunt for high-value data like financial information, PCI/PII, or intellectual property to maximize impact and raise the stakes of the compromise.

Defending against this is difficult. EDR solutions rarely flag an attacker simply reading files, and traditional data security tools fall short. These tools, often based on regular expressions and keyword matching, excel at handling highly structured data and at finding easily identifiable data types like AWS access keys or credit card numbers. But they don’t fare well when faced with the massive volume of unstructured data found in file shares, cloud storage, and collaboration tools.

This is exactly where Large Language Models (LLMs) shine. LLMs move beyond rigid searches to provide true semantic understanding, allowing them to analyze data in context. This unlocks the ability to find hard-to-find credentials and assess the business risk of compromised data in a human-like way that was previously impossible.

In this post, we’ll explore how NodeZero’s new Advanced Data Pilfering (ADP) feature combines LLMs with offensive security techniques to supercharge defenders’ understanding of data at risk.

Introduction

Advanced Data Pilfering (ADP) covers two common attacker behaviors:

  • Advanced Credential Pilfering: Using LLMs to find credentials hidden in files and Active Directory metadata. NodeZero doesn’t just find them – it then validates and uses those credentials to discover new paths for lateral movement and privilege escalation.
  • Data Risk Inference: Using LLMs to automatically classify compromised data (like intellectual property, financial records, or PII) and assess its business risk, showing you precisely what an attacker might go after.

Underpinning both capabilities is a smart data pipeline. It’s not feasible from a cost, performance, or efficiency perspective to send all data in a network to an LLM. NodeZero solves this by using a multi-stage approach to progressively filter and assess data. It combines traditional machine learning (to score files by metadata), embedding models (to understand semantic content), and well-tuned regular expressions (for highly structured data) to ensure that only the most relevant or ambiguous data is sent to the LLM for analysis.
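
To make the staged filtering concrete, here is a minimal Python sketch of the idea, not NodeZero's actual implementation. Every name, keyword, and threshold in it is hypothetical: `metadata_score` stands in for the traditional ML model, the caller-supplied `semantic_score` function stands in for the embedding model, and the stage order is an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical names and thresholds throughout; none come from NodeZero itself.
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # highly structured: regex is enough
INTERESTING_EXTENSIONS = {".ps1", ".txt", ".config", ".xml", ".env"}


@dataclass
class FileCandidate:
    path: str
    text: str


def metadata_score(candidate: FileCandidate) -> float:
    """Stage 1: cheap score from the path alone (stand-in for an ML model)."""
    path = candidate.path.lower()
    score = 0.0
    if any(path.endswith(ext) for ext in INTERESTING_EXTENSIONS):
        score += 0.5
    if any(word in path for word in ("backup", "access", "cred", "secret")):
        score += 0.5
    return score


def triage(candidate, semantic_score, min_meta=0.5, min_semantic=0.6):
    """Return 'skip', 'regex_hit', or 'send_to_llm' for one file."""
    if metadata_score(candidate) < min_meta:
        return "skip"  # uninteresting by metadata, never read further
    if AWS_KEY_RE.search(candidate.text):
        return "regex_hit"  # unambiguous structured secret, no LLM needed
    if semantic_score(candidate.text) >= min_semantic:
        return "send_to_llm"  # relevant but ambiguous content
    return "skip"
```

The point of the ordering is cost: the cheapest checks run first, so the LLM only sees the small residue of files that look relevant but cannot be resolved by a pattern match.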

A Word on Data Security

The LLMs used by NodeZero are hosted by AWS Bedrock. By design, NodeZero isolates its usage of Bedrock across clients and individual pentests. AWS guarantees that client data is not shared with model providers, is not used to improve base models, and is not accessible to other customers.

NodeZero’s file embedding model runs locally within the client environment. This model is part of a data pipeline that pre-filters data, and only the data snippets from files identified as relevant are sent to the LLM for analysis.

NodeZero users have full control over ADP and can configure whether data is processed by an LLM and what type of data is included. These controls are broken down by feature:

  • Advanced Credential Pilfering: This feature can be disabled. When disabled, no file contents are sent to an LLM for credential analysis.
  • Data Risk Inference: This feature can be configured in one of three modes:
    • Disabled: No data of any kind is sent to an LLM for risk inference.
    • Metadata Only: Only metadata about files (such as paths and directory structure) is sent to an LLM for analysis.
    • Full Inference: Relevant data snippets and metadata are sent to the LLM for analysis.
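
The three Data Risk Inference modes amount to a gate on what may leave the client environment. The following Python sketch illustrates that gate under assumed names (`RiskInferenceMode`, `build_llm_payload`); it is not NodeZero's configuration API.

```python
from enum import Enum


class RiskInferenceMode(Enum):
    # Hypothetical identifiers for the three modes described above.
    DISABLED = "disabled"
    METADATA_ONLY = "metadata_only"
    FULL_INFERENCE = "full_inference"


def build_llm_payload(mode, file_paths, snippets):
    """Return the data allowed to reach the LLM under `mode`, or None."""
    if mode is RiskInferenceMode.DISABLED:
        return None  # nothing of any kind is sent
    payload = {"paths": list(file_paths)}  # metadata is allowed in both other modes
    if mode is RiskInferenceMode.FULL_INFERENCE:
        payload["snippets"] = list(snippets)  # content only in full inference
    return payload
```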

Advanced Data Pilfering in Action

Let’s look at how NodeZero leverages ADP during a pentest. We built these scenarios using a modified version of the well-known Game Of Active Directory (GOAD) cyber range.

Extracting Passwords from Active Directory Attributes

In a previous writeup, we showed how NodeZero got initial access in the standard GOAD environment by finding the password for samwell.tarly in that user’s Description field in Active Directory:

The contextual clue word “Password” in the description makes it feasible to use regular expressions to pull out the password “Heartsbane”. NodeZero uses well-tuned regexes to pull out obvious passwords like this, and these regexes are surprisingly effective in real-world production environments.
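
As an illustration of an anchor-word regex of this kind, here is a minimal Python sketch. The pattern and the sample description string are illustrative assumptions, not NodeZero's actual regexes.

```python
import re

# Illustrative pattern: an anchor word, an optional separator, then the secret.
PASSWORD_RE = re.compile(
    r"\b(?:password|passwd|pwd)\b\s*[:=]?\s*([^\s()]+)",
    re.IGNORECASE,
)


def extract_password(description: str):
    """Return the token following a password anchor word, or None."""
    match = PASSWORD_RE.search(description)
    return match.group(1) if match else None


print(extract_password("Samwell Tarly (Password : Heartsbane)"))  # Heartsbane
print(extract_password("Heartsbane"))  # None: no anchor word to match on
```

The second call shows the dependency on context: without an anchor word, the pattern has nothing to distinguish a password from any other string.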

But regex has its limits. As an example, let’s say we modify the GOAD setup to put only the password by itself in the Description field like this:

And to go further, let’s set up another user viserys.targaryen in a different domain with their password in their Notes field, enclosed in Spanish text.

These passwords would be nearly impossible to extract using a general-purpose regex, but with Advanced Data Pilfering, NodeZero extracts these passwords and compromises both users, as shown in the attack path below:

NodeZero does the following:

  • Anonymously enumerates domain users and finds the password “Heartsbane” for samwell.tarly from his Description field with the assistance of an LLM.
  • Logs in as samwell.tarly
  • With samwell.tarly’s access, conducts cross-forest enumeration of users and finds the password “GoldCrown” for viserys.targaryen from his Notes field, again with the assistance of an LLM.
  • Logs in as viserys.targaryen

This LLM-powered analysis isn’t just for Description and Notes. NodeZero also searches for passwords in the adminDescription and comment fields. These fields are editable using the Attribute Editor in Active Directory Users and Computers. By default any domain user can read the values set in these fields.
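
A rough Python sketch of how the free-text values from these fields could be gathered for analysis follows. It assumes user entries have already been fetched as dictionaries keyed by LDAP attribute name (the Notes box in Active Directory Users and Computers is stored in the `info` attribute); the function name and the entry shape are hypothetical.

```python
# LDAP attribute names; in ADUC the "Notes" box is stored in `info`.
FREE_TEXT_ATTRIBUTES = ("description", "info", "adminDescription", "comment")


def free_text_candidates(entries):
    """Yield (sAMAccountName, attribute, value) for each non-empty free-text field."""
    for entry in entries:
        user = entry.get("sAMAccountName", "<unknown>")
        for attribute in FREE_TEXT_ATTRIBUTES:
            value = entry.get(attribute)
            if isinstance(value, list):  # LDAP libraries often return lists
                values = value
            elif value:
                values = [value]
            else:
                values = []
            for item in values:
                text = str(item).strip()
                if text:
                    yield user, attribute, text
```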

A Word on False Positives

Finding what looks like a password is one thing; knowing if it’s a valid one is another. This is a problem for human pentesters and tools alike. A discovered string could be an old, reset password, an employee ID, or a password for a different user entirely (password reuse).

NodeZero removes this guesswork. It validates potential credentials by attempting to log in. And if successful, it goes further by abusing the compromised user’s privileges to move laterally, escalate privileges, and access sensitive data. This approach gives defenders a wealth of information and context that can be used to drive effective prioritization and remediation.
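
The validation step can be sketched in a few lines of Python. This is an illustration only: `try_login` is a caller-supplied function standing in for a real authentication attempt, and the per-user attempt cap is an assumed safeguard against account lockouts, not a documented NodeZero behavior.

```python
def validate_candidates(candidates, try_login, max_attempts_per_user=2):
    """Return the (user, secret) pairs that `try_login` accepts.

    `try_login` is a caller-supplied callable (user, secret) -> bool. The
    per-user cap is an assumed safeguard against account lockouts.
    """
    attempts = {}
    seen = set()
    confirmed = []
    for user, secret in candidates:
        if (user, secret) in seen:
            continue  # never retry the same pair
        seen.add((user, secret))
        if attempts.get(user, 0) >= max_attempts_per_user:
            continue  # stay under the assumed lockout threshold
        attempts[user] = attempts.get(user, 0) + 1
        if try_login(user, secret):
            confirmed.append((user, secret))
    return confirmed
```

Only confirmed pairs would feed later steps, which is what separates a validated finding from a string that merely looks like a password.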

Extracting Credentials from Files

NodeZero also leverages LLMs to find credentials hidden within files.

In a previous writeup about GOAD, we wrote about how NodeZero compromised a privileged domain user, jeor.mormont, by extracting his credential from a simple PowerShell script file, script.ps1, in the SYSVOL share on a domain controller. That script file happened to be simple enough that a general-purpose regex was sufficient to pull out the credential. The context clue words “user” and “password”, appearing one line after another, are strong anchor words for a regex. And these regexes work reasonably well in real production environments.
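
A regex keyed on adjacent “user” and “password” assignments might look like the following Python sketch. The pattern and the sample script text are made up for illustration; they are not the contents of script.ps1 or NodeZero's regexes.

```python
import re

# A quoted value on a single line, in either quote style.
QUOTED = r"""["']([^"'\r\n]+)["']"""

# Illustrative pattern: a user assignment immediately followed by a password one.
PAIR_RE = re.compile(
    r"^\s*\$?user\w*\s*=\s*" + QUOTED + r"\s*\r?\n"
    r"\s*\$?pass\w*\s*=\s*" + QUOTED,
    re.IGNORECASE | re.MULTILINE,
)


def extract_credential(script_text):
    """Return (user, password) from adjacent assignments, or None."""
    match = PAIR_RE.search(script_text)
    return match.groups() if match else None


# Made-up sample with placeholder values.
sample = '$task = "noop"\n$user = "EXAMPLE\\svc.backup"\n$password = "Example-Passw0rd"\n'
print(extract_credential(sample))  # ('EXAMPLE\\svc.backup', 'Example-Passw0rd')
```

The pattern depends entirely on the two assignments sitting on consecutive lines with recognizable variable names; any other layout returns None.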

Let’s make it harder. Suppose we replace that simple file with a more complex script, backup.ps1:

And, to go further, let’s also place a file called access.txt in a user’s Desktop folder on one of the machines in the range, 192.168.4.22. The contents of this file contain the credential for another domain user, jon.snow:

With Advanced Data Pilfering, NodeZero can extract both credentials and chain them into a full attack, as shown below:

NodeZero does the following:

  • Gets initial access as domain user samwell.tarly
  • Enumerates SYSVOL using samwell.tarly’s credential and identifies the file backup.ps1 as likely to contain a credential
  • Uses an LLM to extract the credential for jeor.mormont from backup.ps1
  • Logs into the domain as jeor.mormont, and discovers this user is a local admin on host 192.168.4.22
  • Deploys a Remote Access Tool (RAT) to the host using the privileges of jeor.mormont.
  • Through the RAT, identifies the file C:\Users\hodor\Desktop\access.txt as likely to contain a credential.
  • Uses an LLM to extract the credential for jon.snow
  • Logs into the domain as jon.snow

NodeZero’s support for LLM-assisted credential discovery isn’t limited to SMB shares and compromised hosts. It extends to other common data repositories, including AWS S3 buckets, NFS shares, and Slack.

Assessing the Business Risk of Compromised Hosts and Shares

Every pentester knows that a good report is more than a list of compromised hosts. What truly matters is conveying the business impact, which often comes down to the type of data that was accessed. Was it sensitive customer data, financial records, or intellectual property? For defenders, this same context drives a better understanding of business risk and which security weaknesses to prioritize for remediation.

With Advanced Data Pilfering, NodeZero now automatically classifies the type of data it compromises and links it to tangible business risks.

For instance, in the GOAD environment, we set up an Engineering file share on the host 192.168.4.23 containing R&D data that includes engineering schematics, source code, and legal patent documentation.

📁 \\192.168.4.23\Engineering
  |
  |-- 📁 Legal_IP
  |   |-- 📁 Patents
  |   |   |-- 📁 Applications_Pending
  |   |   |-- 📁 Issued
  |   |   |-- 📁 Prior_Art_Research
  |   |-- 📁 Trademarks
  |   |-- 📁 Licensing
  |   |   |-- 📁 Inbound_Licenses (IP we use)
  |   |   `-- 📁 Outbound_Licenses (IP we sell)
  |   `-- 📁 Trade_Secrets
  |-- 📁 Product_Development
  |   |-- 📁 Alchemy_Platform (Software)
  |   |   |-- 📁 Architecture
  |   |   |-- 📁 Source_Code_Snapshots
  |   |   `-- 📁 Security_Audits
  |   `-- 📁 Gen4_Sensor (Hardware)
  |       |-- 📁 BOM (Bill of Materials)
  |       |-- 📁 CAD_Schematics
  |       |-- 📁 Firmware
  |       `-- 📁 QA_Test_Results
  |-- 📁 Research_Lab
  |   |-- 📁 Project_Helios (New Battery Tech)
  |   |   |-- 📁 Data
  |   |   |-- 📁 Lab_Notebooks_Scanned
  |   |   |-- 📁 Formulations
  |   `-- 📁 Project_Quantum_Leap (AI/ML)
  |       |-- 📁 Models
  |       |-- 📁 Training_Data
  `-- 📁 Strategy

NodeZero gains access to this share after compromising the user viserys.targaryen, as shown in the attack path below:

Once it has access, NodeZero applies its smart data pipeline. It gathers file metadata and samples key files, sending this data to an LLM for deep analysis.

In this example, NodeZero determined that the share contained Intellectual Property, Manufacturing/Production data, Source Code, and Strategic Business Communications. It then automatically mapped these categories to the following business risks:

  • Theft of IP & R&D
  • Software Delivery Disruption
  • Leak of Sensitive Communications

The LLM-assisted rationale justifies why the data was categorized this way, giving defenders the specific, actionable evidence they need.
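
The category-to-risk step can be pictured as a simple lookup. In the Python sketch below the category and risk names come from the example above, but which category maps to which risk is an assumption made for illustration; the post does not spell out the mapping.

```python
# Assumed mapping for illustration; the post lists the categories and the
# risks but not which category leads to which risk.
CATEGORY_TO_RISK = {
    "Intellectual Property": "Theft of IP & R&D",
    "Manufacturing/Production": "Theft of IP & R&D",
    "Source Code": "Software Delivery Disruption",
    "Strategic Business Communications": "Leak of Sensitive Communications",
}


def business_risks(categories):
    """Map data categories to a de-duplicated, order-preserving list of risks."""
    risks = []
    for category in categories:
        risk = CATEGORY_TO_RISK.get(category)
        if risk and risk not in risks:
            risks.append(risk)
    return risks
```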

Assessing the Business Risk of a Compromised Database

With Advanced Data Pilfering, NodeZero also assesses the risk of compromised databases.

In our GOAD environment, we added a synthetic database for a medical application to one of the Microsoft SQL Servers (192.168.4.22). NodeZero compromised the service account for this server, giving it full control of the database, as shown in the attack path below:

NodeZero then extracted the metadata about the database – tables, columns, record counts, etc. – and used an LLM to classify the type of data it contained.
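
To show what metadata-only extraction can look like, here is a minimal Python sketch that collects table names, column names, and record counts without returning any row contents. It uses the standard library's `sqlite3` as a stand-in for Microsoft SQL Server, so the catalog queries differ from what a real MSSQL target would need; the function name is hypothetical.

```python
import sqlite3


def describe_database(conn):
    """Return {table: {"columns": [...], "row_count": n}} with no row contents."""
    summary = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names cannot break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        summary[table] = {"columns": columns, "row_count": count}
    return summary
```

A summary like this is often enough for classification: column names such as diagnoses or medications signal health data without a single record being read.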

NodeZero correctly infers that the database contains Health Data and Personal Data, as evidenced by the presence of “extensive health data including diagnoses, encounters, lab results, and medications” and “personal data including patient demographics and provider information.” It then links compromise of this database to Regulatory Breach Penalties as a business risk.

Conclusion

The examples in this post illustrate how Advanced Data Pilfering (ADP) enhances NodeZero’s pentesting capabilities, both by improving credential discovery and by bridging the gap between technical exploits and real-world business risk.

These examples are representative of the types of results NodeZero is delivering in real-world tests. For instance, in a recent real-world attack path, NodeZero used ADP to compromise the domain:

NodeZero did the following:

  • Got initial access by password spraying a domain user
  • Compromised a second domain user by discovering its password in an AD attribute
  • Used that second domain user to identify an interesting file on a host it had access to
  • Used Advanced Data Pilfering to extract a domain admin credential from that file
  • Logged into the domain as domain admin

NodeZero would go on to leverage this domain admin access to compromise the client’s Microsoft Entra tenant.

This isn’t just a theoretical threat. Real-world threat actors are doing the exact same thing. In a report from August 2025, Anthropic described a “vibe hacker” using Claude Code to facilitate ransomware attacks against at least 17 different organizations. The attacker used Claude Code to actively find credentials and, most notably, classify and analyze stolen data to weaponize it for extortion, mirroring the two core functions of ADP.

At Horizon3, we believe the future of cyber warfare will be played out at machine speed, algorithm vs algorithm, with humans by exception. If you’re a hacker interested in AI and cybersecurity and creating autonomous production-safe solutions that work at scale with no humans in the loop, we want to hear from you.


Source: https://horizon3.ai/intelligence/blogs/introducing-nodezero-advanced-data-pilfering-view-your-data-through-the-eyes-of-an-attacker/