Hack The Box Academy Writeup — PASSWORD ATTACKS — Writing Custom Wordlists and Rules

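Summary: This article describes a method for building a custom password list with OSINT (open-source intelligence) and tools such as CeWL and Hashcat, in order to solve the password attack exercise in a Hack The Box module. By collecting the target’s personal information, generating an initial wordlist, expanding it through combinations, and applying rule-based transformations, it produces a candidate list that fits the password policy and then uses Hashcat to attempt to crack the target hash. 2025-8-1 04:15:25 Author: infosecwriteups.com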


Hack The Box Rabbit Hole

This write‑up aims to bridge the gap between the theory presented in the Hack The Box module and the concrete reasoning needed to solve its exercises. Rather than handing out answers, it guides readers through small, incremental steps, each reinforcing the module’s core lessons, so that by the end students not only reach the correct solution but also understand why each step was necessary. Making that underlying logic explicit should add to the module’s value for the whole community.


Source: https://academy.hackthebox.com/course/preview/password-attacks

The module gives only a quick overview of the CeWL tool, of crawling for custom wordlists, and of crafting rule sets for later use. The exercise expects us to go further: gather our own OSINT, practice with commands beyond the ones shown, organize the findings into meaningful wordlist seeds, and methodically apply transformations until we have a list that is likely to contain the password behind the target hash.

In the following section, we’ll break down that process step by step.

To kick things off, we’ll use CeWL’s automatic web‑crawling to produce our initial wordlist, which saves us from manually combing through page content. That is overkill for a page this small, but the point is to build muscle memory. First, we take the OSINT details from the prompt and embed them in a simple mark.html file, serve it locally as if it were a website, and then point CeWL at it. This mimics crawling a real web page while ensuring all of Mark’s personal data ends up in our base list.

To ensure our custom wordlist is relevant and effective, we should carefully select which details to include in the HTML file used by CeWL. The goal is to capture meaningful, personal keywords, not to blindly paste the entire exercise prompt.

For example, although the prompt includes information about the company’s password policy (e.g. minimum length, required characters), this is not useful as raw wordlist material. Including it would clutter the output with non-personal, generic terms like “uppercase”, “symbol”, or “number”, which aren’t likely to appear in actual passwords.

Instead, we extract only specific, personal data points about the target, such as names, dates, locations, interests, and family members. These are far more likely to appear in real-world password combinations.

Following this logic, we construct a clean HTML page that includes only the relevant OSINT about Mark White. This page is then served locally and scraped using CeWL to build the base of our targeted wordlist.

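Change into the directory that contains mark.html and start Python’s built-in HTTP server. The GET entry in the output below is logged once the page is requested: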

python3 -m http.server 8000
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
10.10.14.248 - - [29/Jul/2025 13:17:52] "GET /mark.html HTTP/1.1" 200 -

mark.html

<!DOCTYPE html>
<html>
<head><title>Mark White Bio</title></head>
<body>
  <h1>Mark White</h1>
  <ul>
    <li>Born: August 5, 1998</li>
    <li>Works at Nexura, Ltd.</li>
    <li>Lives in San Francisco, CA, USA</li>
    <li>Pet: cat named Bella</li>
    <li>Wife: Maria</li>
    <li>Son: Alex</li>
    <li>Fan of baseball</li>
  </ul>
</body>
</html>
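
Before running CeWL, it can help to preview roughly which tokens the page will yield. The sketch below is a crude stand-in and not CeWL's actual parser: it strips tags, splits on non-alphanumeric characters, keeps tokens of two or more characters, and removes duplicates. The function name preview_words is made up for this illustration.

```shell
# Rough approximation of a word extractor (NOT CeWL itself):
# 1. replace HTML tags with spaces
# 2. turn every run of non-alphanumeric characters into a newline
# 3. keep tokens of length >= 2, dropping duplicates but preserving order
preview_words() {
  sed 's/<[^>]*>/ /g' \
    | tr -cs '[:alnum:]' '\n' \
    | awk 'length($0) >= 2 && !seen[$0]++'
}

# Example with two list items from the page; the single digit "5" is dropped.
printf '<li>Born: August 5, 1998</li>\n<li>Pet: cat named Bella</li>\n' | preview_words
```

Running it against the full file would be `preview_words < mark.html`. If generic words dominate the preview, the page probably contains more than the personal details we want.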

With our HTML page in place, we’re ready to generate the initial wordlist using CeWL. Since the page is flat and simple, we don’t need deep spidering or complex options.

We use the following command, replacing <tun0> with the IP address of our tun0 (VPN) interface:

cewl -d 1 -m 2 --with-numbers -w mark_initial.txt http://<tun0>:8000/mark.html

Explanation of options:

-d 1: The spider depth is set to 1, as our local HTML file doesn't link to any other pages.
-m 2: We include words with a minimum length of 2 characters to catch short but potentially useful entries like CA.
--with-numbers: Enables the inclusion of numeric values, such as Mark’s birth year (1998), which CeWL would otherwise ignore.

The output is written to mark_initial.txt. To get a quick idea of how many unique words were captured, we can count the lines in the file:

wc -l < mark_initial.txt
27

At this stage, we incorporate the password policy details we previously noted. Since the minimum password length is 12 characters, it’s reasonable to assume the user may have combined two or more personal elements into a single password.

To reflect this in our wordlist, we generate all possible pairwise combinations from the initial keywords. This is done using Hashcat’s combinator attack mode (-a 1), which joins each word with every other word (including itself).

We run the following command:

hashcat --stdout \
  -a 1 \
  mark_initial.txt \
  mark_initial.txt \
  > mark_pairs.txt

This creates a new file, mark_pairs.txt, containing all 2-word combinations. To monitor the growth of our list, we count how many entries were generated:

wc -l < mark_pairs.txt
729
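
The count is what we would expect: 27 words paired with 27 words gives 27 × 27 = 729 candidates. To make the combinator idea concrete, here is a minimal awk sketch that does the same left-plus-right concatenation on a tiny list. It is an illustration only, the combine function is a hypothetical helper, and the output order is not claimed to match Hashcat's.

```shell
# combine LEFT_FILE RIGHT_FILE
# First pass (NR == FNR) stores every word of RIGHT_FILE;
# second pass prints each word of LEFT_FILE followed by each stored word.
combine() {
  awk 'NR == FNR { right[++n] = $0; next }
       { for (i = 1; i <= n; i++) print $0 right[i] }' "$2" "$1"
}

# Demo on three seed words: 3 x 3 = 9 candidates such as MarkBella or Bella1998.
tmp=$(mktemp)
printf 'Mark\nBella\n1998\n' > "$tmp"
combine "$tmp" "$tmp"
rm -f "$tmp"
```

Because every word is also paired with itself, entries like MarkMark appear too, which is why the total is N² rather than N × (N − 1).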

These combinations significantly increase our chances of matching the target password, especially under a policy that requires longer passwords.

Before applying the more computationally intensive rule-based mutations, we can trim the wordlist by dropping every entry shorter than the 12-character minimum. A list this small would not strain Hashcat either way, but again we are building muscle memory.

To do this efficiently, we use awk to retain only those combinations that are 12 characters or longer:

awk 'length($0) >= 12' mark_pairs.txt > pairs_length12.txt

Then we check how many entries remain:

wc -l < pairs_length12.txt
76

This reduces our list from 729 entries to just 76, helping us conserve time and processing power in the next steps.

With our filtered list of 12+ character entries in place, we’re ready to apply transformations using custom rules, as introduced in this module.

We begin by creating a custom.rule file in our working directory using a heredoc:

cat << 'EOF' > custom.rule
:
c
so0
c so0
sa@
c sa@
c sa@ so0
$!
$! c
$! so0
$! sa@
$! c so0
$! c sa@
$! so0 sa@
$! c so0 sa@
EOF

To confirm it was written correctly:

cat custom.rule

This rule set covers a range of transformations, including:

Capitalizing the first letter and lowercasing the rest (c)
Substituting a with @ (sa@)
Substituting o with 0 (so0)
Appending ! to the end ($!)
Various combinations of these
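
To see what these functions do to a single candidate, the sketch below imitates them with awk and sed. It is not Hashcat's rule engine, and the rule_* helper names are invented for this example. It only mirrors the documented behavior of c, sXY, and $X.

```shell
# c   : uppercase the first character, lowercase everything after it
rule_c()    { awk '{ print toupper(substr($0, 1, 1)) tolower(substr($0, 2)) }'; }
# sa@ : replace every "a" with "@"
rule_sa()   { sed 's/a/@/g'; }
# so0 : replace every "o" with "0"
rule_so()   { sed 's/o/0/g'; }
# $!  : append "!" to the end of the word
rule_bang() { sed 's/$/!/'; }

# Equivalent in effect to the rule line "$! c so0 sa@" applied to SanFrancisco.
echo 'SanFrancisco' | rule_c | rule_sa | rule_so | rule_bang   # prints S@nfr@ncisc0!
```

Note the side effect of c on combined words: SanFrancisco becomes Sanfrancisco, so the inner capital is lost. The no-op rule (:) keeps the original casing as a separate candidate.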

Now, we process our filtered wordlist using these custom rules:

hashcat --stdout \
  -r custom.rule \
  pairs_length12.txt \
> mut_mark_final.txt

Finally, we check how many transformed password candidates were generated:

wc -l < mut_mark_final.txt
1140

This step expands our list to 1140 entries (76 words × 15 rules), each at least 12 characters long and tailored to our target.
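
The earlier length filter guarantees only the 12-character minimum. If the policy also demands specific character classes, the list can be narrowed further. A minimal sketch, assuming purely for illustration a policy of at least 12 characters with one uppercase letter, one lowercase letter, one digit, and one symbol (the exact requirements come from the exercise prompt and are not reproduced here). The policy_filter name is hypothetical.

```shell
# Keep only lines that satisfy every condition of the ASSUMED policy.
# Each grep stage passes through lines matching one requirement.
policy_filter() {
  grep -E '.{12,}' \
    | grep '[[:upper:]]' \
    | grep '[[:lower:]]' \
    | grep '[[:digit:]]' \
    | grep '[^[:alnum:]]'
}

# Only the first sample line meets all five conditions.
printf 'Francisco1998!\nBaseballmaria!\nMark1998!\n' | policy_filter
```

It could be applied as `policy_filter < mut_mark_final.txt > mut_mark_policy.txt`, where the output filename is likewise just an example.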

With our final wordlist (mut_mark_final.txt) prepared, the last step is to attempt cracking the target hash provided in the exercise.

Identifying the Hash Type

To determine the correct Hashcat mode, we first analyze the hash format using hashid:

hashid -m 97268a8ae45ac7d15c3cea4ce6ea550b

[+] MD2
[+] MD5 [Hashcat Mode: 0]
[+] MD4 [Hashcat Mode: 900]
[+] Double MD5 [Hashcat Mode: 2600]
...

Among the candidates, MD5 is the most likely match.

To validate this assumption, we generate a known MD5 hash and compare the length:

echo -n "test" | md5sum
098f6bcd4621d373cade4e832627b4f6

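The length and structure match the target hash: both are 32 hexadecimal characters. Other algorithms such as MD4 produce digests of the same shape, so this does not prove the type, but MD5 is the most common of the candidates and a reasonable first choice (Hashcat mode 0).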
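
The same shape comparison can be scripted instead of done by eye. The helper below (is_md5_shaped is a made-up name) only checks for exactly 32 hexadecimal characters, so it cannot tell MD5 apart from other 128-bit digests such as MD4.

```shell
# Succeeds when the argument is exactly 32 hexadecimal characters.
is_md5_shaped() {
  printf '%s\n' "$1" | grep -Eqx '[0-9a-fA-F]{32}'
}

is_md5_shaped 97268a8ae45ac7d15c3cea4ce6ea550b && echo "target: 32 hex characters"
is_md5_shaped "$(printf '%s' test | md5sum | cut -d ' ' -f 1)" && echo "md5sum output: 32 hex characters"
```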

Cracking the Hash

Now we run Hashcat using mode 0 and our final custom wordlist:

hashcat -a 0 -m 0 97268a8ae45ac7d15c3cea4ce6ea550b /home/pentestacc/Server/medium/mut_mark_final.txt

-a 0: straight wordlist attack mode
-m 0: specifies MD5 hash mode
The final wordlist path points to our custom file

After the process completes, if the hash was successfully cracked, Hashcat displays the result as hash:password. Re-running the same command with --show prints an already recovered password from the potfile.
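
Conceptually, a straight MD5 attack just hashes each candidate and compares it with the target. The loop below is a slow, illustrative equivalent built on md5sum, useful for sanity-checking a small list. It is not how Hashcat works internally, and crack_md5 is a hypothetical helper.

```shell
# crack_md5 TARGET_HASH WORDLIST
# Prints the first word whose MD5 digest equals TARGET_HASH; returns 1 if none does.
crack_md5() {
  while IFS= read -r word; do
    # printf '%s' avoids hashing a trailing newline, like echo -n above
    digest=$(printf '%s' "$word" | md5sum | cut -d ' ' -f 1)
    if [ "$digest" = "$1" ]; then
      printf '%s\n' "$word"
      return 0
    fi
  done < "$2"
  return 1
}

# Demo with the known hash of "test" from the earlier md5sum check.
tmp=$(mktemp)
printf 'foo\ntest\nbar\n' > "$tmp"
crack_md5 098f6bcd4621d373cade4e832627b4f6 "$tmp"   # prints test
rm -f "$tmp"
```

For the exercise itself, the call would take the target hash and mut_mark_final.txt as its two arguments.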

In this exercise, we followed a structured and realistic approach to targeted password cracking by leveraging OSINT, automation, and logic. Starting from publicly available personal details, we used tools like CeWL to generate a custom wordlist, then refined and expanded it through combinations and rule-based transformations.

By filtering based on password policy and applying a final cracking attempt with Hashcat, we successfully demonstrated how well-crafted wordlists can be highly effective in a real-world scenario. This process highlights the value of combining technical tools with strategic thinking during penetration testing.


I’m Ilias Mavropoulos, a penetration tester at TP with certifications including PenTest+, eWPT, eJPT, SC‑200, Gold BTL1, ISC2 CC, and an MSc in Cybersecurity.

Based in the Athens area, Greece, I bring over a decade of experience in IT and Cybersecurity environments combined.

LinkedIn:
https://www.linkedin.com/in/imavropoulos

Hope you enjoyed the write‑up!
If you found it helpful, feel free to share it with others in the community.

Article source: https://infosecwriteups.com/htb-password-cracking-wordlist-osint-6a5e01b60638?source=rss----7b722bfd1b8d---4