TL;DR — I built an automation that cloned and scanned tens of thousands of public GitHub repos for leaked secrets. For each repository I restored deleted files, found dangling blobs, and unpacked .pack files, then searched them for exposed API keys, tokens, and credentials. I ended up reporting a bunch of leaks and pulled in around $64k from bug bounties 🔥.
- Background
- Git internals
- Collecting Targets
- Building the Automation
- Findings & Payments
- Summary
My name is Sharon Brizinov, and while I usually focus on low-level vulnerability and exploitation research in OT/IoT devices, I occasionally dive into bug bounty hunting.
Many researchers in the bug bounty space look for leaked secrets, often scanning GitHub repositories for exposed credentials. This approach isn’t new, but I wanted to explore a different angle: recovering secrets from allegedly deleted files. Developers often forget that Git history retains everything, even after files are removed from the working directory.
To test this, I scanned tens of thousands of repositories from thousands of companies, searching for sensitive information hidden in past commits. The results were alarming: I discovered numerous deleted files containing API tokens, credentials, and even active session tokens that had not been revoked. Reporting these findings led to significant security improvements for the affected companies, and in the end, I earned a total of $64,350 in bug bounty rewards.
In this blog post I’ll describe my journey: collecting all the GitHub repositories, building the automation, and finding and reporting the leaked secrets.
First things first, I highly recommend reading How Git Internally Works by Octobot. It’s a great, easy-to-read resource for better understanding Git.
Git is a distributed version control system that tracks changes in files and allows multiple developers to collaborate on a project. It maintains a complete history of all changes, enabling users to revert to previous states, branch off for feature development, and merge changes efficiently. At its core, Git operates as a content-addressable filesystem where each version of a file is stored as a unique object in a repository.
Everything Git tracks (files, folders, commits, etc.) is stored as an object, identified by its SHA-1 or SHA-256 hash (depending on config).
There are four object types in Git:
- Blob → File content
- Tree → Directory structure
- Commit → Snapshot + metadata
- Tag → Annotated tag object
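To make this concrete, here’s a small sketch using a throw-away demo repo (all names and paths here are just placeholders) that creates one of each object type and asks Git what it sees:
# throw-away demo repo just to illustrate the four object types
git init /tmp/objects-demo && cd /tmp/objects-demo
echo "hello" > readme.txt
git add readme.txt && git commit -m "initial commit"
git tag -a v1 -m "annotated tag"
git cat-file -t HEAD               # commit
git cat-file -t HEAD^{tree}        # tree
git cat-file -t HEAD:readme.txt    # blob
git cat-file -t v1                 # tag (the annotated tag object)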
Git Blobs and Packs
A blob (binary large object) is how Git stores the content of a file, without any filename or path info. When Git first stores an object, it writes it as a “loose object”, like this:
.git/objects/ab/cdef1234567890...
Where ab is the first 2 hex characters of the SHA and cdef1234567890… is the rest. The data stored inside is a zlib-compressed stream corresponding to a single file’s content.
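For illustration, a quick way (inside any Git repo; the file name is made up) to watch a loose object appear on disk:
echo "super-secret-token" > creds.txt
blob_sha=$(git hash-object -w creds.txt)         # writes a loose blob and prints its SHA
ls .git/objects/${blob_sha:0:2}/${blob_sha:2}    # .git/objects/<first 2 chars>/<rest>
git cat-file -p "$blob_sha"                      # zlib-inflates it back into the original content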
To optimize storage and performance, Git periodically (once there are roughly 6,700 loose objects, by default) compresses many loose objects into a pack file:
.git/objects/pack/pack-<hash>.pack
.pack files are complex beasts with a very interesting structure. Luckily we don’t really need to fully understand how they are constructed, since we can use git-unpack-objects to unpack any .pack file.
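For example, something along these lines (paths are illustrative) packs the loose objects and then explodes a pack back into loose objects in a scratch repo:
# pack up loose objects (roughly what git gc does behind the scenes)
git repack -ad
ls .git/objects/pack/        # pack-<hash>.pack + matching .idx
# unpack into a fresh scratch repo, since git-unpack-objects skips objects
# that the current repository already has
git init -q /tmp/unpacked && cd /tmp/unpacked
git unpack-objects < /path/to/repo/.git/objects/pack/pack-<hash>.pack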
Sometimes unreferenced objects (aka dangling objects) will be created in our Git repository. These are valid but unreferenced objects (commits, blobs, trees, or tags) that remain in the .git/objects/ directory but are no longer reachable from any branch, tag, stash, or reflog. They are typically created when history is rewritten (for example by git commit --amend, rebase, reset, or branch deletion), leaving old objects behind. Although not part of the active history, Git retains them temporarily (by default, 2 weeks) for potential recovery. You can find them using git fsck --dangling.
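A quick way to see this in action (demo only, the file name is made up): amend away a commit and then ask git fsck what became unreachable. Note that locally the reflog still references the old commit, so --no-reflogs is needed to surface it; a fresh clone has no such reflog.
echo "oops, a secret" > .env
git add .env && git commit -m "add env"
git rm --cached .env && git commit --amend -m "add env (cleaned)"
# the original commit, its tree and the .env blob are now unreachable but still on disk
git fsck --full --unreachable --dangling --no-reflogs
# git cat-file -p <unreachable-sha>   # recovers the content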
Git Commit History
Each commit in Git represents a snapshot of the repository at a given point in time. Commits are immutable and are identified by a SHA-1/SHA-256 hash. They store:
- A reference to a tree object, which represents the file structure at that commit.
- Pointers to parent commits, forming a directed acyclic graph (DAG) of repository history.
- Metadata, including the author, timestamp, and commit message.
Within pack files, Git stores objects efficiently using delta compression, recording only the differences between similar objects rather than full copies of files.
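Peeking inside a commit object shows exactly these pieces (the hashes and author below are obviously made up):
git cat-file -p HEAD
# tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904     <- file structure at this commit
# parent a1b2c3d4...                                <- parent commit (absent on the root commit)
# author Jane Doe <jane@example.com> 1700000000 +0000
# committer Jane Doe <jane@example.com> 1700000000 +0000
#
# commit message goes here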
Deleted Files and Why They Can Be Recovered
When a file is deleted using git rm or simply removed from the working directory and committed, it disappears from the latest snapshot but still exists in the repository’s history. This happens because:
- Git Commit History is Immutable: Once a commit is created, its data is stored in .git/objects and remains there even if it’s no longer referenced by any branch or tag. Unreferenced (dangling) objects aren’t removed immediately — they’re typically retained for around two weeks before being eligible for garbage collection.
- References (Refs) Keep Objects Alive: Git maintains references under refs/heads, refs/tags and refs/remotes, so even if a file is removed in a later commit, older commits (and their trees) still contain the file.
To completely remove a file from history, one must rewrite history using tools like git filter-branch or git-filter-repo, or manually rebase and run the garbage collector (with prune) to clear unreachable objects — good luck with that. However, if the repository is public, the file may already have been copied or cloned elsewhere, so finding and revoking leaked API keys, tokens, secrets, and sessions is highly important.
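In practice, recovering such a file takes two commands. A minimal sketch (the path config/secrets.yml is hypothetical):
# find the commit(s) that deleted the file
git log --diff-filter=D --name-only --pretty=format:"%H" -- config/secrets.yml
# read the file back from the parent of the deleting commit
git show "<deleting-commit-sha>^:config/secrets.yml" > recovered_secrets.yml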
To understand these concepts better I built a small platform to view and analyze file-directory changes in a Git repository, to visualize which objects are created, changed and deleted. Obviously this was overkill for this project, but with vibe-coding it took me < 5 minutes, so why not :)
The juice — so how do we get all the deleted files?
- Restore deleted files by diffing parent-child commits.
- Unpack all .pack files using git unpack-objects < .git/objects/pack/pack-<SHA>.pack
- Find dangling objects using git fsck --full --unreachable --dangling
To collect all deleted files I traversed all commits and for each commit I compared (git diff) the list of files with its parent commit. If there was a difference and some files were deleted (marked by D), I restored the files (git show) and dumped them to disk. Obviously this isn’t the most efficient method, but it was good enough. Here’s a small PoC script that does exactly that:
#!/bin/bash
# cd into the cloned repo before running this
mkdir -p "__ANALYSIS/del"
# Extract all commits and process each commit
git rev-list --all | while read -r commit; do
    echo "Processing commit: $commit"
    # Get the parent commit (skip root commits, which have no parent)
    parent_commit=$(git log --pretty=format:"%P" -n 1 "$commit")
    if [ -z "$parent_commit" ]; then
        continue
    fi
    parent_commit=$(echo "$parent_commit" | awk '{print $1}')
    # Get the diff for the commit
    git diff --name-status "$parent_commit" "$commit" | while read -r file_status file; do
        # Replace / with _ so restored files can live in a flat directory
        safe_file_name=$(echo "$file" | sed 's/\//_/g')
        # Handle deleted files
        if [ "$file_status" = "D" ]; then
            echo "File deleted: $file" | tee -a "__ANALYSIS/del.log"
            echo "Saving to __ANALYSIS/del/${commit}___${safe_file_name}"
            git show "$parent_commit:$file" > "__ANALYSIS/del/${commit}___${safe_file_name}"
        fi
    done
done
And here’s the one-liner I used to extract all unreachable blobs:
mkdir -p unreachable_blobs && git fsck --unreachable --dangling --no-reflogs --full | grep 'unreachable blob' | awk '{print $3}' | while read h; do git cat-file -p "$h" > "unreachable_blobs/$h.blob"; done
Now that I had a working PoC to restore deleted files, the next step was to gather as many relevant GitHub repositories as possible. What qualifies as relevant? Any company with an active bug bounty program, of course. I compiled a list of company names from all publicly known bug bounty programs, as well as private ones I had been invited to over the years. For the public BB programs I mainly used these sources:
- https://hackerone.com/bug-bounty-programs
- https://www.bugcrowd.com/bug-bounty-list/
- https://github.com/Lissy93/bug-bounties
- https://github.com/trickest/inventory
- https://github.com/disclose/bug-bounty-platforms
- https://github.com/projectdiscovery/public-bugbounty-programs
In addition, I thought it could be interesting to get all the GitHub accounts (mostly organizations) that have at least one repository with more than 5,000 stars. I ended up with this one-liner:
for page in {1..100}; do gh api "search/repositories?q=stars:>5000&sort=stars&order=desc&per_page=50&page=$page" --jq '.items[].full_name'; done | cut -d '/' -f 1
Organizing the accounts
I made a huge list of all the company names and saved it as companies.txt. Next I needed to find their public GitHub accounts. I thought of different ways to achieve that, but eventually chose the laziest and most stupid way — “AI”. I just sent batches of company names to Grok, Perplexity and OpenAI and nicely asked them to “search online and find the github account that is associated with the following organizations”. It wasn’t perfect and I had to fix some of the accounts they made up, but overall it worked pretty well.
One important observation I made early on is that many companies maintain multiple GitHub accounts. For example, they may have separate accounts for their main engineering team, research, QA/testing, community, and more. Each of these can hold valuable information, and developers from across the organization may unintentionally leak sensitive data across any of these accounts. As a result, I made sure to specifically look for accounts with names like ‘lab’, ‘research’, ‘test’, ‘qa’, ‘samples’, ‘hq’, and ‘community’ as well.
Here’s the exact prompt I used:
I have a task for you. I will provide a list of companies, and your job will be to search the internet for all GitHub accounts associated with those companies. This includes official, affiliated, open-source, and community-related repositories. Please provide only the GitHub URLs in a clean, bullet-point list. Is that clear?
More please
Additionally, I reviewed all the GitHub accounts and repositories I collected to identify those that were forks of other projects. For each forked repository, I traced it back to the original source and added the associated GitHub accounts to my monitoring list. My reasoning was that if Vendor-A created a project that was forked by other companies (that were on my list), it’s possible that Vendor-A has other repositories — potentially containing leaked secrets — that could impact the companies that forked one of their other projects.
In total, I had thousands of GitHub organization accounts, and it was time to build the automation.
The automation was quite simple: it needed to clone all the projects from all organizations, restore all deleted files, and search for secrets that are still active. Here’s the pseudo-code:
- foreach company in companies:
- foreach repo in company.repos:
- restore all deleted files
- foreach file in files:
- collect secrets
- foreach secret in secrets:
- is secret active?
- notify via Telegram bot
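As a rough sketch, the driver loop on each server looked something like this (restore_deleted.sh, notify_telegram.sh, and github_orgs.txt are hypothetical names standing in for the restore and notify stages and the split organization list described in the steps below):
while read -r ORG_NAME; do
  for REPO_NAME in $(gh repo list "$ORG_NAME" -L 1000 --json name --jq '.[].name'); do
    git clone -q "https://github.com/$ORG_NAME/$REPO_NAME.git" "$REPO_NAME"                     # step 2
    (cd "$REPO_NAME" && bash ../restore_deleted.sh)                                             # step 3
    trufflehog filesystem --only-verified "./$REPO_NAME" > "$ORG_NAME.$REPO_NAME.secrets.txt"   # step 4
    [ -s "$ORG_NAME.$REPO_NAME.secrets.txt" ] && bash notify_telegram.sh "$ORG_NAME" "$REPO_NAME"  # step 5
    rm -rf "$REPO_NAME"                                                                         # step 6
  done
done < github_orgs.txt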
Overall there were six steps I needed to implement: preparing machines, cloning all repos, restoring all deleted files, searching for active secrets, notifying on verified secrets, and deleting the repo and moving on. Here’s a breakdown of how everything worked together.
1 — Preparing machines
For the entire processing of this project I utilized 10 servers: some were private cloud computes (e.g. EC2), some were VPSes, and some were physical computers and Raspberry Pis. I made sure that each compute unit had at least 120 GB of free hard-disk space. Then, I split the GitHub organization list into 10 parts and loaded each part on a different server.
2 — Cloning all repos
To get an organization’s entire GitHub repo list I used the gh CLI like this:
for REPO_NAME in $(gh repo list $ORG_NAME -L 1000 --json name --jq '.[].name');
do
FULL_REPO_URL="https://github.com/$ORG_NAME/$REPO_NAME.git"
git clone "$FULL_REPO_URL" "$REPO_NAME"
done;
3 — Restoring all deleted files
For each cloned project, I restored all deleted files using three techniques:
- Restoring deleted files by diffing parent-child commits (see bash script above).
- Unpacking all .pack files using git unpack-objects < .git/objects/pack/pack-<SHA>.pack
- Finding dangling objects using git fsck --full --unreachable --dangling
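One wrinkle worth noting for the second technique: git unpack-objects skips objects the repository already has, and objects sitting in the repo’s own packs count as present. A sketch of one way around it, moving the packs aside first and feeding them back in:
mkdir -p /tmp/packs
mv .git/objects/pack/*.pack /tmp/packs/ 2>/dev/null
rm -f .git/objects/pack/*.idx
for pack in /tmp/packs/*.pack; do
  git unpack-objects -r < "$pack"     # -r keeps going even if a pack is partially corrupt
done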
4 — Search for active secrets
Once all the files were restored, I ran TruffleHog which is a great secrets scanning tool that digs deep into the code repository and finds secrets, passwords, and sensitive keys. It supports the detection and verification of secrets for over 800 different key types and overall works very fast. It can also detect secrets in base64-encoded and some compressed data formats which is great for us.
I used TruffleHog with the --only-verified flag to search just for *working and verified* secrets. I also used the filesystem argument to force it to scan the files on disk, which also includes the restored deleted files.
trufflehog filesystem --only-verified --print-avg-detector-time --include-detectors="all" ./ > secrets.txt
Another benefit of running TruffleHog on local clones is that it also scanned the .git directory, and since it can decompress and scan zlib-compressed streams, most of the loose objects were scanned “for free”. It also scanned the .pack files themselves, which sometimes surprised me with nice results too.
A question arises: if TruffleHog can decompress and scan Git objects, why bother with restoring deleted files? Because doing so significantly improved the success rate of finding secrets. Sometimes the compressed streams and .pack files were too big for it to handle; sometimes they were heavily mixed in multiple compressions and wrappers and the tool couldn’t yield results when they were scanned in their raw format. By extracting as many files as I could, I reached a much higher success rate in finding leaked secrets.
5 — Notify on verified secrets
Once TruffleHog found leaked secrets in a repo, it was time to notify me over Telegram:
curl -F chat_id="XXXXXXXXXXXXX" \
-F document=@"$ORG_NAME.$REPO_NAME.secrets.txt" \
-F caption="New secrets - $ORG_NAME - $REPO_NAME" \
'https://api.telegram.org/botXXXXXXXXXXXXX:XXXXXXXXXXXXX/sendDocument'
Why Telegram? idk, that’s my go-to tool for automation notifications. Building a small backend with a small database would have been much better, but who cares :)
6 — Delete the repo and move on
Finally, the automation would delete the repo to save space, and move to the next repository.
So what did I find? Hundreds of active leaked secrets — besides real production tokens I also encountered test accounts and even canaries carefully planted by companies to get notified when someone tries to use them. Let’s break down everything I found:
Top Interesting Secrets
Here are the top interesting secrets and tokens I found and the range of the associated bounties I received for them. The bounty changed significantly depending on multiple factors such as impact, token scope, and the affected company’s bug-bounty policy and payout table.
- GCP Projects/AWS Production tokens — $5k–$15k 🔥🔥🔥
- Slack tokens — $3k–$10k 🔥🔥
- GitHub tokens — $5k–$10k 🔥🔥
- OpenAI tokens — $500–$2000 🔥
- HuggingFace tokens — $500–$2000 🔥
- Algolia Admin tokens — $300–$1000 🔥
- Email SMTP credentials — $500–$1000 🔥
- Platform-specific developer tokens and sessions — $500–$2000 🔥
Non-interesting secrets
While I did find many interesting tokens, some of the active tokens were associated with testing accounts or even canaries. As it turns out there’s a really nice website called CanaryTokens that enables you to generate canary tokens and get notified whenever someone is trying to use them. Very cool idea.
Some of the other non-interesting and repeating tokens were throwaway accounts, dummy users for testing, and API tokens with no permissions at all that are deliberately used in the front-end. Whenever I encountered these I simply ignored them because they had no impact at all.
Let’s see a couple of examples:
- GitHub dummy user private key: many projects use test private keys that involve a dummy GitHub user. I’ve encountered a couple of them repeated across hundreds of repositories, for example aaron1234567890123.
- Throwaway accounts: accounts that are used as read-only or to bypass rate limits.
- Web3 API open tokens: I’ve seen hundreds of web3 projects using open API keys to query different blockchains for transactions or crypto prices. The most popular API tokens were Infura and Alchemy.
- Front-end API keys: some services, like Algolia, expect you to use a specific API token in the frontend. Usually the token is associated with read-only permissions and cannot be modified or affect the application. The problem starts when developers accidentally use a token with more roles and permissions than expected. For example, the Algolia Admin token instead of the Algolia Search token.
- Canary tokens: Some companies use canary tokens to get notifications when they are being used. There are a couple services that help with that, for example CanaryTokens.
Why did the secrets get leaked in the first place?!
After analyzing dozens of real-impact cases I can summarize the answer into three explanations: lack of knowledge of how Git works, not fully realizing what was committed due to binary or hidden files, and blindly trusting Git history-rewriting tools.
- Lack of knowledge of how Git works: sometimes developers don’t understand how Git stores information. For example, developers committed plain-text credentials/tokens/secrets. Later they figured out they messed up and simply removed the secrets or deleted the files without revoking them. But Git remembers everything, so by reconstructing all deleted files anyone could have found the secrets.
- Not realizing what’s inside: I saw multiple cases where developers didn’t use .gitignore and accidentally committed binary files such as .pyc (compiled Python) files that contained secrets. Later they deleted these files without realizing they contained binary data mixed with hot secrets. In other cases I saw hidden files (on Linux, filenames that start with a dot, like .env) being committed, or zipped into an archive that was later committed. Even if these files are later removed, the information can still be restored and therefore the secrets keep leaking.
- Trusting Git history-rewriting tools: I also noticed one case where developers accidentally committed secrets into their repository. Later they understood the mistake, deleted the secrets and tried to use history-rewriting tools to make it all disappear. However, there was still a reference in the .pack file and I was able to restore the secrets. Luckily I reported this responsibly and they were able to fix it without any harm.
Most of the leaked secrets were found in binary files that had been committed to the repository and later deleted. These files are typically generated by compilers or automated processes. A common example is .pyc files, which are Python byte-code files created when some Python interpreters compile source code; these often end up being committed unintentionally. Other examples include compiler-generated debug files, such as .pdb files, which are also occasionally committed by mistake.
Overall, this was a really fun project!
Cloning a large number of GitHub repositories locally and extracting/restoring deleted files proved to be an effective strategy for scanning and discovering leaked secrets.
In addition, I gained a much deeper understanding of Git internals and even built several custom tools along the way — one of which, HexShare, is publicly available. During the process, I also reported leaked secrets to multiple Fortune 500 companies and earned approximately $64k in rewards.