One Tool to Rule Them All: File Metadata & Static Analysis for Malware Analysts and SOC Teams
2026-4-23 06:1:14, source: infosecwriteups.com

Extract hashes, PE/ELF/Mach-O metadata, strings, YARA hits, and deep static analysis — without ever running the file.

Andrey Pautov

Introduction

Whether you’re triaging a suspicious attachment, building a file-intel pipeline, or comparing your analysis to threat feeds, you need one place to get hashes, format-specific metadata, and optional deep static analysis — without decompiling or executing code.

Basic File Information Gathering Script is a Python CLI that does exactly that. It’s built for malware analysts, digital forensics investigators, and SOC engineers who want fast, scriptable file intelligence as a table, JSON, or CSV.

Table of contents

  1. Why another file-info tool?
  2. Two interfaces, one codebase
  3. Installation
  4. Quick start
  5. Hashes and fuzzy hashing
  6. Strings and YARA
  7. Full static analysis (--full)
  8. Tuning format-specific analysis
  9. Real malware: MalwareBazaar integration
  10. Summary

Why another file-info tool?

file and md5sum tell you the file type and a single hash. Full-blown sandboxes and disassemblers are heavy and often overkill for questions like “what is this file?” and “how does it compare to VirusTotal/MalwareBazaar?” This tool sits in the middle:

  • Multiple hashes in a single read pass: MD5, SHA-1, and SHA-256 by default, SHA-384/SHA-512 on request, plus optional ssdeep/TLSH.
  • Format-aware: PE (Windows), ELF (Linux), Mach-O (macOS) with meaningful fields — timestamps, imphash, entry point, packing heuristics, digital signatures, Rich header, overlay.
  • Over 60 recognized magic numbers, so you get a real file type instead of just “data”.
  • Strings (ASCII + UTF-16 LE), optional YARA scanning, and a full static analysis mode that gives you byte stats, entropy maps, head/tail hex, and pattern extraction (URLs, IPs, paths, registry keys) — no decompilation.

You can run it on one file, a list of files, or recursively over a directory, and get human-readable tables, JSON, or CSV for automation.
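File typing by magic number amounts to matching the first bytes of a file against a signature table. The sketch below assumes a small hand-picked table of well-known signatures (MZ, ELF, Mach-O, ZIP, OLE2, PDF); the tool's own 60+ entries and labels, such as “Windows Executable (Extended MZ)”, are not reproduced, and identify() / identify_file() are hypothetical helper names.

```python
# Hedged sketch of magic-number file typing with a small, assumed signature
# table; fileinfo.py's real table (60+ entries) and labels differ.

# (signature bytes, label) pairs
MAGIC_SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound document"),
    (b"\x7fELF", "ELF executable"),
    (bytes.fromhex("FEEDFACE"), "Mach-O 32-bit (big-endian)"),
    (bytes.fromhex("CEFAEDFE"), "Mach-O 32-bit (little-endian)"),
    (bytes.fromhex("FEEDFACF"), "Mach-O 64-bit (big-endian)"),
    (bytes.fromhex("CFFAEDFE"), "Mach-O 64-bit (little-endian)"),
    (bytes.fromhex("CAFEBABE"), "Mach-O universal binary or Java class"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
    (b"MZ", "Windows executable (MZ)"),
]


def identify(header: bytes) -> str:
    """Return a label for the first matching signature, or "data"."""
    # Check longer signatures first so a specific match beats a generic prefix.
    for sig, label in sorted(MAGIC_SIGNATURES, key=lambda p: len(p[0]), reverse=True):
        if header.startswith(sig):
            return label
    return "data"  # same fallback wording file(1) uses for unknown content


def identify_file(path) -> str:
    # 16 bytes is enough for every signature in this table.
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

For the sample later in this article, the magic bytes 4D5A9000 start with MZ, so identify(bytes.fromhex("4D5A9000")) falls through to the MZ entry.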

Two interfaces, one codebase

fileinfo.py vs Basic_inf_gathering.py

When to use which

  • Basic_inf_gathering.py: quick, single-file PE report to the terminal; minimal dependencies (LIEF, optional cryptography).
  • fileinfo.py: the default choice for batch processing, automation, JSON/CSV output, YARA, strings, full static analysis, and non-PE formats (ELF/Mach-O).

The rest of this article focuses on fileinfo.py.

Installation

git clone https://github.com/anpa1200/Basic-File-Information-Gathering-Script.git
cd Basic-File-Information-Gathering-Script
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

Optional but recommended for malware work:

pip install ssdeep py-tlsh yara-python
# For PE certificate details:
pip install cryptography
# For OLE/compound doc listing in --full:
pip install olefile

Quick start

Single file (human-readable table):

python3 fileinfo.py /path/to/sample.exe

Recursive directory (e.g. a drop folder):

python3 fileinfo.py -r /path/to/samples/

JSON for automation or SIEM:

python3 fileinfo.py --json /path/to/file.exe -o report.json

The resulting report.json:
{
"file_name": "malware.exe",
"file_path": "/home/andrey/git_project/Basic-File-Information-Gathering-Script/malware_samples/malware.exe",
"file_size": 563311,
"file_size_human": "563311 bytes (0.54 MB)",
"magic_number": "4D5A9000",
"file_type": "Windows Executable (Extended MZ)",
"entropy": 7.2249,
"entropy_note": "Normal",
"permissions": "-rw-rw-r--",
"hashes": {
"md5": "0b375e6b7e44d7c8488c4227e9344197",
"sha1": "dd8753066efc055dea693f44627fd69c988dfc65",
"sha256": "9fdea40a9872a77335ae3b733a50f4d1e9f8eff193ae84e36fb7e5802c481f72"
},
"pe": {
"timestamp": "2019-10-28 09:44:53 UTC (OK)",
"compiler": "Unknown",
"imphash": "3313409012dcc6b8a34048226776435e",
"header_offset": "264 (0x108)",
"entry_point": "RVA 0xEAF2, VA 0x40EAF2",
"rich_header": "Present (parse error)",
"resources": "147 resource nodes",
"overlay": "137327 bytes (0x2186F)",
"signature": "Not signed",
"packing": "Unpacked"
}
}
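Several of the PE fields above (header_offset, timestamp, entry_point, overlay) come straight from the PE headers. The following standard-library sketch shows one way to derive them; parse_pe_basics() is a hypothetical helper, not code from the repository, and it assumes a well-formed PE32 or PE32+ file.

```python
# Hedged sketch: derive a few PE header fields and the overlay size using only
# the standard library. parse_pe_basics() is hypothetical; it does no
# validation beyond the MZ/PE signature and optional-header magic checks.
import struct
from datetime import datetime, timezone


def parse_pe_basics(data: bytes) -> dict:
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (DOS header offset 0x3C) points to the "PE\0\0" signature.
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp, ...
    machine, num_sections, timestamp = struct.unpack_from("<HHI", data, pe_off + 4)
    size_of_opt_header = struct.unpack_from("<H", data, pe_off + 20)[0]
    opt = pe_off + 24  # optional header starts after the 20-byte COFF header
    magic = struct.unpack_from("<H", data, opt)[0]
    entry_rva = struct.unpack_from("<I", data, opt + 16)[0]
    if magic == 0x10B:    # PE32: 4-byte ImageBase at +28
        image_base = struct.unpack_from("<I", data, opt + 28)[0]
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at +24
        image_base = struct.unpack_from("<Q", data, opt + 24)[0]
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    # Section table: 40-byte entries; SizeOfRawData and PointerToRawData at +16.
    sec_table = opt + size_of_opt_header
    raw_end = 0
    for i in range(num_sections):
        size_raw, ptr_raw = struct.unpack_from("<II", data, sec_table + 40 * i + 16)
        raw_end = max(raw_end, ptr_raw + size_raw)
    ts = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "header_offset": pe_off,
        "machine": machine,
        "timestamp": ts.strftime("%Y-%m-%d %H:%M:%S UTC"),
        "entry_point_rva": entry_rva,
        "entry_point_va": image_base + entry_rva,
        # Anything after the end of the last section's raw data is overlay.
        "overlay_size": max(0, len(data) - raw_end),
    }
```

As a sanity check against the sample: the DOS header holds e_lfanew = 0x108 (bytes 08 01 00 00 at offset 0x3C in the --full head_tail dump), the entry point VA is 0x400000 + 0xEAF2 = 0x40EAF2, and the last section (.rsrc, raw offset 401408, raw size 24576) ends at 425984, so the overlay is 563311 - 425984 = 137327 bytes, the value reported. On signed files the Authenticode certificate table is usually appended past the last section too, so a non-zero overlay is not suspicious by itself.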

CSV for spreadsheets or bulk comparison:

python3 fileinfo.py --csv -r ./malware_samples/ -o summary.csv

You immediately get: file name/path, size, magic-based file type, entropy, permissions, and for PE/ELF/Mach-O — timestamp, compiler/language hints, imphash (PE), entry point, Rich header (PE), resources, overlay, digital signature, and packing heuristic.
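The imphash deserves a closer look, since it is the field most often used to pivot between related samples. Below is a sketch of the algorithm as implemented by pefile's get_imphash(), simplified in one respect: every ordinal import is rendered as ord<N>, whereas pefile resolves known ordinals for a few DLLs (including OLEAUT32.dll) to function names. The sample analyzed in this article imports OLEAUT32.dll by ordinal, so this simplified version would not reproduce pefile's hash for it. imphash() is a hypothetical helper.

```python
# Hedged sketch of the imphash algorithm (as popularized by pefile):
# lowercase "dll.function" pairs in import-table order, .dll/.ocx/.sys
# stripped from the DLL name, comma-joined, then MD5.
# Simplification (assumption): ordinals always become "ord<N>"; pefile maps
# known ordinals for some DLLs (e.g. oleaut32, ws2_32) to real names.
import hashlib


def imphash(imports):
    """imports: iterable of (dll_name, function_name or int ordinal)."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        base, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = base
        name = f"ord{func}" if isinstance(func, int) else func.lower()
        parts.append(f"{lib}.{name}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Order matters: the hash covers imports in the order they appear in the import table, so two binaries with the same import set in a different order get different imphashes.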

Hashes and fuzzy hashing

Default hashes are MD5, SHA-1, and SHA-256 (single read pass). You can add SHA-384/SHA-512 and control which hashes are computed:

python3 fileinfo.py --hashes md5,sha1,sha256,sha512 /path/to/file
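The single-read-pass design is straightforward with hashlib: read the file once in chunks and feed every chunk to each selected hash object. A minimal sketch, assuming a 1 MiB chunk size; multi_hash() is a hypothetical name, not the tool's API.

```python
# Minimal sketch of single-pass multi-hashing: one read of the file, every
# chunk fed to all selected hash objects. multi_hash() is hypothetical and
# the 1 MiB default chunk size is an assumption.
import hashlib


def multi_hash(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Calling multi_hash(path, ("md5", "sha1", "sha256", "sha512")) mirrors --hashes md5,sha1,sha256,sha512. Reading once matters on large files or slow storage, where hashing separately per algorithm would multiply the I/O.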

With ssdeep and py-tlsh installed, you also get ssdeep and tlsh hashes unless you pass --no-fuzzy. These are invaluable for clustering and “similar file” lookups (e.g. MalwareBazaar, VirusTotal).

Strings and YARA

Strings (ASCII and UTF-16 LE) with configurable minimum length:

python3 fileinfo.py --strings --min-str-len 8 sample.exe
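String extraction of this kind usually comes down to two regular expressions over the raw bytes: runs of printable ASCII, and runs of printable ASCII characters each followed by a zero byte (UTF-16 LE). A sketch, assuming “printable” means 0x20-0x7E; the tool's exact character set may differ, and extract_strings() is hypothetical.

```python
# Hedged sketch of ASCII + UTF-16 LE string extraction with regexes.
# Assumption: printable = 0x20-0x7E (no tabs/newlines).
import re


def extract_strings(data: bytes, min_len: int = 4):
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16 LE text: each printable ASCII byte is followed by a 0x00 byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_strings = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return ascii_strings, wide_strings
```

min_len plays the role of --min-str-len: raising it to 8, as in the command above, drops much of the short noise found in compiled code.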

YARA (when yara-python and a rules file are available):

python3 fileinfo.py --yara /path/to/rules.yar sample.exe

Matches appear in the report so you can quickly see which rules fired.

Full static analysis (--full): maximum metadata, no decompilation

The --full flag runs an extra layer of static analysis: no execution, no decompilation. It adds:

  • Byte-level stats: null ratio, printable ratio, byte frequency, longest null run.
  • Entropy map: per-block entropy so you can spot packed or encrypted regions.
  • Head/tail hex dump: first and last bytes for structure inspection.
  • String patterns: URLs, IPv4, emails, Windows/Unix paths, registry keys (from raw bytes, including UTF-16 LE).
  • PE deep: machine type, subsystem, DLL characteristics (ASLR, DEP, etc.), section table (name, size, entropy), full import/export lists, exphash, relocations, TLS callbacks, delay imports, Rich header, resource types, version info (FileVersion, CompanyName, etc.).
  • ELF deep: class, machine, sections/segments, dynamic (NEEDED, RPATH, RUNPATH), exported/imported symbols, notes.
  • Mach-O deep: CPU type, file type, dylibs, segments, UUID.
  • Containers: ZIP file listing (names, sizes); OLE stream listing (if olefile is installed).
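Byte statistics and the entropy map come down to counting. The sketch below assumes Shannon entropy in bits per byte (0 to 8), the 64 KiB block size seen in the sample report (block_size 65536), “printable” meaning 0x20-0x7E, and a hypothetical 7.0 cutoff for high-entropy blocks; byte_stats() and entropy_blocks() are illustrative names whose output keys merely mirror the sample JSON.

```python
# Hedged sketch of byte statistics and a per-block entropy map.
# Assumptions: Shannon entropy in bits/byte, 64 KiB blocks, printable =
# 0x20-0x7E, and a hypothetical 7.0 "high entropy" threshold.
import math
import re
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def byte_stats(data: bytes, top: int = 10) -> dict:
    n = len(data) or 1
    null_runs = re.findall(rb"\x00+", data)
    return {
        "size_analyzed": len(data),
        "null_ratio": round(data.count(0) / n, 4),
        "printable_ratio": round(sum(1 for b in data if 0x20 <= b <= 0x7E) / n, 4),
        "longest_null_run": max((len(r) for r in null_runs), default=0),
        "top_byte_frequencies": [f"0x{b:02X}({c})" for b, c in Counter(data).most_common(top)],
    }


def entropy_blocks(data: bytes, block_size: int = 65536, threshold: float = 7.0) -> dict:
    # The last block may be shorter than block_size.
    per_block = [round(shannon_entropy(data[i:i + block_size]), 3)
                 for i in range(0, len(data), block_size)]
    high = [{"block": i, "offset": i * block_size, "entropy": e}
            for i, e in enumerate(per_block) if e > threshold]
    avg = round(sum(per_block) / len(per_block), 3) if per_block else 0.0
    return {"block_size": block_size, "num_blocks": len(per_block),
            "entropy_per_block": per_block, "high_entropy_blocks": high,
            "overall_avg_entropy": avg}
```

In the --full sample report below, overall_avg_entropy (6.445) matches a plain mean of the nine block values (58.009 / 9), and blocks 3-5 (7.53, 7.997, 7.993) are flagged while 6.62 is not, so the tool's cutoff lies somewhere in that range; 7.0 is only an assumption that fits.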

Example:

python3 fileinfo.py --full sample.exe
python3 fileinfo.py --full --json sample.exe -o full_report.json

This is the mode you want when building a reproducible static report to compare against MalwareBazaar/VirusTotal or to feed into your own pipelines. For the same malware.exe sample, the --full JSON report looks like this:

{
"file_name": "malware.exe",
"file_path": "/home/andrey/git_project/Basic-File-Information-Gathering-Script/malware_samples/malware.exe",
"file_size": 563311,
"file_size_human": "563311 bytes (0.54 MB)",
"magic_number": "4D5A9000",
"file_type": "Windows Executable (Extended MZ)",
"entropy": 7.2249,
"entropy_note": "Normal",
"permissions": "-rw-rw-r--",
"hashes": {
"md5": "0b375e6b7e44d7c8488c4227e9344197",
"sha1": "dd8753066efc055dea693f44627fd69c988dfc65",
"sha256": "9fdea40a9872a77335ae3b733a50f4d1e9f8eff193ae84e36fb7e5802c481f72"
},
"pe": {
"timestamp": "2019-10-28 09:44:53 UTC (OK)",
"compiler": "Unknown",
"imphash": "3313409012dcc6b8a34048226776435e",
"header_offset": "264 (0x108)",
"entry_point": "RVA 0xEAF2, VA 0x40EAF2",
"rich_header": "Present (parse error)",
"resources": "147 resource nodes",
"overlay": "137327 bytes (0x2186F)",
"signature": "Not signed",
"packing": "Unpacked"
},
"static_analysis": {
"byte_stats": {
"size_analyzed": 563311,
"null_ratio": 0.1027,
"printable_ratio": 0.4999,
"longest_null_run": 3424,
"top_byte_frequencies": [
"0xFF(16177)",
"0x8B(9498)",
"0x74(8902)",
"0x75(8338)",
"0x65(6918)",
"0x33(6268)",
"0x6A(6248)",
"0x6E(5795)",
"0x64(5711)",
"0x73(5700)"
]
},
"entropy_blocks": {
"block_size": 65536,
"num_blocks": 9,
"entropy_per_block": [
6.369,
6.62,
5.982,
7.53,
7.997,
7.993,
5.43,
5.044,
5.044
],
"high_entropy_blocks": [
{
"block": 3,
"offset": 196608,
"entropy": 7.53
},
{
"block": 4,
"offset": 262144,
"entropy": 7.997
},
{
"block": 5,
"offset": 327680,
"entropy": 7.993
}
],
"overall_avg_entropy": 6.445
},
"head_tail": {
"head_hex": "0000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 MZ..............\n0010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......\n0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................\n0030 00 00 00 00 00 00 00 00 00 00 00 00 08 01 00 00 ................\n0040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ........!..L.!Th\n0050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F is program canno\n0060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 t be run in DOS \n0070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 mode....$.......\n0080 46 CA 45 A4 02 AB 2B F7 02 AB 2B F7 02 AB 2B F7 F.E...+...+...+.\n0090 81 A3 74 F7 08 AB 2B F7 F8 88 32 F7 04 AB 2B F7 ..t...+...2...+.\n00A0 11 A3 76 F7 00 AB 2B F7 81 A3 76 F7 13 AB 2B F7 ..v...+...v...+.\n00B0 02 AB 2A F7 31 A9 2B F7 07 A7 24 F7 19 AB 2B F7 ..*.1.+...$...+.\n00C0 07 A7 74 F7 8A AB 2B F7 29 8A 0C F7 0B AB 2B F7 ..t...+.).....+.\n00D0 07 A7 4B F7 75 AB 2B F7 07 A7 77 F7 03 AB 2B F7 ..K.u.+...w...+.\n00E0 EE A0 75 F7 03 AB 2B F7 07 A7 71 F7 03 AB 2B F7 ..u...+...q...+.\n00F0 52 69 63 68 02 AB 2B F7 00 00 00 00 00 00 00 00 Rich..+.........",
"tail_hex": "0000 79 61 76 65 6A 79 73 76 62 75 68 79 7A 69 37 74 yavejysvbuhyzi7t\n0010 6B 68 63 78 68 6F 61 72 6F 6E 38 62 6A 7A 66 33 khcxhoaron8bjzf3\n0020 61 6E 6F 6F 38 69 34 78 61 71 78 73 70 35 63 78 anoo8i4xaqxsp5cx\n0030 64 6B 72 71 71 64 37 61 30 62 64 6B 68 6A 66 77 dkrqqd7a0bdkhjfw\n0040 62 66 6B 68 63 75 77 6D 32 76 32 62 71 65 35 37 bfkhcuwm2v2bqe57\n0050 6D 34 36 72 78 6B 66 65 6B 71 32 74 7A 63 6F 32 m46rxkfekq2tzco2\n0060 69 30 78 30 64 33 63 65 61 7A 30 38 70 64 66 63 i0x0d3ceaz08pdfc\n0070 34 66 65 32 6D 33 6E 69 7A 68 7A 66 70 73 34 27 4fe2m3nizhzfps4'"
},
"string_patterns": {
"urls": [],
"ipv4": [],
"emails": [],
"win_paths": [],
"unix_paths": [
"/atexit",
"/0123456789",
"/dd/yy",
"//rZy",
"/YAZj",
"/1VVg",
"/IKQr"
],
"registry": []
},
"pe_deep": {
"machine": "i386",
"number_of_sections": 4,
"timestamp": 1572255893,
"timestamp_utc": "2019-10-28 09:44:53+00:00",
"subsystem": "SUBSYSTEM.WINDOWS_GUI",
"dll_characteristics": 0,
"dll_characteristics_list": [],
"imagebase": "0x400000",
"entry_point_rva": "0xeaf2",
"section_alignment": 4096,
"file_alignment": 4096,
"size_of_image": 442368,
"checksum": 487412,
"data_directories_used": [],
"sections": [
{
"name": ".text",
"virtual_size": 161054,
"size": 163840,
"offset": 4096,
"entropy": 6.609,
"characteristics": "0x60000020"
},
{
"name": ".rdata",
"virtual_size": 43265,
"size": 45056,
"offset": 167936,
"entropy": 5.047,
"characteristics": "0x40000040"
},
{
"name": ".data",
"virtual_size": 201908,
"size": 188416,
"offset": 212992,
"entropy": 7.949,
"characteristics": "0xc0000040"
},
{
"name": ".rsrc",
"virtual_size": 23952,
"size": 24576,
"offset": 401408,
"entropy": 4.197,
"characteristics": "0x40000040"
}
],
"imports": [
{
"dll": "KERNEL32.dll",
"apis": [
"ExitProcess",
"TerminateProcess",
"HeapReAlloc",
"HeapSize",
"HeapDestroy",
"HeapCreate",
"VirtualFree",
"IsBadWritePtr",
"GetStdHandle",
"UnhandledExceptionFilter",
"FreeEnvironmentStringsA",
"GetEnvironmentStrings",
"FreeEnvironmentStringsW",
"GetEnvironmentStringsW",
"SetHandleCount",
"GetFileType",
"QueryPerformanceCounter",
"GetCommandLineA",
"GetSystemTimeAsFileTime",
"SetUnhandledExceptionFilter",
"LCMapStringA",
"LCMapStringW",
"GetStringTypeA",
"GetStringTypeW",
"GetTimeZoneInformation",
"IsBadReadPtr",
"IsBadCodePtr",
"SetStdHandle",
"SetEnvironmentVariableA",
"InterlockedExchange",
"GetStartupInfoA",
"VirtualQuery",
"GetSystemInfo",
"VirtualAlloc",
"VirtualProtect",
"HeapFree",
"HeapAlloc",
"RtlUnwind",
"GetFileTime",
"GetFileAttributesA",
"FileTimeToLocalFileTime",
"SetErrorMode",
"FileTimeToSystemTime",
"GetOEMCP",
"GetCPInfo",
"TlsFree",
"LocalReAlloc",
"TlsSetValue",
"TlsAlloc",
"TlsGetValue"
],
"api_count": 124
},
{
"dll": "USER32.dll",
"apis": [
"PostThreadMessageA",
"MessageBeep",
"GetNextDlgGroupItem",
"InvalidateRgn",
"CopyAcceleratorTableA",
"SetRect",
"IsRectEmpty",
"CharNextA",
"GetSysColorBrush",
"ReleaseCapture",
"LoadCursorA",
"SetCapture",
"wsprintfA",
"DestroyMenu",
"ShowWindow",
"MoveWindow",
"SetWindowTextA",
"IsDialogMessageA",
"SetDlgItemTextA",
"RegisterWindowMessageA",
"WinHelpA",
"GetCapture",
"CreateWindowExA",
"GetClassInfoExA",
"GetClassNameA",
"SetPropA",
"GetPropA",
"RemovePropA",
"SendDlgItemMessageA",
"SetFocus",
"IsChild",
"GetWindowTextA",
"GetForegroundWindow",
"GetTopWindow",
"UnhookWindowsHookEx",
"GetMessageTime",
"GetMessagePos",
"MapWindowPoints",
"SetForegroundWindow",
"UpdateWindow",
"GetMenu",
"AdjustWindowRectEx",
"EqualRect",
"GetClassInfoA",
"RegisterClassA",
"UnregisterClassA",
"GetDlgCtrlID",
"DefWindowProcA",
"CallWindowProcA",
"SetWindowLongA"
],
"api_count": 124
},
{
"dll": "GDI32.dll",
"apis": [
"CreateRectRgnIndirect",
"GetMapMode",
"GetBkColor",
"GetTextColor",
"GetRgnBox",
"CreatePen",
"GetDeviceCaps",
"GetStockObject",
"DeleteDC",
"ExtSelectClipRgn",
"ScaleWindowExtEx",
"SetWindowExtEx",
"ScaleViewportExtEx",
"SetViewportExtEx",
"OffsetViewportOrgEx",
"SetViewportOrgEx",
"SelectObject",
"Escape",
"CreatePatternBrush",
"TextOutA",
"RectVisible",
"PtVisible",
"GetWindowExtEx",
"GetViewportExtEx",
"GetObjectA",
"DeleteObject",
"MoveToEx",
"LineTo",
"GetClipBox",
"SetMapMode",
"SetTextColor",
"SetBkColor",
"RestoreDC",
"SaveDC",
"CreateBitmap",
"BitBlt",
"Ellipse",
"CreateCompatibleDC",
"CreateCompatibleBitmap",
"ExtTextOutA"
],
"api_count": 40
},
{
"dll": "comdlg32.dll",
"apis": [
"GetFileTitleA"
],
"api_count": 1
},
{
"dll": "WINSPOOL.DRV",
"apis": [
"OpenPrinterA",
"DocumentPropertiesA",
"ClosePrinter"
],
"api_count": 3
},
{
"dll": "ADVAPI32.dll",
"apis": [
"RegCloseKey",
"RegQueryValueExA",
"RegOpenKeyExA",
"RegDeleteKeyA",
"RegEnumKeyA",
"RegOpenKeyA",
"RegQueryValueA",
"RegCreateKeyExA",
"RegSetValueExA",
"SetFileSecurityW"
],
"api_count": 10
},
{
"dll": "SHELL32.dll",
"apis": [
"CommandLineToArgvW"
],
"api_count": 1
},
{
"dll": "COMCTL32.dll",
"apis": [
"ord_17"
],
"api_count": 1
},
{
"dll": "SHLWAPI.dll",
"apis": [
"PathFindFileNameA",
"PathStripToRootA",
"PathFindExtensionA",
"PathIsUNCA"
],
"api_count": 4
},
{
"dll": "oledlg.dll",
"apis": [
"ord_8"
],
"api_count": 1
},
{
"dll": "ole32.dll",
"apis": [
"CreateILockBytesOnHGlobal",
"StgCreateDocfileOnILockBytes",
"StgOpenStorageOnILockBytes",
"CoGetClassObject",
"CoTaskMemAlloc",
"CoTaskMemFree",
"CLSIDFromString",
"CLSIDFromProgID",
"OleUninitialize",
"CoFreeUnusedLibraries",
"CoRegisterMessageFilter",
"OleFlushClipboard",
"OleIsCurrentClipboard",
"CoRevokeClassObject",
"OleInitialize"
],
"api_count": 15
},
{
"dll": "OLEAUT32.dll",
"apis": [
"ord_6",
"ord_4",
"ord_9",
"ord_12",
"ord_8",
"ord_7",
"ord_150",
"ord_420",
"ord_184",
"ord_16",
"ord_2",
"ord_10"
],
"api_count": 12
}
],
"exports": [
"LayvXBcOppdgzCgnncA"
],
"export_count": 1,
"exphash": "f66bacc99dfdc9927b5678a2134e87c4",
"relocation_count": 0,
"relocation_blocks": [],
"tls_callbacks": [],
"delay_imports": [
"OLEACC.dll"
],
"rich_header_entries": [],
"resource_types": [],
"version_info": {}
}
}
}
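The string_patterns section above shows both the value and the noise of regex extraction over raw bytes: entries such as /dd/yy (likely part of a date-format string) or /atexit are not real paths. The sketch below shows one way to implement this kind of extraction, including a pass over UTF-16 LE text. All patterns are simplified assumptions rather than the tool's own, extract_patterns() is hypothetical, and it requires at least two segments for Unix paths, a stricter rule than the sample output suggests the tool uses.

```python
# Hedged sketch of regex pattern extraction over raw bytes plus a UTF-16 LE
# pass. The regexes are simplified assumptions, not fileinfo.py's own.
import re

PATTERNS = {
    "urls": rb"https?://[\x21-\x7e]{4,}",
    "ipv4": rb"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b",
    "emails": rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    # Windows paths without spaces (paths like "Program Files" get truncated).
    "win_paths": rb"[A-Za-z]:\\[\w\\.$-]{3,}",
    # At least two segments, not preceded by a word char or another slash.
    "unix_paths": rb"(?<![\w/])(?:/[\w.-]+){2,}",
    "registry": rb"(?:HKEY_[A-Z_]+|HK(?:LM|CU|CR|U))\\[\w\\.-]{3,}",
}


def extract_patterns(data: bytes) -> dict:
    # Collapse UTF-16 LE text ("h\x00t\x00...") into plain ASCII so the same
    # regexes also catch wide strings; the raw bytes are scanned as well.
    narrowed = re.sub(rb"([\x20-\x7e])\x00", rb"\1", data)
    results = {}
    for name, pattern in PATTERNS.items():
        found = []
        for view in (data, narrowed):
            for m in re.finditer(pattern, view):
                s = m.group().decode("ascii", errors="replace")
                if s not in found:  # de-duplicate, keep first-seen order
                    found.append(s)
        results[name] = found
    return results
```

Requiring two segments would drop /atexit but still keep /dd/yy, which is why pattern hits are best treated as leads for manual review rather than as indicators on their own.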

Tuning format-specific analysis

If you only care about PE (e.g. Windows-only lab):

python3 fileinfo.py --no-elf --no-macho -r ./pe_samples/

Same idea for ELF-only or Mach-O-only environments.

Real malware: MalwareBazaar integration

The repo includes download_malware_sample.py to pull real Windows PE samples from MalwareBazaar (abuse.ch), run full static analysis, and save MalwareBazaar metadata for comparison.

  1. Get a free API key from the abuse.ch Authentication Portal.
  2. Install deps: pip install requests pyzipper
  3. Set your key: export ABUSE_CH_AUTH_KEY='your-key'

Then:

# Download one recent sample and run --full analysis
python3 download_malware_sample.py
# By known SHA256 (e.g. from a report)
python3 download_malware_sample.py 9FDEA40A9872A77335AE3B733A50F4D1E9F8EFF193AE84E36FB7E5802C481F72
# By tag (e.g. Emotet, TrickBot)
python3 download_malware_sample.py --tag Emotet --limit 1

Per sample, you get a directory under malware_samples/<sha256>/ with the binary, our_analysis.json (from fileinfo.py --full --json), and bazaar_info.json (MalwareBazaar metadata). You can diff hashes, imphash, file type, and PE/string findings against the feed and public reports.
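The diff step can be scripted. The sketch below compares hashes and imphash between the two JSON files. The field paths on our side come from the sample report shown earlier; the MalwareBazaar key names (md5_hash, sha1_hash, sha256_hash, imphash) are assumed from the get_info API response, and because the exact layout of bazaar_info.json is not documented here, the helper accepts either a single record or a full response with a data list. compare() and compare_files() are hypothetical helpers.

```python
# Hedged sketch: compare our_analysis.json against bazaar_info.json.
# ASSUMPTION: MalwareBazaar records use md5_hash/sha1_hash/sha256_hash/imphash;
# bazaar_info.json may hold one record or a full response with "data": [...].
import json

FIELD_MAP = [
    # (label, path in our report, key in the MalwareBazaar record)
    ("md5", ("hashes", "md5"), "md5_hash"),
    ("sha1", ("hashes", "sha1"), "sha1_hash"),
    ("sha256", ("hashes", "sha256"), "sha256_hash"),
    ("imphash", ("pe", "imphash"), "imphash"),
]


def dig(d, path):
    """Follow a key path through nested dicts; None if any step is missing."""
    for key in path:
        if not isinstance(d, dict) or key not in d:
            return None
        d = d[key]
    return d


def compare(ours: dict, bazaar: dict) -> dict:
    if isinstance(bazaar.get("data"), list) and bazaar["data"]:
        bazaar = bazaar["data"][0]
    result = {}
    for label, path, mb_key in FIELD_MAP:
        a, b = dig(ours, path), bazaar.get(mb_key)
        if a is None or b is None:
            result[label] = "missing"
        elif str(a).lower() == str(b).lower():  # hashes are case-insensitive hex
            result[label] = "match"
        else:
            result[label] = f"MISMATCH ({a} vs {b})"
    return result


def compare_files(ours_path, bazaar_path):
    with open(ours_path) as f1, open(bazaar_path) as f2:
        return compare(json.load(f1), json.load(f2))
```

Usage: compare_files("malware_samples/<sha256>/our_analysis.json", "malware_samples/<sha256>/bazaar_info.json"). Hash fields should always match for the same sample; an imphash mismatch is the more interesting case and may point to a difference in how ordinal imports are resolved.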

Who is this for?

  • Malware analysts: Quick triage (hashes, type, entropy, packing, imphash) and deep static reports for comparison with threat intel.
  • Digital forensics: Consistent metadata (including timestamps and signatures) across many files; CSV/JSON for timelines and tooling.
  • SOC engineers: Scriptable file intelligence (JSON/CSV), optional YARA, and hashes that plug into VirusTotal/MalwareBazaar/EDR.

Summary

Basic File Information Gathering Script gives you:

  • One CLI (fileinfo.py) for batch file metadata and optional deep static analysis.
  • Single-pass multi-hash (MD5 through SHA-512) plus optional ssdeep/tlsh.
  • PE/ELF/Mach-O–aware fields: timestamps, imphash, entry point, packing, signatures, and with --full: sections, imports/exports, version info, entropy map, and string patterns (URLs, IPs, paths, registry).
  • Output as table, JSON, or CSV for automation and integration.
  • Optional YARA and a MalwareBazaar downloader script for real-sample workflow.

All of this is static only — no execution, no decompilation — so you can run it safely in automation and air-gapped labs. If you’re building or tightening a file-intel or malware-triage pipeline, this tool is worth a slot in your toolkit.


Source: https://infosecwriteups.com/one-tool-to-rule-them-all-file-metadata-static-analysis-for-malware-analysts-and-soc-teams-c6dba1f5b7de?source=rss----7b722bfd1b8d---4