DirectoryIsolate.load() validates names taken from a tar archive before
writing their contents into a staging directory. It rejects ordinary absolute
paths such as /outside.txt, Windows absolute paths such as
C:/outside.txt, backslashes and .. traversal.
On Windows it does not reject a different Windows path form:
This is not a normal filename on Windows. It is a drive-relative path: it
refers to payload.txt relative to the current directory remembered for drive
C:. In particular, it is not the same as either:
payload.txt # an ordinary relative member name
C:/payload.txt # an absolute path on drive C:
The current validation uses PurePosixPath and only rejects a first path
component exactly equal to C:. Therefore it rejects C:/payload.txt, whose
first component is C:, but it accepts C:payload.txt, whose first component
is the longer string C:payload.txt.
Why This Matters
During load, the implementation constructs a destination path approximately as
follows:
target_path = staging_directory / normalized_archive_member_name
On Windows, joining a staging directory with a drive-relative member has
special behavior:
from pathlib import PureWindowsPath
stage = PureWindowsPath(r'C:\Temp\throng-load-123')
assert stage / 'C:payload.txt' == PureWindowsPath(r'C:\Temp\throng-load-123\payload.txt')
assert stage / 'D:payload.txt' == PureWindowsPath(r'D:payload.txt')
There are two observable problems:
- If the member names the same drive as the staging directory, the name is
not preserved: an archive member named C:payload.txt is restored as
payload.txt.
- If the member names a different available drive, the joined destination is
no longer rooted in the staging directory. The write can occur relative to
that drive's current directory instead of inside the temporary staging tree.
The second case is the serious one: loading an archive must never write outside
the staging directory before commit.
OS Boundary
This problem is Windows-specific.
On POSIX systems, C:payload.txt is an ordinary legal filename containing a
colon. It has no drive semantics and remains below the staging directory:
from pathlib import PurePosixPath
stage = PurePosixPath('/tmp/throng-load-123')
assert stage / 'C:payload.txt' == PurePosixPath('/tmp/throng-load-123/C:payload.txt')
Consequently, a fix should not globally reject colon-containing archive member
names on Linux or macOS. That would change valid POSIX behavior and prevent a
POSIX archive containing a legal file named C:payload.txt from round-tripping
on POSIX.
When such a POSIX-created archive is loaded on Windows, rejecting the member is
appropriate: its meaning on the destination operating system is unsafe or
ambiguous. Archive load validation must follow the rules of the system where
files will actually be written.
Situations Where This Can Occur
This can occur when all of the following are true:
load() is executed on Windows.
- The loaded bytes are not guaranteed to have been produced by a trusted
current version of DirectoryIsolate.dump(), for example:
- an archive is received from another process, machine or user;
- an archive was generated with another tar implementation;
- an archive was created on POSIX and contains a colon in a filename;
- persisted archive data has been modified or corrupted.
- The archive includes a regular-file member whose name has Windows drive
syntax without a root separator, such as C:payload.txt or
D:nested.txt.
Writing outside staging additionally requires that the member identifies a
different usable drive from the staging directory. For example, if staging is
on C: and drive D: is available, D:payload.txt has the escaping behavior.
Situations Where This Cannot Occur
This issue does not arise:
- during
load() on Linux or macOS, because those systems do not interpret
drive-letter syntax;
- for archives containing only ordinary portable relative members such as
dir/file.txt;
- for Windows absolute members such as
C:/file.txt or C:\file.txt, because
the current validation already rejects them;
- when bytes are produced by the current
dump() from a Windows directory,
because Windows cannot create an ordinary file whose name contains :.
The final point does not make load() safe for external archives: load() is
already designed to validate hostile or malformed archive member names.
Minimal Reproduction Of The Validation Gap
The following is a non-destructive Windows reproduction. It uses the same
drive as the system temporary directory, so it demonstrates that the invalid
member is accepted and changed into another filename without intentionally
writing outside a temporary test tree.
from io import BytesIO
from pathlib import Path
from tarfile import TarInfo, open as open_tar
from tempfile import TemporaryDirectory, gettempdir
from throng.plugins.temporary_directory_throng import (
TemporaryDirectoryIsolationConfig,
TemporaryDirectoryThrong,
)
drive = Path(gettempdir()).drive
assert drive, 'Run this reproduction on Windows.'
member_name = f'{drive}payload.txt'
archive_buffer = BytesIO()
with open_tar(fileobj=archive_buffer, mode='w:') as archive:
payload = b'outside-safe-reproduction'
member = TarInfo(member_name)
member.size = len(payload)
archive.addfile(member, BytesIO(payload))
with TemporaryDirectory() as base_directory:
isolate = TemporaryDirectoryThrong(
config=TemporaryDirectoryIsolationConfig(
base_directory=base_directory,
compression='none',
),
).get_isolate()
isolate.load(archive_buffer.getvalue())
restored_names = [path.name for path in isolate.directory.iterdir()]
print(member_name) # for example: C:payload.txt
print(restored_names) # ['payload.txt']
assert restored_names == ['payload.txt']
assert (isolate.directory / 'payload.txt').read_bytes() == payload
Expected secure behavior is for load() to reject the archive before writing
any member. Current behavior accepts it and restores a differently named
file. The example deliberately checks iterdir() names: constructing
isolate.directory / member_name again would trigger Windows drive-relative
path interpretation again and would not inspect the literal archive name.
The outside-staging consequence can be demonstrated only in a controlled
Windows environment with two disposable accessible drives. An archive member
using the drive different from the staging directory must be used there; it
should not be used as a general CI reproduction because it deliberately tests
a write destination outside staging.
Minimal Regression Test
This test states the desired Windows-only contract without trying to create an
outside-staging write. It should fail against the current implementation and
pass after the validation is fixed.
from os import name as os_name
import pytest
from full_match import match
from tests.helpers import make_tar_bytes
from throng import ArchiveUnpackError
@pytest.mark.skipif(os_name != 'nt', reason='Windows drive-relative archive paths have no special meaning on POSIX')
def test_load_rejects_windows_drive_relative_member_path(temporary_isolate):
"""Verify that load rejects a Windows drive-relative path before it can leave staging."""
isolate = temporary_isolate(compression='none')
with pytest.raises(
ArchiveUnpackError,
match=match('Archive contains an unsafe Windows drive-relative path: C:payload.txt'),
):
isolate.load(make_tar_bytes({'C:payload.txt': b'unsafe'}))
The test is intentionally Windows-only. A POSIX test requiring rejection of
the same string would impose a Windows filename rule on systems where the
filename is harmless and valid.
Proposed Fix
Preferred Plan: Windows-Only Validation
- Keep the existing POSIX-oriented checks unchanged for every operating
system.
- When running on Windows, additionally parse the raw archive member name
using Windows path rules.
- Reject any member with a drive component, including both:
- absolute drive paths:
C:/payload.txt;
- drive-relative paths:
C:payload.txt.
- Add the Windows-only regression test above.
- Keep existing tests for absolute, traversal and backslash paths.
For example, the additional check could be based on PureWindowsPath:
from os import name as os_name
from pathlib import PureWindowsPath
if os_name == 'nt' and PureWindowsPath(raw_name).drive:
raise ArchiveUnpackError(f'Archive contains an unsafe Windows drive path: {raw_name}')
The exact error text can preserve the current distinction between absolute and
drive-relative paths if that is preferable for diagnostics. The significant
requirement is that the extra rejection is active only when extraction targets
Windows paths.
Alternative Plan: Reject Non-Portable Members Everywhere
The implementation could reject Windows drive syntax on every operating
system, making loaded archives portable across all supported platforms.
This is simpler to reason about, but it changes POSIX behavior: a legal POSIX
file named C:payload.txt would no longer be loadable. Because the bug arises
from Windows path interpretation, this alternative is not preferred unless the
project explicitly chooses a portable-subset archive format.
Acceptance Criteria
- On Windows,
load() rejects C:payload.txt and D:nested.txt before any
content is written.
- Existing rejection of
/outside.txt, C:/outside.txt, backslash paths and
traversal paths remains unchanged.
- On POSIX, loading a regular member literally named
C:payload.txt continues
to work unless the project deliberately adopts the portable-subset
alternative.
- The Windows-only regression test is included and coverage remains complete.
DirectoryIsolate.load()validates names taken from a tar archive beforewriting their contents into a staging directory. It rejects ordinary absolute
paths such as
/outside.txt, Windows absolute paths such asC:/outside.txt, backslashes and..traversal.On Windows it does not reject a different Windows path form:
This is not a normal filename on Windows. It is a drive-relative path: it
refers to
payload.txtrelative to the current directory remembered for driveC:. In particular, it is not the same as either:The current validation uses
PurePosixPathand only rejects a first pathcomponent exactly equal to
C:. Therefore it rejectsC:/payload.txt, whosefirst component is
C:, but it acceptsC:payload.txt, whose first componentis the longer string
C:payload.txt.Why This Matters
During load, the implementation constructs a destination path approximately as
follows:
On Windows, joining a staging directory with a drive-relative member has
special behavior:
There are two observable problems:
not preserved: an archive member named
C:payload.txtis restored aspayload.txt.no longer rooted in the staging directory. The write can occur relative to
that drive's current directory instead of inside the temporary staging tree.
The second case is the serious one: loading an archive must never write outside
the staging directory before commit.
OS Boundary
This problem is Windows-specific.
On POSIX systems,
C:payload.txtis an ordinary legal filename containing acolon. It has no drive semantics and remains below the staging directory:
Consequently, a fix should not globally reject colon-containing archive member
names on Linux or macOS. That would change valid POSIX behavior and prevent a
POSIX archive containing a legal file named
C:payload.txtfrom round-trippingon POSIX.
When such a POSIX-created archive is loaded on Windows, rejecting the member is
appropriate: its meaning on the destination operating system is unsafe or
ambiguous. Archive load validation must follow the rules of the system where
files will actually be written.
Situations Where This Can Occur
This can occur when all of the following are true:
load()is executed on Windows.current version of
DirectoryIsolate.dump(), for example:syntax without a root separator, such as
C:payload.txtorD:nested.txt.Writing outside staging additionally requires that the member identifies a
different usable drive from the staging directory. For example, if staging is
on
C:and driveD:is available,D:payload.txthas the escaping behavior.Situations Where This Cannot Occur
This issue does not arise:
load()on Linux or macOS, because those systems do not interpretdrive-letter syntax;
dir/file.txt;C:/file.txtorC:\file.txt, becausethe current validation already rejects them;
dump()from a Windows directory,because Windows cannot create an ordinary file whose name contains
:.The final point does not make
load()safe for external archives:load()isalready designed to validate hostile or malformed archive member names.
Minimal Reproduction Of The Validation Gap
The following is a non-destructive Windows reproduction. It uses the same
drive as the system temporary directory, so it demonstrates that the invalid
member is accepted and changed into another filename without intentionally
writing outside a temporary test tree.
Expected secure behavior is for
load()to reject the archive before writingany member. Current behavior accepts it and restores a differently named
file. The example deliberately checks
iterdir()names: constructingisolate.directory / member_nameagain would trigger Windows drive-relativepath interpretation again and would not inspect the literal archive name.
The outside-staging consequence can be demonstrated only in a controlled
Windows environment with two disposable accessible drives. An archive member
using the drive different from the staging directory must be used there; it
should not be used as a general CI reproduction because it deliberately tests
a write destination outside staging.
Minimal Regression Test
This test states the desired Windows-only contract without trying to create an
outside-staging write. It should fail against the current implementation and
pass after the validation is fixed.
The test is intentionally Windows-only. A POSIX test requiring rejection of
the same string would impose a Windows filename rule on systems where the
filename is harmless and valid.
Proposed Fix
Preferred Plan: Windows-Only Validation
system.
using Windows path rules.
C:/payload.txt;C:payload.txt.For example, the additional check could be based on
PureWindowsPath:The exact error text can preserve the current distinction between absolute and
drive-relative paths if that is preferable for diagnostics. The significant
requirement is that the extra rejection is active only when extraction targets
Windows paths.
Alternative Plan: Reject Non-Portable Members Everywhere
The implementation could reject Windows drive syntax on every operating
system, making loaded archives portable across all supported platforms.
This is simpler to reason about, but it changes POSIX behavior: a legal POSIX
file named
C:payload.txtwould no longer be loadable. Because the bug arisesfrom Windows path interpretation, this alternative is not preferred unless the
project explicitly chooses a portable-subset archive format.
Acceptance Criteria
load()rejectsC:payload.txtandD:nested.txtbefore anycontent is written.
/outside.txt,C:/outside.txt, backslash paths andtraversal paths remains unchanged.
C:payload.txtcontinuesto work unless the project deliberately adopts the portable-subset
alternative.