workspace hydrate accepts symlink targets outside the archive root #3093

@Aphroq

Describe the bug

validate_tarfile() validates tar member paths, but it does not validate a symlink member's linkname. As a result, any workspace hydrate path that allows symlinks accepts entries whose targets point outside the extracted archive root.

That shows up most clearly in workspace restore flows:

  • UnixLocalSandboxSession.hydrate_workspace() calls safe_extract_tarfile()
  • DockerSandboxSession.hydrate_workspace() validates the tar and then streams it into tar -x
  • several extension backends validate tar bytes the same way before hydrate

The regular session.extract() tar path already uses allow_symlinks=False, so this is not "all tar uploads are unsafe." The problem is narrower: restore / hydrate flows that intentionally allow symlink members also accept absolute targets like /etc/passwd and traversing targets like ../../outside.

I can reproduce this on current main by creating a tar with leak -> /etc/passwd, passing it through validate_tarfile(), and then extracting it. Validation succeeds and the restored workspace ends up with a symlink that points outside the extracted root.

Debug information

  • Agents SDK version: main at f2fb9ffb (latest release boundary: v0.15.1)
  • Python version: Python 3.12

Repro steps

import io
import tarfile
import tempfile
from pathlib import Path

from agents.sandbox.util.tar_utils import safe_extract_tarfile, validate_tarfile

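# Build an in-memory tar containing one symlink member: leak -> /etc/passwd
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("leak")
    info.type = tarfile.SYMTYPE
    info.linkname = "/etc/passwd"
    tf.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:*") as tf:
    # Validation passes even though the link target is an absolute path
    validate_tarfile(tf)
    with tempfile.TemporaryDirectory() as td:
        safe_extract_tarfile(tf, root=Path(td))
        # The restored symlink points outside the extraction root
        print((Path(td) / "leak").readlink())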

Current result:

/etc/passwd

Expected behavior

Workspace hydrate / restore validation should reject symlink targets that escape the extracted archive root.

Keeping the general symlink support is still useful for normal workspace snapshots, but restore paths that materialize an archive into a sandbox workspace should fail loudly when a symlink target is absolute or traverses outside the archive root.
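
A minimal sketch of the check described above, assuming member names have already been validated as relative paths inside the archive. symlink_target_escapes and find_escaping_symlinks are hypothetical helpers, not existing SDK functions, and the check is purely lexical: it does not follow chains of symlinks inside the archive.

```python
import io
import posixpath
import tarfile


def symlink_target_escapes(member_name: str, linkname: str) -> bool:
    """Return True if a symlink at member_name would point outside the root."""
    # Absolute targets always leave the archive root.
    if posixpath.isabs(linkname):
        return True
    # Relative targets resolve against the directory containing the symlink.
    parent = posixpath.dirname(member_name)
    resolved = posixpath.normpath(posixpath.join(parent, linkname))
    return resolved == ".." or resolved.startswith("../")


def find_escaping_symlinks(tf: tarfile.TarFile) -> list[str]:
    """Return the names of symlink members whose targets escape the root."""
    return [
        member.name
        for member in tf.getmembers()
        if member.issym() and symlink_target_escapes(member.name, member.linkname)
    ]


if __name__ == "__main__":
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, target in [("leak", "/etc/passwd"), ("ok", "target.txt")]:
            info = tarfile.TarInfo(name)
            info.type = tarfile.SYMTYPE
            info.linkname = target
            tf.addfile(info)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        print(find_escaping_symlinks(tf))
```

A validator could raise on the first offending member instead of collecting names. Returning the list makes it easier to report every bad entry in one error message.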
