Problem
The openhands-agent-server Dockerfile builder stage has used uv venv --python-preference only-system since v1.15.0 (#2567, commits 29b76b66 and 3192b084). Under this mode, /agent-server/.venv/bin/python is a symlink to the builder image's /usr/local/bin/python3, so the .venv is not self-contained: it is a thin wrapper around the base image's system CPython.
Two concrete problems follow from that:
1. Non-portable .venv. When a downstream consumer COPYs /agent-server from the SDK builder onto a different base image, the symlink points at a path that may not exist in the target. Verified breakage: commit0's raw Ubuntu bases (e.g. docker.io/wentingzhao/tinydb:v0) have no /usr/local/bin/python3, so runtime pods fail with:
exec: "/agent-server/.venv/bin/python": stat /usr/local/bin/python3: no such file or directory
Downstream has worked around this in OpenHands/benchmarks#614 by introducing benchmarks/utils/Dockerfile.agent-layer-commit0 (and hardening benchmarks/utils/Dockerfile.agent-layer for SWE-bench), both of which manually COPY Debian's Python 3.13 runtime from the builder stage into the final image. This works but is dead code in waiting: it duplicates SDK internals into a downstream repo and depends on the undocumented layout of the SDK Dockerfile.
2. Degraded local/dev experience. The .venv inside any SDK-built image no longer contains its own Python. Developers inspecting a running container see a venv that depends on base-image state rather than being a self-contained environment. The "python contained in the .venv" property that existed pre-v1.15.0 is gone.
Why only-system was chosen in the first place
python-build-standalone (what uv python install downloads) ships libpython3.13.so.1.0 with the PF_X (executable) flag set on its PT_GNU_STACK program header. Under Debian's glibc 2.41-12+deb13u2 (Trixie) NX enforcement and under Docker-in-Docker with seccomp restrictions (GitHub Actions, sysbox-runc), the dynamic linker rejects such shared objects with:
cannot enable executable stack as shared object requires: Invalid argument
All binary-target GAIA evals were failing 100% with PYI-37 errors before v1.15.0. The --python-preference only-system switch dodged this by using Debian's CPython (which doesn't have the PF_X flag set), at the cost of the self-contained .venv property.
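Proposed fix: managed-Python + PT_GNU_STACK PF_X sanitization
Address the execstack problem at its actual layer (ELF program headers) instead of dodging python-build-standalone, and restore managed-Python so .venv is self-contained by construction.
Builder stage change: switch to --managed-python with UV_PYTHON_INSTALL_DIR=/agent-server/uv-managed-python, and apply the helper to $UV_PYTHON_INSTALL_DIR after uv python install 3.13.
Helper: clear_execstack.py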
Walks a directory tree, finds every .so* file, parses ELF program headers, and clears the PF_X bit on any PT_GNU_STACK entry that has it. Idempotent. No-op on ELFs that don't have PT_GNU_STACK or don't have PF_X set. Supports 32/64-bit, amd64/arm64, stripped and unstripped binaries.
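The behaviour described above can be sketched in Python roughly as follows. This is a minimal illustration and not the actual clear_execstack.py: the function names clear_execstack and clear_execstack_tree are hypothetical, and the sketch assumes every matching file is writable by the build user.

```python
import os
import struct

PT_GNU_STACK = 0x6474E551  # program header type that carries stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def clear_execstack(path):
    """Clear PF_X on PT_GNU_STACK in one file. Return True if it was patched."""
    with open(path, "r+b") as f:  # raises if the file is not writable
        head = f.read(64)
        if len(head) < 52 or head[:4] != b"\x7fELF":
            return False  # not an ELF file (or truncated): leave untouched
        is64 = head[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        end = "<" if head[5] == 1 else ">"   # EI_DATA: 1 = little-endian
        if is64:
            if len(head) < 64:
                return False
            (phoff,) = struct.unpack_from(end + "Q", head, 0x20)
            phentsize, phnum = struct.unpack_from(end + "HH", head, 0x36)
            flags_off = 4    # p_flags follows p_type in a 64-bit entry
        else:
            (phoff,) = struct.unpack_from(end + "I", head, 0x1C)
            phentsize, phnum = struct.unpack_from(end + "HH", head, 0x2A)
            flags_off = 24   # p_flags is the seventh word in a 32-bit entry
        patched = False
        for i in range(phnum):
            entry = phoff + i * phentsize
            f.seek(entry)
            raw = f.read(phentsize)
            if len(raw) < flags_off + 4:
                break  # truncated program header table
            (p_type,) = struct.unpack_from(end + "I", raw, 0)
            (p_flags,) = struct.unpack_from(end + "I", raw, flags_off)
            if p_type == PT_GNU_STACK and p_flags & PF_X:
                f.seek(entry + flags_off)
                f.write(struct.pack(end + "I", p_flags & ~PF_X))
                patched = True
        return patched


def clear_execstack_tree(root):
    """Walk root and patch every regular .so* file. Return the patched paths."""
    patched = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not (name.endswith(".so") or ".so." in name):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # the symlink target is visited under its own name
            if clear_execstack(path):
                patched.append(path)
    return patched
```

Because only the flag word is rewritten and only when PF_X is set, a second pass over the same tree finds nothing to patch, which is the idempotence property the helper is meant to have.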
PyInstaller spec change (resurrects #2574):
Apply the same helper as a post-Analysis hook so bundled .so files in the binary/binary-minimal target also get sanitized. PR #2574 proved this approach end-to-end: a GAIA eval_limit=1 run on sysbox-runc with image 28a56ab-gaia-binary produced 0 PYI-37 errors and resolved 1/1 instances, versus 4 PYI-37 errors in the control (image 4907d99-gaia-binary) in the same window.
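One way such a post-Analysis step could look is sketched below. It is an assumption about the shape of the hook, not the code from PR #2574: sanitize_binaries is a hypothetical name, the sanitize argument stands in for the helper, and the sketch relies on PyInstaller's binaries list holding (dest_name, src_path, typecode) tuples. Patching copies rather than the originals keeps the source environment untouched.

```python
import os
import shutil


def sanitize_binaries(binaries, workdir, sanitize):
    """Return a new list of binaries whose .so* sources are sanitized copies.

    binaries: iterable of (dest_name, src_path, typecode) tuples
    workdir:  scratch directory that receives the patched copies
    sanitize: callable that patches one file in place (the ELF helper)
    """
    result = []
    for index, (dest_name, src_path, typecode) in enumerate(binaries):
        base = os.path.basename(src_path)
        if not (base.endswith(".so") or ".so." in base):
            # Not a shared object: pass the entry through unchanged.
            result.append((dest_name, src_path, typecode))
            continue
        # One subdirectory per entry avoids collisions between equal basenames.
        copy_dir = os.path.join(workdir, str(index))
        os.makedirs(copy_dir, exist_ok=True)
        copy_path = os.path.join(copy_dir, base)
        shutil.copy2(src_path, copy_path)
        sanitize(copy_path)
        result.append((dest_name, copy_path, typecode))
    return result
```

In a spec file this would presumably run right after the Analysis(...) call, reassigning the result to a.binaries before the later build steps consume it. Whether the real spec wires it exactly this way is not stated in this issue.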
Net result:
.venv/bin/python → ../uv-managed-python/cpython-3.13-.../bin/python3.13 — fully inside /agent-server, no base-image dependency
Bundled Python and PyInstaller binaries are both PT_GNU_STACK RW — DinD/sysbox/seccomp safe
Same helper, two call sites — no duplicated logic
/agent-server becomes genuinely portable as a side effect (python-build-standalone is designed for relocation; Debian CPython is not)
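The "fully inside /agent-server" property in the list above lends itself to a simple automated check. The sketch below is illustrative only: interpreter_is_contained is a hypothetical name, and the check assumes a POSIX filesystem where the venv interpreter is a symlink.

```python
import os


def interpreter_is_contained(venv_python, root):
    """Return True if venv_python resolves to an existing file inside root."""
    real_root = os.path.realpath(root)
    target = os.path.realpath(venv_python)  # follows the symlink chain
    if not os.path.isfile(target):
        # Dangling link, e.g. a target image without /usr/local/bin/python3.
        return False
    return os.path.commonpath([real_root, target]) == real_root
```

Run against /agent-server/.venv/bin/python with root /agent-server, a check like this would be expected to fail for an only-system build and pass for a managed-Python build.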
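Why this is better than the alternatives considered
… (--managed-python). Closed. Reintroduces PYI-37 execstack crashes in DinD/sysbox, the exact problem only-system was added to fix. Doesn't address the execstack issue at all.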
fix(docker): bundle Python runtime for portable /agent-server #2676 (bundle Debian Python into /agent-server/.python/). Tactical patch. Works on glibc-based downstream bases but is not a universal artifact boundary — musl/Alpine, mismatched glibc ABI, and arch mismatch all still break. Per Python's own docs, venvs are not generally portable when the interpreter is relocated. Adds ~60 lines of Dockerfile shell to rewrite venv symlinks, patch pyvenv.cfg, and bundle libraries. Does not satisfy the "python contained in the .venv" property — the venv still shells around a CPython that was never designed to be relocated. To be closed in favor of this plan.
Wheel migration (pip install openhands-agent-server). Long-term direction but not shippable today. Blocked on two packaging gaps: (a) openhands-agent-server's pyproject.toml does not declare openhands-tools as a runtime dep despite importing openhands.tools.*, so pip install openhands-agent-server==1.16.1 && python -m openhands.agent_server crashes with ModuleNotFoundError: No module named 'openhands.tools'; and (b) there is no supported multi-package same-SHA install pattern for consumers who pin unreleased SDK commits via a vendor submodule (a git+…#subdirectory=openhands-agent-server install resolves siblings from PyPI, not from the same commit). Captured as separate tracking work.
The managed-Python + ELF-sanitize approach is a strictly smaller and more principled diff than any of the above, and is the only option that simultaneously satisfies: self-contained .venv, DinD/sysbox-safe, PyInstaller-safe, no downstream Dockerfile coupling, no packaging prerequisites.
Validation plan
The execstack problem is silent at build time and only manifests at runtime, so validation has to include a real downstream eval run, not just CI import checks.
1. Unit tests for clear_execstack.py
Fixture-driven: ELFs with PT_GNU_STACK RWX, PT_GNU_STACK RW, no PT_GNU_STACK, 32/64-bit, amd64/arm64, stripped and unstripped. Idempotence (running the script twice produces no change on the second pass). Invariant: after the helper walks a directory, readelf -l on every .so* shows GNU_STACK without the E flag.
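The fixtures do not need a compiler: a header plus one program header entry is enough to exercise the parser. The sketch below shows one possible way to build such fixtures and to read back the flag that the invariant is about. make_elf and gnu_stack_flags are hypothetical names, and the images are little-endian only and far too small to be loadable.

```python
import struct

PT_NULL = 0
PT_GNU_STACK = 0x6474E551


def make_elf(bits, stack_flags):
    """Build a tiny little-endian ELF image with a single program header.

    stack_flags=None produces a PT_NULL entry, i.e. no PT_GNU_STACK at all.
    """
    p_type = PT_NULL if stack_flags is None else PT_GNU_STACK
    flags = stack_flags or 0
    ident = b"\x7fELF" + bytes([2 if bits == 64 else 1, 1, 1, 0]) + bytes(8)
    if bits == 64:
        # e_phoff=64, e_ehsize=64, e_phentsize=56, e_phnum=1
        header = struct.pack("<HHIQQQIHHHHHH", 3, 62, 1, 0, 64, 0, 0, 64, 56, 1, 0, 0, 0)
        phdr = struct.pack("<IIQQQQQQ", p_type, flags, 0, 0, 0, 0, 0, 16)
    else:
        # e_phoff=52, e_ehsize=52, e_phentsize=32, e_phnum=1
        header = struct.pack("<HHIIIIIHHHHHH", 3, 3, 1, 0, 52, 0, 0, 52, 32, 1, 0, 0, 0)
        phdr = struct.pack("<IIIIIIII", p_type, 0, 0, 0, 0, 0, flags, 16)
    return ident + header + phdr


def gnu_stack_flags(image):
    """Return p_flags of the PT_GNU_STACK entry, or None if there is none."""
    is64 = image[4] == 2
    end = "<" if image[5] == 1 else ">"
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", image, 0x20)
        phentsize, phnum = struct.unpack_from(end + "HH", image, 0x36)
        flags_off = 4
    else:
        (phoff,) = struct.unpack_from(end + "I", image, 0x1C)
        phentsize, phnum = struct.unpack_from(end + "HH", image, 0x2A)
        flags_off = 24
    for i in range(phnum):
        entry = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", image, entry)
        if p_type == PT_GNU_STACK:
            return struct.unpack_from(end + "I", image, entry + flags_off)[0]
    return None
```

A test would write make_elf(64, 7) to a file named like a shared object, run the helper over its directory, and assert that gnu_stack_flags on the result has the lowest bit clear. Real python-build-standalone libraries still deserve a readelf -l spot check, since synthetic fixtures cover only the header layout.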
2. Small benchmark eval runs on the feature branches
Open the SDK PR as draft. SDK CI will build and push multi-arch agent-server images tagged with the PR SHA (ghcr.io/openhands/agent-server:<sha>-python). On a matching benchmarks feature branch, bump vendor/software-agent-sdk to the SDK PR head SHA and run all three downstream targets at eval_limit=5:
GAIA — binary target
commit0 — source-minimal target, raw Ubuntu base
SWE-bench — source-minimal target
All three must complete without PYI-37 errors or "cannot enable executable stack" failures, and instance resolution on the 5-instance sample should match historical baselines. Only then mark the SDK PR ready for review.
3. Rollout
Merge SDK PR, cut release.
Release notes: "agent-server builder switched from --python-preference only-system to --managed-python with PF_X sanitization on bundled Python and PyInstaller binaries. Restores self-contained .venv contract."
Bump vendor/software-agent-sdk in benchmarks to the released SDK tag and rebuild per-benchmark images.
4. Downstream cleanup (separate benchmarks PR, after rollout is stable)
Strip the now-dead Python-runtime COPY workarounds from benchmarks/utils/Dockerfile.agent-layer and benchmarks/utils/Dockerfile.agent-layer-commit0:
Drop COPY --from=builder /usr/local/bin/python3.13, /usr/local/lib/python3.13, libpython3.13.so*, /usr/local/bin/python3
Drop ENV LD_LIBRARY_PATH=/usr/local/lib
Drop ENV UV_PYTHON_INSTALL_DIR=/agent-server/uv-managed-python — the SDK builder stage now sets this itself
Keep everything else: user creation, system packages (for commit0's raw upstream bases), uv binary COPY, cache dir ENV vars. Those are commit0's legitimate concerns, not workarounds for an SDK contract.
Scope
In scope for this issue:
Add clear_execstack.py helper with unit tests
Switch builder stage to --managed-python with UV_PYTHON_INSTALL_DIR=/agent-server/uv-managed-python
Apply helper to $UV_PYTHON_INSTALL_DIR in the builder stage, after uv python install 3.13
Apply helper to PyInstaller bundled .sos via the spec file's Analysis hook
Matching downstream benchmarks cleanup PR after rollout
Out of scope (tracked separately):
Wheel-install migration for downstream consumers. Depends on openhands-agent-server declaring openhands-tools as a runtime dep and a supported multi-package same-SHA install pattern.
openhands-agent-server missing openhands-tools runtime dep. Independent bug, filed separately.
Known downsides
Image size. The binary and binary-minimal targets grow ~80MB (python-build-standalone is larger than a Debian CPython install). Acceptable: smaller than the Chromium/VSCode layers that already dominate the binary target.
The helper is a workaround for an upstream bug. Ideally python-build-standalone would build without PF_X set. The helper becomes a no-op when that lands upstream and can be removed.
References
… 06b91863)
--python-preference only-system: 29b76b66 (Mar 25, initial), 3192b084 (Mar 26, reapplied)