fix(docker): bundle Python runtime for portable /agent-server #2676
simonrosenberg wants to merge 4 commits into main
After building the venv with system Python, copy the interpreter binary, standard library, and libpython shared objects into /agent-server/.python/. Re-point the venv symlinks and pyvenv.cfg at the bundled copy so that the entire /agent-server directory is self-contained.

This means eval images (and any other consumer) can COPY /agent-server onto any base image, even one without Python, and the entrypoint will resolve.

Changes:
- Builder stage: new RUN step bundles Python runtime into .python/
- source / source-minimal targets: set LD_LIBRARY_PATH for libpython
- Add Dockerfile.portability-test for CI validation of the contract
- Add unit tests verifying Dockerfile portability structure

Fixes #2585

Co-authored-by: openhands <openhands@all-hands.dev>
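The pyvenv.cfg re-pointing step in the commit message above can be illustrated with a minimal sketch. It is not the PR's actual RUN step. It assumes the standard `home = <dir>` key that venv writes into pyvenv.cfg, and the function name `repoint_pyvenv_cfg` and the path `/agent-server/.python/bin` are hypothetical, chosen only for illustration.

```python
def repoint_pyvenv_cfg(text: str, new_home: str) -> str:
    """Return pyvenv.cfg contents with the `home` key pointing at new_home.

    Hypothetical helper: every other key is passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        # Only the `home` entry records where the base interpreter lives.
        if sep and key.strip() == "home":
            line = f"home = {new_home}"
        out.append(line)
    return "\n".join(out) + "\n"


# Example input resembling what a venv built from a system Python contains.
original = (
    "home = /usr/local/bin\n"
    "include-system-site-packages = false\n"
    "version = 3.13.0\n"
)

# Assumed location of the bundled interpreter directory (illustrative only).
rewritten = repoint_pyvenv_cfg(original, "/agent-server/.python/bin")
print(rewritten)
```

Rewriting only the `home` line keeps the rest of the venv configuration intact; the venv's `bin/python` symlink would still need to be re-created separately so that it targets the bundled binary.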
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot left a comment
Taste Rating: 🟡 Acceptable - Solves a real production problem with a pragmatic approach. One critical issue with the portability test Dockerfile needs fixing.
Key Insight: This is good infrastructure work that makes /agent-server truly portable. The bundled Python approach is sound, but the portability validation Dockerfile is incomplete.
ARG USERNAME
COPY --chown=${USERNAME}:${USERNAME} --from=builder /agent-server /agent-server
# Bundled Python's libpython*.so lives under /agent-server/.python/lib
ENV LD_LIBRARY_PATH=/agent-server/.python/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
🟢 Acceptable: The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} syntax correctly avoids a trailing colon when the variable is empty. Prepending ensures the bundled libpython takes precedence.
Potential consideration: If a runtime image already has conflicting Python libraries on its LD_LIBRARY_PATH, prepending should make the bundled libpython win. If issues still arise, it may be necessary to set LD_LIBRARY_PATH=/agent-server/.python/lib (without appending the existing value) to force isolation.
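The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` expansion discussed in this comment can be mirrored in a few lines of Python to make its two cases explicit. This is only an illustration of the expansion rule, not code from the PR; the function name `prepend_library_path` is hypothetical.

```python
from typing import Optional


def prepend_library_path(bundled: str, existing: Optional[str]) -> str:
    """Mimic `bundled${VAR:+:$VAR}`.

    The `:+` form expands to `:$VAR` only when VAR is set and non-empty,
    so no trailing colon is produced for an unset or empty variable.
    """
    return bundled + (":" + existing if existing else "")


bundled_lib = "/agent-server/.python/lib"

# Unset or empty: the result is just the bundled directory.
print(prepend_library_path(bundled_lib, None))
print(prepend_library_path(bundled_lib, ""))

# Already set: the bundled directory is searched first, then the old entries.
print(prepend_library_path(bundled_lib, "/opt/lib:/usr/lib"))
```

Avoiding the trailing colon matters because an empty entry in the search path is, as far as I know, interpreted by glibc's loader as the current working directory, which is rarely what an image author intends.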
Addressed the review feedback. Changes pushed on
Validation:
[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.
…+ execstack sanitize
Switches the agent-server builder stage from `uv venv --python-preference
only-system` back to uv-managed python-build-standalone, installed into
`/agent-server/uv-managed-python`. This restores the pre-v1.15.0 property
that `/agent-server/.venv/bin/python` is a symlink inside `/agent-server`,
so downstream consumers can COPY `/agent-server` onto any base image
without needing a matching system Python.
The original reason `only-system` was chosen was that
python-build-standalone's `libpython3.13.so.1.0` ships with
`PT_GNU_STACK PF_X`, which Debian Trixie's glibc NX enforcement and
sysbox-runc seccomp refuse to load. This commit addresses that at its
actual layer — ELF program headers — rather than by dodging managed
Python.
New helper `clear_execstack.py`:
- Walks a directory tree, finds every `.so*` file, parses ELF program
headers, and clears PF_X on any PT_GNU_STACK entry that has it.
- Supports ELF32/ELF64 and both endiannesses.
- Idempotent; no-op on already-clean ELFs and non-ELF files.
- Strip-safe: only rewrites a single uint32 inside an existing phdr.
- Dual-use: runnable as `python clear_execstack.py <path>`, importable
as `clear_execstack(path)` / `clear_execstack_in_tree(root)`.
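As a rough illustration of the technique the helper is described as using, the sketch below clears PF_X on any PT_GNU_STACK program header of an in-memory ELF image. It is not the PR's `clear_execstack.py`: the names `strip_pf_x` and `make_demo_elf64` are hypothetical, it works on bytes rather than files or directory trees, and the header offsets come from the standard ELF32/ELF64 layouts.

```python
import struct
from typing import Tuple

PT_GNU_STACK = 0x6474E551  # program-header type that records stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def strip_pf_x(data: bytes) -> Tuple[bytes, bool]:
    """Return (patched_bytes, changed) with PF_X cleared on PT_GNU_STACK."""
    # Not an ELF file, or an unknown class/endianness byte: leave untouched.
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return data, False
    if data[4] not in (1, 2) or data[5] not in (1, 2):
        return data, False
    is64 = data[4] == 2                  # EI_CLASS: 1 = ELF32, 2 = ELF64
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    buf = bytearray(data)
    changed = False
    try:
        if is64:
            (phoff,) = struct.unpack_from(end + "Q", buf, 32)
            phentsize, phnum = struct.unpack_from(end + "HH", buf, 54)
            flags_at = 4                 # p_flags follows p_type in Elf64_Phdr
        else:
            (phoff,) = struct.unpack_from(end + "I", buf, 28)
            phentsize, phnum = struct.unpack_from(end + "HH", buf, 42)
            flags_at = 24                # p_flags is the 7th word in Elf32_Phdr
        for i in range(phnum):
            base = phoff + i * phentsize
            (p_type,) = struct.unpack_from(end + "I", buf, base)
            if p_type != PT_GNU_STACK:
                continue
            (flags,) = struct.unpack_from(end + "I", buf, base + flags_at)
            if flags & PF_X:
                # Rewrite only this one uint32; nothing else in the file moves.
                struct.pack_into(end + "I", buf, base + flags_at, flags & ~PF_X)
                changed = True
    except struct.error:
        # Truncated header or program-header table: treat as a no-op.
        return data, False
    return bytes(buf), changed


def make_demo_elf64(stack_flags: int) -> bytes:
    """Build a tiny little-endian ELF64 image with one PT_GNU_STACK entry."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    header = struct.pack("<HHIQQQIHHHHHH", 3, 62, 1, 0, 64, 0, 0, 64, 56, 1, 0, 0, 0)
    phdr = struct.pack("<IIQQQQQQ", PT_GNU_STACK, stack_flags, 0, 0, 0, 0, 0, 16)
    return ident + header + phdr
```

Calling `strip_pf_x(make_demo_elf64(7))` turns an RWX stack entry into RW, and a second call on the result reports no change, which matches the idempotence property listed above. A file-based wrapper would read the bytes, call the function, and write back only when `changed` is true.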
Two call sites share the helper:
1. Builder stage runs it across `/agent-server/uv-managed-python`
immediately after `uv python install 3.13`, before `uv venv`.
2. PyInstaller spec loads it via importlib and applies it as a
post-Analysis hook so the `binary`/`binary-minimal` one-file
archive also ships sanitized .so files. Supersedes the inline
version from #2574.
Builder also asserts `.venv/bin/python` resolves inside
`/agent-server/uv-managed-python/` so a future regression fails at
build time instead of at downstream runtime.
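The build-time assertion described above amounts to resolving the venv interpreter symlink and checking that the target stays under the managed-Python directory. The following is a minimal sketch of that check, not the assertion the builder actually runs; `resolves_inside` and `demo` are hypothetical names, and the demo builds a throwaway directory layout rather than touching `/agent-server`.

```python
import os
import tempfile
from typing import Tuple


def resolves_inside(link_path: str, root: str) -> bool:
    """True if link_path, with symlinks followed, lives under root."""
    real = os.path.realpath(link_path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real, real_root]) == real_root


def demo() -> Tuple[bool, bool]:
    """Compare a portable symlink with one that points at a system path."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = os.path.join(tmp, "agent-server", "uv-managed-python")
        venv_bin = os.path.join(tmp, "agent-server", ".venv", "bin")
        os.makedirs(os.path.join(bundle, "bin"))
        os.makedirs(venv_bin)

        # Stand-in for the managed interpreter binary.
        interpreter = os.path.join(bundle, "bin", "python3")
        open(interpreter, "w").close()

        # Portable case: the venv symlink targets the bundled interpreter.
        good = os.path.join(venv_bin, "python")
        os.symlink(interpreter, good)

        # Non-portable case: the symlink targets an absolute system path.
        bad = os.path.join(venv_bin, "python-system")
        os.symlink("/usr/local/bin/python3", bad)

        return resolves_inside(good, bundle), resolves_inside(bad, bundle)


print(demo())
```

In a build step, a false result would be turned into a non-zero exit so that the image build fails immediately.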
Tests (30 cases) cover the full ELF matrix: 32/64-bit × LE/BE,
PT_GNU_STACK RWX / RW / absent, tree walk, symlink skip, non-ELF skip,
truncated ELF, idempotence, and the CLI entrypoint.
Closes #2761. Supersedes #2676 (bundle Debian Python) and #2692
(naive revert to --managed-python without execstack fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was fixed by bumping version
Summary
- Bundle the Python runtime into /agent-server/.python during the builder stage
- Set LD_LIBRARY_PATH for source-based runtime images

Root cause
SDK v1.15.0 (commit 06b91863, Mar 26) switched the builder from --managed-python (uv-installed, portable) to --python-preference only-system to fix a legitimate seccomp issue (python-build-standalone's libpython has an executable stack flag rejected under DinD restrictions).

This made the venv non-portable: .venv/bin/python became a symlink to /usr/local/bin/python3 (from the builder's python:3.13-bookworm). When this venv is COPYed onto commit0 base images (Ubuntu 22.04, Python at /usr/bin/python3), the symlink is broken and the container fails to start.

SWE-bench was unaffected because its base images derive from Python Docker images that have /usr/local/bin/python3.

Timeline:
- Before v1.15.0: --managed-python (portable) → worked everywhere
- v1.15.0 (06b91863): --python-preference only-system → broke commit0

Fix
Keep --python-preference only-system (no seccomp issues) but bundle the runtime:
- Copy the interpreter binary, standard library, and libpython shared objects into /agent-server/.python/
- Re-point the venv symlinks and pyvenv.cfg at the bundled copy
- Set LD_LIBRARY_PATH in source targets for libpython resolution

Validation
- uv run pytest tests/agent_server/test_docker_build.py -q
- docker buildx build --platform linux/amd64 --target source-minimal --build-arg BASE_IMAGE=docker.io/wentingzhao/wcwidth:v0 -f openhands-agent-server/openhands/agent_server/docker/Dockerfile -t local/commit0-wcwidth-portable:pr --load .
- docker run --rm --platform linux/amd64 --entrypoint /bin/sh local/commit0-wcwidth-portable:pr -lc '/agent-server/.venv/bin/python -c "import openhands.agent_server, sys; print(sys.executable)"'
- .python/ directory (not /usr/local/bin)
- /usr/local/bin/python3: would have failed without fix

Refs: OpenHands/benchmarks#607
Fixes #2585
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ab95540-python

Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., ab95540-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., ab95540-python-amd64) are also available if needed

Partially addresses #2687 (source-image portability design issue) by making the runtime self-contained.