chore: bump software-agent-sdk for portable agent-server runtime #612

Closed
simonrosenberg wants to merge 2 commits into main from chore/bump-software-agent-sdk-portable-python

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Apr 2, 2026

Summary

  • Bump vendor/software-agent-sdk to include the Docker portability fix for commit0 source images
  • Pull in the Python runtime bundling that makes /agent-server self-contained across base images

Problem

Since SDK v1.15.0 (Mar 26), the agent-server Dockerfile builds a venv with symlinks to /usr/local/bin/python3 (from the builder's python:3.13-bookworm). Commit0 base images are Ubuntu 22.04 with Python at /usr/bin/python3, so the container fails to start:

exec: "/agent-server/.venv/bin/python": stat /usr/local/bin/python3: no such file or directory

This leaves every commit0 runtime pod stuck in pending, because the kubelet cannot start the container.
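
The failure mode described above can be reproduced without Docker. The following is a minimal sketch, assuming a POSIX filesystem; `find_dangling_interpreters` and `demo` are hypothetical helpers written for illustration and are not part of the SDK or this repo. It lists entries in a venv's `bin/` directory whose symlink target does not exist, which is the state of `/agent-server/.venv/bin/python` on a base image without `/usr/local/bin/python3`.

```python
import os
import tempfile
from pathlib import Path


def find_dangling_interpreters(venv_dir):
    """Return (name, target) for venv/bin symlinks whose target is missing."""
    bin_dir = Path(venv_dir) / "bin"
    dangling = []
    for entry in sorted(bin_dir.iterdir()):
        # Path.exists() follows symlinks, so a broken link reports False.
        if entry.is_symlink() and not entry.exists():
            dangling.append((entry.name, os.readlink(entry)))
    return dangling


def demo():
    """Build a fake venv whose python points at a path that does not exist."""
    with tempfile.TemporaryDirectory() as root:
        bin_dir = Path(root) / ".venv" / "bin"
        bin_dir.mkdir(parents=True)
        # Stand-in for /usr/local/bin/python3, which the base image lacks.
        missing = Path(root) / "usr" / "local" / "bin" / "python3"
        (bin_dir / "python").symlink_to(missing)
        return find_dangling_interpreters(Path(root) / ".venv"), str(missing)
```

Running `find_dangling_interpreters("/agent-server/.venv")` inside an affected image would be one way to confirm the diagnosis before the container entrypoint is attempted.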

Fix

The SDK fix (OpenHands/software-agent-sdk#2676) bundles the Python runtime into /agent-server/.python/ and repoints venv symlinks, making the image portable across any base image.
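
To illustrate what "repointing venv symlinks" means, here is a rough sketch. It is not the SDK's actual build logic, which lives in OpenHands/software-agent-sdk#2676; `repoint_venv` and `demo` are hypothetical, and the `python3.13` binary name is taken from the validation commands later in this thread. The idea is to replace absolute interpreter symlinks in the venv with links to the bundled interpreter, while leaving relative links such as `python3 -> python` untouched.

```python
import os
import tempfile
from pathlib import Path


def repoint_venv(venv_dir, bundled_python):
    """Repoint absolute python* symlinks in venv/bin at bundled_python."""
    bin_dir = Path(venv_dir) / "bin"
    changed = []
    for entry in sorted(bin_dir.iterdir()):
        if not entry.is_symlink():
            continue
        target = os.readlink(entry)
        # Relative links stay valid once their target is fixed; skip them.
        if os.path.isabs(target) and entry.name.startswith("python"):
            entry.unlink()
            entry.symlink_to(bundled_python)
            changed.append(entry.name)
    return changed


def demo():
    """Simulate a venv linked to a missing builder path, then repoint it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        bundled = root / "agent-server" / ".python" / "bin" / "python3.13"
        bundled.parent.mkdir(parents=True)
        bundled.write_text("#!/bin/sh\n")  # placeholder, not a real interpreter
        bin_dir = root / "agent-server" / ".venv" / "bin"
        bin_dir.mkdir(parents=True)
        (bin_dir / "python").symlink_to(root / "usr" / "local" / "bin" / "python3")
        (bin_dir / "python3").symlink_to("python")
        changed = repoint_venv(bin_dir.parent, bundled)
        resolved = all(
            (bin_dir / name).resolve() == bundled.resolve()
            for name in ("python", "python3")
        )
        return changed, resolved
```

A real venv also records the interpreter directory in the `home` key of `pyvenv.cfg`, so a complete fix would likely need to update that as well; this sketch covers only the symlinks.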

Validation

  • Verified in the SDK PR with uv run pytest tests/agent_server/test_docker_build.py -q
  • Verified in the SDK PR with a local source-minimal build against docker.io/wentingzhao/wcwidth:v0 and a runtime import smoke test
  • Also independently verified with docker.io/wentingzhao/tinydb:v0 — container starts, Python resolves correctly

Depends on: OpenHands/software-agent-sdk#2676
Fixes #607

Collaborator

@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable approach, but premature

[CRITICAL ISSUES]

🔴 Dependency on unmerged code: You're bumping to commit b32e1c74 from SDK PR #2676, which is still open and unmerged. This violates basic stability principles. What happens when that PR changes? What if it never merges? Don't point production infrastructure at moving targets.

🟠 Missing evidence in THIS repo: Your validation section says "verified in the SDK PR" - that's not good enough. You're changing a submodule in the benchmarks repo, so show me that the benchmarks still work. Run terminalbench-infer, swebench-infer, or whatever uses this agent-server code and prove it doesn't break. Upstream tests don't count as proof your integration works.

[IMPROVEMENT OPPORTUNITIES]

🟡 Evidence Section: Add concrete proof that commit0 (or another affected benchmark) now works end-to-end with this change. A command + output showing a successful run would make this reviewable.

VERDICT:
⏸️ Wait for upstream, then validate locally

KEY INSIGHT:
Solving a real problem (good), but pointing to unmerged upstream code breaks the "reproducible builds" principle this repo values. Merge the SDK PR first, then bump to a stable commit.

@simonrosenberg
Collaborator Author

Added repo-local validation and updated the submodule bump to the latest SDK branch head (c4227cd4).

Benchmarks-local validation run in this repo:

  • FORCE_BUILD=1 uv run benchmarks/commit0/build_images.py --repo-split wcwidth --n-limit 1 --max-workers 1 --build-batch-size 1 --target source-minimal --image local/commit0-portable-evidence -> Built=1 Skipped=0 Failed=0
  • docker run --rm --platform linux/amd64 --entrypoint /bin/sh local/commit0-portable-evidence:c4227cd-commit0-wcwidth-source-minimal -lc '/agent-server/.venv/bin/python -c "import openhands.agent_server, sys; print(sys.executable)"' -> /agent-server/.venv/bin/python
  • docker run --rm --platform linux/amd64 --entrypoint /bin/sh local/commit0-portable-evidence:c4227cd-commit0-wcwidth-source-minimal -lc 'readelf -h /agent-server/.python/bin/python3.13 | sed -n "1,12p"; test ! -e /usr/local/bin/python3 && echo no_usr_local_python3' -> bundled interpreter is x86-64; /usr/local/bin/python3 is absent in the base image
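
The architecture check in the last command relies on `readelf` being present in the image. As an alternative, the ELF header can be read directly. This is a sketch under stated assumptions: `elf_machine` and `elf_machine_of` are hypothetical helpers, and only the two machine IDs listed in `EM_NAMES` are mapped to names.

```python
import struct

# e_machine values from the ELF specification; other IDs are returned as hex.
EM_NAMES = {0x3E: "x86-64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(machine, hex(machine))


def elf_machine_of(path) -> str:
    """Read just enough of the file at path to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

For example, `elf_machine_of("/agent-server/.python/bin/python3.13")` could be run with the bundled interpreter itself, since it needs only the standard library.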

The remaining process blocker from the review still stands: this PR should be repointed to the merged main SHA once OpenHands/software-agent-sdk#2676 lands. The repo-local integration evidence is now in place.

@simonrosenberg
Collaborator Author

We fixed the core issue with #614.

Development

Successfully merging this pull request may close these issues.

commit0: runtime pods stuck in pending, all instances fail
