
Enable ACPAgent on RemoteRuntime API #2190

Merged
simonrosenberg merged 40 commits into main from
feat/acp-remote-runtime
Mar 13, 2026

Conversation

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Feb 23, 2026

Summary

  • Polymorphic agent deserialization in EventService so ACPAgent payloads work
  • Eager ACPAgent import in api.py to register in discriminated union
  • Agent lifecycle cleanup (close() on AgentBase, called from LocalConversation)
  • Pre-install claude-code-acp and codex-acp in Docker image
  • Add remote runtime example for ACPAgent

Test plan

  • Unit tests pass (72/72): test_acp_agent.py + test_agent_loading.py
  • CI image build (triggered by this PR)
  • SWE-bench validation with ACPAgent on 5 instances

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:3dcb3dd-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-3dcb3dd-python \
  ghcr.io/openhands/agent-server:3dcb3dd-python

All tags pushed for this build

ghcr.io/openhands/agent-server:3dcb3dd-golang-amd64
ghcr.io/openhands/agent-server:3dcb3dd-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:3dcb3dd-golang-arm64
ghcr.io/openhands/agent-server:3dcb3dd-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:3dcb3dd-java-amd64
ghcr.io/openhands/agent-server:3dcb3dd-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:3dcb3dd-java-arm64
ghcr.io/openhands/agent-server:3dcb3dd-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:3dcb3dd-python-amd64
ghcr.io/openhands/agent-server:3dcb3dd-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:3dcb3dd-python-arm64
ghcr.io/openhands/agent-server:3dcb3dd-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:3dcb3dd-golang
ghcr.io/openhands/agent-server:3dcb3dd-java
ghcr.io/openhands/agent-server:3dcb3dd-python

About Multi-Architecture Support

  • Each variant tag (e.g., 3dcb3dd-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 3dcb3dd-python-amd64) are also available if needed

- Polymorphic agent deserialization in EventService (type(self.stored.agent)
  instead of hardcoded Agent)
- Eager import of ACPAgent in api.py to register in discriminated union
- Add close() lifecycle method to AgentBase, call from LocalConversation.close()
- Await conversation close in EventService to ensure subprocess cleanup
- Pre-install claude-code-acp in Docker image for ACPAgent support
- Add remote runtime example for ACPAgent
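
For readers following along, here is a minimal sketch of why `type(self.stored.agent)` matters. The `Agent` and `ACPAgent` classes below are hypothetical dataclass stand-ins (the SDK's real models are not shown in this PR page), and `acp_command` is an invented field used only to make the subclass differ from its base.

```python
from dataclasses import asdict, dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for the base agent model."""
    model: str

    @classmethod
    def from_dict(cls, data: dict) -> "Agent":
        return cls(**data)


@dataclass
class ACPAgent(Agent):
    """Hypothetical subclass carrying one extra (invented) field."""
    acp_command: str = "claude-agent-acp"


def reload_hardcoded(stored: Agent) -> Agent:
    # Hardcoding the base class breaks on subclass-only fields.
    return Agent.from_dict(asdict(stored))


def reload_polymorphic(stored: Agent) -> Agent:
    # Use the stored instance's own class to rebuild it.
    agent_cls = type(stored)
    return agent_cls.from_dict(asdict(stored))
```

With a stored `ACPAgent`, `reload_polymorphic` returns an `ACPAgent`, while `reload_hardcoded` raises `TypeError` because `Agent` does not accept the subclass field.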

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Feb 23, 2026

API breakage checks (Griffe)

Result: Passed

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment


🟢 Good taste - Elegant solution that eliminates special cases

KEY INSIGHT: The polymorphic deserialization (agent_cls = type(self.stored.agent)) is textbook good design - it preserves type information naturally without conditionals, making ACPAgent work seamlessly alongside the base Agent class.

What I like:

  • Data structure wins: Using Python's type system to handle polymorphism instead of switch statements
  • Proper lifecycle management: The close() hook is clean extensibility (no-op by default, override where needed)
  • Correctness fix: The await on run_in_executor (line 609) - you actually need to wait for the cleanup
  • Solves real problem: ACPAgent subprocess cleanup is necessary, not theoretical

Minor notes:

  • The eager import in api.py is a necessary hack for Pydantic discriminated unions - acceptable pragmatism
  • Could add a type guard on agent_cls, but given stored.agent is pre-validated, this is safe

LGTM - Core logic is sound, changes are minimal and well-targeted.
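
To make the two lifecycle points concrete, the sketch below shows a no-op `close()` hook overridden by a subclass, and a shutdown path that awaits `run_in_executor` so cleanup has finished before the coroutine returns. All class names are simplified, hypothetical stand-ins rather than the SDK's implementations.

```python
import asyncio


class AgentBase:
    def close(self) -> None:
        """No-op by default; subclasses override to release resources."""


class SubprocessAgent(AgentBase):
    """Hypothetical agent that owns a resource needing cleanup."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real agent would terminate its subprocess here.
        self.closed = True


class Conversation:
    def __init__(self, agent: AgentBase) -> None:
        self.agent = agent

    def close(self) -> None:
        self.agent.close()


async def shutdown(conversation: Conversation) -> None:
    loop = asyncio.get_running_loop()
    # Without `await`, this coroutine could return before close() has run.
    await loop.run_in_executor(None, conversation.close)


agent = SubprocessAgent()
asyncio.run(shutdown(Conversation(agent)))
```

After `shutdown` returns, `agent.closed` is `True`; dropping the `await` would make that a race.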

simonrosenberg and others added 2 commits February 23, 2026 16:57
The claude-code-acp npm install fails on golang and java base images
that don't have npm. Make it conditional on npm availability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…yloads

With ACPAgent registered in the discriminated union, agent payloads
now require an explicit "kind" field. Updated 6 test payloads and
renumbered the ACP example from 07 to 09 to avoid duplicate.
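
As an illustration of why the test payloads needed updating, the sketch below rejects agent payloads that omit the discriminator. The `AGENT_KINDS` registry and `resolve_kind` helper are hypothetical; the real validation is done by the SDK's discriminated union, which is not reproduced here.

```python
AGENT_KINDS = {"Agent", "ACPAgent"}  # hypothetical registry of union members


def resolve_kind(payload: dict) -> str:
    """Return the discriminator, rejecting payloads that omit it."""
    kind = payload.get("kind")
    if kind is None:
        raise ValueError("agent payload needs an explicit 'kind' field")
    if kind not in AGENT_KINDS:
        raise ValueError(f"unknown agent kind: {kind!r}")
    return kind
```

A payload such as `{"kind": "Agent", "llm": {...}}` resolves; the same payload without `"kind"` is rejected once more than one agent type is registered.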

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg marked this pull request as draft February 23, 2026 20:04
@github-actions
Contributor

github-actions bot commented Feb 23, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server | | | | |
| api.py | 160 | 12 | 92% | 75, 87, 102, 108, 269, 272, 276–278, 280, 286, 327 |
| event_service.py | 320 | 81 | 74% | 55–56, 74–76, 85–89, 92–95, 115, 219, 236, 290–291, 295, 303, 306, 352–353, 369, 371, 375–377, 381, 390–391, 393, 397, 403, 405, 413–418, 555, 557–558, 562, 576–578, 580, 584–587, 591–594, 602–605, 625, 629–634, 646–647, 649–650, 657–658, 660–661, 665, 671, 688–689 |
| openhands-sdk/openhands/sdk/agent | | | | |
| acp_agent.py | 389 | 78 | 79% | 194–196, 250–253, 255–256, 283, 285, 289, 295, 306–307, 312, 379, 481–482, 493, 498, 529, 539, 544, 555–558, 564–566, 569–571, 573, 575–576, 578, 580, 585, 594–595, 599–600, 604, 611–617, 627–632, 634, 643–645, 648–649, 655–659, 661, 663–664, 672, 709, 713–714, 958–959 |
| base.py | 193 | 22 | 88% | 200, 257–259, 289, 293–297, 345–347, 357, 367, 375–376, 486, 523–524, 534–535 |
| openhands-sdk/openhands/sdk/conversation/impl | | | | |
| local_conversation.py | 400 | 26 | 93% | 288, 293, 321, 364, 382, 398, 463, 641–642, 645, 797, 805, 807, 811–812, 823, 825–827, 852, 924, 1050, 1054, 1124, 1131–1132 |
| TOTAL | 20509 | 5226 | 74% | |

simonrosenberg and others added 14 commits February 23, 2026 17:37
The CI Agent Server workflow builds binary target images, not source.
Update the example to use target_type="binary" accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 'agent_type' input (default/acp) to run-eval.yml and pass it through
to the evaluation repo's eval-job workflow dispatch payload. This enables
dispatching SWE-bench evaluations using ACPAgent (Claude Code via ACP).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SWE-bench prebaked images use Python-only base images without Node.js.
The conditional `if command -v npm` silently skipped claude-code-acp
installation. Now installs Node.js from nodesource when npm is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ACPAgent now emits a FinishAction ActionEvent + ObservationEvent
after each step() completes. This makes it compatible with evaluation
frameworks (SWE-bench, etc.) that detect task completion by scanning
for FinishAction in the event history.

Without this, the benchmarks' fake_user_response loop didn't detect
completion and sent up to 10 "please continue" messages, and the
AgentFinishedCritic marked results as failed.
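
The completion check described above can be sketched roughly as follows. The `Event` record and the nudge loop are simplified, hypothetical stand-ins for the benchmark harness; only the `FinishAction` name and the limit of 10 messages come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical, simplified event record."""
    kind: str         # e.g. "ActionEvent" or "ObservationEvent"
    action: str = ""  # action name, set for ActionEvents


def is_finished(history: list) -> bool:
    """True once a FinishAction appears anywhere in the event history."""
    return any(
        e.kind == "ActionEvent" and e.action == "FinishAction" for e in history
    )


def count_nudges(step, max_nudges: int = 10) -> int:
    """Run steps until FinishAction shows up; return how many nudges were sent."""
    history: list = []
    nudges = 0
    step(history)
    while not is_finished(history) and nudges < max_nudges:
        history.append(Event("MessageEvent"))  # the "please continue" message
        nudges += 1
        step(history)
    return nudges


def step_without_finish(history: list) -> None:
    history.append(Event("MessageEvent"))


def step_with_finish(history: list) -> None:
    history.append(Event("ActionEvent", "FinishAction"))
    history.append(Event("ObservationEvent"))
```

An agent that never emits `FinishAction` exhausts all 10 nudges, while one that emits it after its first step receives none.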

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ACP defaults to "ask" mode for tool permissions when
no settings file exists. This causes the agent to hang waiting for
interactive permission approval in headless/eval environments.

Add /etc/claude-code/managed-settings.json with allow rules for
Edit, Read, and Bash tools so Claude Code can operate without
human approval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code's internal permission system blocks tools like Write
when running in default mode, causing the agent to hang waiting
for interactive approval. This happens even when the ACP server's
settings allow the tools, because Claude Code checks permissions
before making MCP tool calls.

Call set_session_mode("bypassPermissions") after creating the
session so Claude Code skips all permission checks in headless
mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add acp_prompt_timeout field (default 1800s) to ACPAgent that wraps
the prompt() call with AsyncExecutor's timeout. This prevents the
eval from hanging forever when claude-code-acp fails to send the
JSON-RPC response after completing its work.

Root cause analysis: claude-code-acp's prompt() method awaits
sessionUpdate notifications end-to-end through the stdout pipe.
If the response is never sent (e.g., due to stdout backpressure
or an unhandled result subtype), the Python ACP client blocks
indefinitely.

Also fixes test assertions for FinishAction emission added in
d72fd08.
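
A rough sketch of bounding a prompt call with a timeout is shown below. It uses `asyncio.wait_for` as a stand-in for the `AsyncExecutor` timeout mentioned in the commit, whose API is not shown here; `hung_prompt` is a hypothetical coroutine that simulates a response that never arrives.

```python
import asyncio


async def prompt_with_timeout(prompt_coro, timeout_s: float = 1800.0):
    """Await the prompt; raise asyncio.TimeoutError if it takes too long."""
    return await asyncio.wait_for(prompt_coro, timeout=timeout_s)


async def hung_prompt() -> str:
    # Simulates a server that finished its work but never sent the response.
    await asyncio.Event().wait()
    return "unreachable"


async def demo() -> str:
    try:
        await prompt_with_timeout(hung_prompt(), timeout_s=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


result = asyncio.run(demo())
```

The default of 1800 seconds mirrors the `acp_prompt_timeout` default named in the commit; the demo uses a much shorter value so it finishes quickly.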

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The default asyncio.StreamReader limit (64 KiB) is too small for ACP
session_update notifications that carry large tool-call outputs.  When
a JSON-RPC line exceeds the limit, readline() raises LimitOverrunError
which silently kills the filter pipeline, leaving the prompt() future
unresolved forever and eventually deadlocking the subprocess stdout
pipe.

Root cause confirmed on v10 eval: 59,496 bytes of unread data stuck
in the subprocess pipe, both processes sleeping in ep_poll.

Increase both the subprocess pipe and filtered_reader limits to 100 MiB.
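
The limit behaviour can be reproduced in isolation with a bare `asyncio.StreamReader`. The sketch below is not the agent's actual pipeline: it feeds one oversized line to a reader and calls `readuntil(b"\n")`, which raises `LimitOverrunError` directly (in CPython, `readline()` reports the same overrun as a `ValueError`). The 128 KiB payload size is arbitrary.

```python
import asyncio


async def read_one_line(payload: bytes, limit: int) -> str:
    """Feed one line into a StreamReader with the given limit and read it back."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(payload)
    reader.feed_eof()
    try:
        line = await reader.readuntil(b"\n")
    except asyncio.LimitOverrunError:
        return "overrun"
    return f"read {len(line)} bytes"


# One 128 KiB line, standing in for a large session_update notification.
big_line = b"x" * (128 * 1024) + b"\n"

default_result = asyncio.run(read_one_line(big_line, limit=64 * 1024))
raised_result = asyncio.run(read_one_line(big_line, limit=100 * 1024 * 1024))
```

With the 64 KiB default the read fails; with the 100 MiB limit the same line is returned whole.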

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect ACP server type from InitializeResponse.agent_info.name
- Map session modes: claude-code → bypassPermissions, codex-acp → full-access
- Add acp_session_mode field for manual override
- Install codex-acp alongside claude-code-acp in Docker image
- Add acp-codex to run-eval.yml agent_type choices

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codex-acp requires an explicit `conn.authenticate()` call after
initialize and before session creation. Auto-detect the auth method
from environment variables (CODEX_API_KEY or OPENAI_API_KEY).
Without this, codex-acp returns "Authentication required" when
trying to create a session in the remote runtime container.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns naming with 'acp-codex': both now follow the
acp-{provider} pattern, making the agent type self-documenting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add elapsed time tracking to _record_usage() via add_response_latency()
  for both main prompt and fork (ask_agent) paths
- Capture agent_version from InitializeResponse.agent_info alongside
  agent_name
- Expose agent_name and agent_version as public properties on ACPAgent
  so benchmarks can include ACP server identity in eval output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _BYPASS_MODE_MAP only matched "claude-code" but the actual agent
name is "@zed-industries/claude-agent-acp", which doesn't contain that
substring. Add "claude-agent" as an additional key so the mode correctly
resolves to "bypassPermissions" instead of falling through to the
default "full-access" (which claude-agent-acp rejects).
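
The lookup described in this commit can be sketched as a substring match over the reported agent name. The map keys, mode values, and default below come from the commit messages in this PR; the function name, signature, and case-insensitive matching are assumptions.

```python
from typing import Optional

# Substring of the reported agent name -> session mode.
_BYPASS_MODE_MAP = {
    "claude-code": "bypassPermissions",
    "claude-agent": "bypassPermissions",
    "codex-acp": "full-access",
}
_DEFAULT_MODE = "full-access"


def resolve_session_mode(agent_name: str, override: Optional[str] = None) -> str:
    """Pick the session mode, honouring a manual override first."""
    if override is not None:
        return override
    lowered = agent_name.lower()
    for key, mode in _BYPASS_MODE_MAP.items():
        if key in lowered:
            return mode
    return _DEFAULT_MODE
```

Without the `"claude-agent"` key, `"@zed-industries/claude-agent-acp"` would match nothing and fall through to `"full-access"`, which is the failure this commit fixes.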

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg force-pushed the feat/acp-remote-runtime branch from 4176666 to b9c42fd February 24, 2026 13:55
The npm package was renamed from @zed-industries/claude-code-acp to
@zed-industries/claude-agent-acp. Update all references: bypass mode
map key, Dockerfile, examples, tests, and docstrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg marked this pull request as ready for review February 27, 2026 12:31
@github-actions
Contributor

github-actions bot commented Feb 27, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-11 20:32:47,987", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.13.0 -> 1.13.0).

Breaking REST API changes detected compared to baseline release:
- added '#/components/schemas/ACPAgent-Output, #/components/schemas/Agent-Output' to the '/items/anyOf[subschema #1: ConversationInfo]/agent' response property 'oneOf' list for the response status '200'
- the '/items/anyOf[subschema #1: ConversationInfo]/agent' response's property type/format changed from 'object'/'' to ''/'' for status '200'
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/agent/kind' from the response with the '200' status
- removed the required property '/items/anyOf[subschema #1: ConversationInfo]/age

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment


🟢 Good taste - Pragmatic engineering that solves real problems.

The complexity growth (auth methods, bypass modes, timeout handling, 100 MiB buffer) is justified - each addresses actual production issues rather than theoretical edge cases. The one-shot ACPAgent design (emitting FinishAction per step) correctly models how ACP servers work internally.

Trade-offs are reasonable: 100 MiB stream buffers solve real LimitOverrunError crashes for large tool outputs; Docker managed-settings.json bypass is documented and appropriate for eval environments. Good engineering.

@simonrosenberg simonrosenberg self-assigned this Feb 27, 2026
simonrosenberg and others added 9 commits March 5, 2026 10:56
Add ANTHROPIC_BASE_URL, ANTHROPIC_API_KEY, OPENAI_BASE_URL, and
OPENAI_API_KEY environment variables to the run-examples workflow.
This allows Claude Code and Codex ACP servers to route their API
calls through the LiteLLM proxy using the existing LLM_API_KEY,
eliminating the need to provision separate Anthropic/OpenAI API keys.

See: OpenHands/evaluation#297

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The test framework requires each example to emit an EXAMPLE_COST
marker to stdout. Add cost reporting using agent.llm.metrics for
the standalone ACP example and conversation_stats for the remote
ACP example.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove backward-compatible ANTHROPIC_API_KEY requirements from ACP
examples. Instead, route all ACP agent requests through the LiteLLM
proxy by setting ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the
existing LLM_BASE_URL and LLM_API_KEY environment variables.

This approach:
- Eliminates the need for separate Anthropic/OpenAI API key secrets
- Enables full cost tracking through LiteLLM
- Provides model flexibility (can test with different backends)
- Centralizes observability and rate limiting
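
A minimal sketch of the variable mapping described above, assuming the proxy settings are already present as `LLM_BASE_URL` and `LLM_API_KEY`. The helper name is hypothetical, and only the Anthropic pair named in this commit is derived.

```python
def acp_env_from_proxy(env: dict) -> dict:
    """Derive the Anthropic variables from the shared proxy settings."""
    return {
        "ANTHROPIC_BASE_URL": env["LLM_BASE_URL"],
        "ANTHROPIC_API_KEY": env["LLM_API_KEY"],
    }
```

The result could be merged into the environment passed to the ACP subprocess, for example `{**os.environ, **acp_env_from_proxy(os.environ)}`.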

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When ACP prompt() raises, the agent now:
1. Emits a ConversationErrorEvent (in addition to the existing
   MessageEvent) so RemoteConversation._get_last_error_detail() can
   report the actual error instead of the generic fallback message
2. Tags usage/content policy errors with code "UsagePolicyRefusal"
3. Re-raises the exception so LocalConversation.run() breaks the loop
   immediately instead of spinning until max_iteration_per_run
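
The emit-then-re-raise pattern can be sketched like this. The event class is a simplified, hypothetical stand-in, and the string match used to detect policy refusals is an assumption; the PR does not show how the real classification works. The sketch also omits the MessageEvent that the real handler emits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationErrorEvent:
    """Hypothetical, simplified error event."""
    code: str
    detail: str


def classify(exc: Exception) -> str:
    # Assumed heuristic: look for policy wording in the error text.
    text = str(exc).lower()
    return "UsagePolicyRefusal" if "usage policy" in text else type(exc).__name__


def step(prompt: Callable[[], str], on_event: Callable[[object], None]) -> str:
    try:
        return prompt()
    except Exception as exc:
        # Record the real error so remote callers can report it.
        on_event(ConversationErrorEvent(code=classify(exc), detail=str(exc)))
        raise  # let the caller's run loop stop instead of iterating further
```

Because the exception propagates, the caller sees the failure on the first iteration rather than after exhausting its iteration budget.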

Also fixes pre-existing ruff E501 lint error on line 385.

Fixes OpenHands/benchmarks#495

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflict in acp_agent.py: keep agent_name/agent_version
properties from feat/acp-remote-runtime and adopt Generator[LLM]
return type from main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows callers to specify which model the ACP server should use.
The model is passed via session _meta.claudeCode.options.model
when creating a new ACP session.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
new_session() assigns **kwargs directly to the _meta field.
Passing _meta={"claudeCode": ...} wraps it in an extra _meta key,
producing {"_meta": {"_meta": {"claudeCode": ...}}} — the ACP server
never sees the model config.

Fix: unpack the meta dict as direct kwargs so they become _meta content.
Also skip claudeCode meta for non-Claude ACP servers (e.g. codex-acp).
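
The double wrapping is easy to reproduce with a stand-in. The `new_session` function below is hypothetical and only mimics the behaviour described in the commit (extra keyword arguments become the `_meta` payload); the model name is a placeholder.

```python
def new_session(cwd: str, **kwargs) -> dict:
    """Hypothetical stand-in: extra keyword arguments become the _meta payload."""
    return {"cwd": cwd, "_meta": kwargs}


meta = {"claudeCode": {"options": {"model": "some-model"}}}

wrapped = new_session("/workspace", _meta=meta)  # buggy call: nests _meta twice
unpacked = new_session("/workspace", **meta)     # fixed call: keys land in _meta
```

Here `wrapped["_meta"]` is `{"_meta": {...}}`, so the server never finds `claudeCode`, while `unpacked["_meta"]` holds `claudeCode` at the top level.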

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The step() error handler now re-raises after emitting events and emits
both a MessageEvent and a ConversationErrorEvent. Update the test to
expect the re-raise (pytest.raises) and verify both emitted events.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ACPAgent uses a sentinel LLM with model="acp-managed" that cannot
make direct LLM calls. When generate_title() was called for ACP
conversations, it would fall back to this sentinel LLM and fail with:
"LiteLLM Provider NOT provided. You passed model=acp-managed"

Add a guard to detect the sentinel model and pass None to
generate_conversation_title(), which causes it to use simple truncation
fallback instead of LLM-based title generation.
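
A sketch of the guard, under the assumption that passing `None` selects a plain truncation fallback. Both helper names and the 50-character limit are hypothetical; only the `"acp-managed"` sentinel value comes from the commit.

```python
from typing import Optional

ACP_SENTINEL_MODEL = "acp-managed"


def title_llm_or_none(model: str) -> Optional[str]:
    """Return the model to use for titles, or None for the ACP sentinel."""
    return None if model == ACP_SENTINEL_MODEL else model


def generate_title(first_message: str, model: Optional[str], max_len: int = 50) -> str:
    if model is None:
        # Truncation fallback: no LLM call is attempted.
        text = first_message.strip()
        return text if len(text) <= max_len else text[: max_len - 3] + "..."
    raise NotImplementedError("LLM-based titling is out of scope for this sketch")
```

For an ACP conversation, `generate_title(msg, title_llm_or_none("acp-managed"))` returns a truncated copy of the message instead of failing on the sentinel model.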

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

Add automatic retry for ACP prompt failures caused by transient
connection errors (OSError, ConnectionError, BrokenPipeError, EOFError).

Changes:
- Wrap prompt() call in retry loop for connection exception types
- Retry up to 3 times with exponential backoff (5s, 15s, 30s)
- Configurable via ACP_PROMPT_MAX_RETRIES env var
- Reset client accumulators between retries
- Timeout errors are NOT retried (handled separately)

This preserves session state when connection errors occur, avoiding
the need to restart instances from scratch in the evaluation framework.
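
The retry policy above could look roughly like the sketch below. The delays and the `ACP_PROMPT_MAX_RETRIES` variable come from the commit; the function name, the `reset` and `sleep` parameters, and the synchronous style are assumptions made to keep the example small. Note that in Python `TimeoutError` is a subclass of `OSError`, so it has to be excluded before the connection-error handler.

```python
import os
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
RETRY_DELAYS_S = (5.0, 15.0, 30.0)


def prompt_with_retry(
    prompt: Callable[[], T],
    reset: Callable[[], None] = lambda: None,
    sleep: Callable[[float], None] = time.sleep,
    delays: Sequence[float] = RETRY_DELAYS_S,
) -> T:
    """Call prompt(), retrying connection errors with the listed delays."""
    max_retries = int(os.environ.get("ACP_PROMPT_MAX_RETRIES", "3"))
    attempt = 0
    while True:
        try:
            return prompt()
        except TimeoutError:
            raise  # timeouts are not retried
        except (OSError, EOFError):
            # ConnectionError and BrokenPipeError are OSError subclasses.
            if attempt >= max_retries:
                raise
            reset()  # drop partial output accumulated by the failed attempt
            sleep(delays[min(attempt, len(delays) - 1)])
            attempt += 1
```

Passing a custom `sleep` makes the policy testable without waiting; in production the default `time.sleep` applies the real delays.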

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@simonrosenberg simonrosenberg force-pushed the feat/acp-remote-runtime branch from d0127c4 to bda77a6 March 11, 2026 19:13
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg merged commit ca621a4 into main Mar 13, 2026
28 checks passed
@simonrosenberg simonrosenberg deleted the feat/acp-remote-runtime branch March 13, 2026 21:33
@enyst
Collaborator

enyst commented Mar 13, 2026

This PR actually failed the OpenAPI check:

I'll fix it to fail more loudly. I'm not sure if it's time to make it mandatory for merging, but I think maybe we should, because it's too easy to miss when it only shows up in a comment... 🤔

@simonrosenberg Please correct me if I'm wrong: are serialization and the REST schema broken for older clients?

We have a policy to give a deprecation period (~5 releases) before we make breaking changes to public APIs. 🤔

Maybe we could revert this PR, and rebuild it with deprecation markers?

@enyst
Collaborator

enyst commented Mar 13, 2026

Failures log: OpenAPI checks

@enyst
Collaborator

enyst commented Mar 13, 2026

I'll fix it to fail more loudly.

Done

Collaborator

enyst commented Mar 13, 2026

Hi — I’m OpenHands-GPT-5.4.

I looked back at this PR in light of the OpenAPI breakage that later surfaced.

What broke

This PR changed the REST contract of an existing agent-server endpoint in a breaking way: the agent request/response shape stopped being a plain object and became a polymorphic oneOf union (Agent-Output / ACPAgent-Output), so fields that older clients could previously rely on (kind, llm, etc.) were no longer guaranteed in the old shape.

So the break here was not primarily “an endpoint was removed without deprecation.” It was a breaking request/response contract change on an existing endpoint without a deprecation runway.

Relevant policy lines

I opened a follow-up policy PR to make this explicit: #2433.

Relevant line references there:

  • AGENTS.md:114-122 — repo-wide rule that public REST contract breaks need a deprecation notice and a runway of 5 minor releases.
  • openhands-agent-server/AGENTS.md:17-24 — agent-server REST API is public/backward-compatible, and all REST contract breaks need a deprecation notice plus a 5 minor release runway.
  • openhands-agent-server/AGENTS.md:50-69 — incompatible request/response schema changes need a migration path, and endpoints / legacy contracts should only be removed after that runway.

Practical options to handle this kind of change

A few reasonable paths would have been:

  1. Keep the old contract working in place

    • Make the change additive.
    • Keep accepting / returning the old shape while introducing ACP support.
  2. Add a parallel field

    • Keep the old agent shape stable.
    • Introduce a new field for the new polymorphic contract.
    • Deprecate the old field and keep it for the runway.
  3. Add a versioned endpoint or versioned contract

    • Keep the old endpoint/contract stable.
    • Introduce a new version for the ACP-capable shape.
    • Mark the old one deprecated and keep it around for the runway.
  4. If you truly must break immediately, do a MINOR bump

    • That would satisfy the existing CI policy better than shipping the break under the same minor version.
    • But with the stronger policy in "Clarify REST contract deprecation policy" (#2433), the expectation would still be to provide a deprecation notice + migration runway, not just the version bump.
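
As a rough illustration of option 2 (a parallel field), the sketch below keeps the legacy `agent` object intact and adds the polymorphic shape under a new key. The `agent_v2` field name and the helper are hypothetical and not something proposed in this thread; the legacy keys `kind` and `llm` are the ones named earlier in this comment.

```python
def conversation_info(agent: dict) -> dict:
    """Build a response that keeps the legacy field and adds a new one."""
    legacy = {"kind": agent.get("kind"), "llm": agent.get("llm")}
    return {
        "agent": legacy,    # deprecated: old plain-object shape, kept for the runway
        "agent_v2": agent,  # hypothetical new field with the polymorphic shape
    }
```

Older clients keep reading `agent` unchanged, while newer clients can switch to the new field before the legacy one is removed.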

Bottom line

The core issue here was changing an existing REST contract incompatibly without a migration/deprecation path for clients. The follow-up policy PR (#2433) codifies the intended expectation more clearly: all REST contract breaks need a deprecation notice and a runway of 5 minor releases.
