fix(acp): derive vision capability from ACP provider, not sentinel LLM by simonrosenberg · Pull Request #2879 · OpenHands/software-agent-sdk

simonrosenberg · 2026-04-18T20:05:20Z

Summary

When OpenHands runs an ACPAgent, the sentinel LLM(model="acp-managed") is unknown to LiteLLM, so agent.llm.vision_is_active() returns False. Tools that gate behaviour on vision support — notably FileEditorTool, which hides its image-view affordance — therefore never advertised image reading even when the wrapped ACP server forwards images to a vision-capable model (all three providers OpenHands supports do).

This PR adds the missing abstraction and relies on #2881 (already merged) to do the real work:

AgentBase.supports_vision() — new method that delegates to self.llm.vision_is_active(). A clean hook for tools and for future agent subclasses.
FileEditorTool.create — switches from conv_state.agent.llm.vision_is_active() to conv_state.agent.supports_vision().

No ACPAgent override is needed: #2881 already writes acp_model into self.llm.model in model_post_init(), so the base-class delegation lands on the real model name and LiteLLM resolves the capability correctly. A non-vision acp_model correctly reports no-vision.

Test plan

uv run pytest tests/sdk/agent/test_acp_agent.py tests/tools/file_editor/test_file_editor_tool.py — 169 passed
New tests: TestACPAgentSupportsVision covers both a vision-capable and a non-vision acp_model; test_file_editor_tool_image_viewing_line_for_acp_agent covers the end-to-end path.

Part of #2471.

🤖 Generated with Claude Code

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:992a671-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-992a671-python \
  ghcr.io/openhands/agent-server:992a671-python

All tags pushed for this build

ghcr.io/openhands/agent-server:992a671-golang-amd64
ghcr.io/openhands/agent-server:992a671-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:992a671-golang-arm64
ghcr.io/openhands/agent-server:992a671-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:992a671-java-amd64
ghcr.io/openhands/agent-server:992a671-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:992a671-java-arm64
ghcr.io/openhands/agent-server:992a671-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:992a671-python-amd64
ghcr.io/openhands/agent-server:992a671-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:992a671-python-arm64
ghcr.io/openhands/agent-server:992a671-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:992a671-golang
ghcr.io/openhands/agent-server:992a671-java
ghcr.io/openhands/agent-server:992a671-python

About Multi-Architecture Support

Each variant tag (e.g., 992a671-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 992a671-python-amd64) are also available if needed

github-actions · 2026-04-18T20:05:46Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-04-18T20:05:54Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

🟢 Good taste - Clean, pragmatic solution to a real bug.

VERDICT: ✅ Worth merging after human validation of eval impact

KEY INSIGHT: Elegant use of the override pattern to resolve vision capability from ACP provider identity instead of the sentinel LLM model. The safe-by-default fallback (unknown servers → False) prevents silently advertising capabilities that may not exist.

This fixes a legitimate bug where FileEditorTool couldn't detect vision support for ACP agents. Code quality is excellent with comprehensive test coverage for all three providers, edge cases, and integration with FileEditorTool.

[EVAL RISK NOTE]
This changes the FileEditorTool description for vision-capable ACP agents, which could plausibly affect agent decision-making. While it's a bug fix (correcting capability advertisement rather than adding new behavior), I'm flagging it per the review guidelines for a human maintainer to validate there's no unexpected benchmark impact before merging.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Changes tool behavior (description advertises image viewing for vision-capable ACP agents). Well-tested and scoped as a bug fix, but the tool description change could plausibly affect agent tool selection or usage patterns. Recommend quick human review to confirm no unexpected eval impact, then merge.

all-hands-bot

✅ QA Report: PASS

Verified ACPAgent correctly derives vision capability from ACP provider identity instead of the unrecognized sentinel LLM.

Does this PR achieve its stated goal?

Yes. The PR successfully fixes the vision capability detection for ACPAgent. Previously, ACPAgent's sentinel LLM (model="acp-managed") was unknown to LiteLLM, causing llm.vision_is_active() to always return False — even when the wrapped ACP server (like claude-agent-acp or gemini-cli) forwards images to a vision-capable model. The fix adds AgentBase.supports_vision() with a default delegation to llm.vision_is_active(), and overrides it in ACPAgent to resolve from the ACP provider identity. FileEditorTool now correctly shows image-viewing capabilities for known vision-capable ACP providers while maintaining safe defaults for unknown servers and preserving backward compatibility for regular agents.

Phase	Result
Environment Setup	✅ All dependencies installed, project builds successfully
CI & Tests	✅ 158 ACP agent tests pass, 21 FileEditorTool tests pass, 12 CI checks successful
Functional Verification	✅ Before/after testing confirms bug fix and correct behavior

Functional Verification

Test 1: Reproduce the Bug (Baseline on `main` branch)

Step 1 — Establish baseline (without the fix):

Checked out main branch and ran test script to verify vision capability detection:

git checkout main
uv run python /tmp/test_vision_capability.py

Output for Claude Agent ACP:

Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability

Output for Gemini CLI:

Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability

Interpretation: This confirms the bug exists. The sentinel LLM model "acp-managed" is unknown to LiteLLM, so vision_is_active() returns False. As a result, FileEditorTool hides the image-viewing affordance even though claude-agent-acp and gemini-cli both support vision through their underlying models.

Test 2: Verify the Fix (PR branch)

Step 2 — Apply the PR's changes:

Checked out the PR branch:

git checkout pr-2879

Step 3 — Re-run with the fix in place:

Ran the same test script:

uv run python /tmp/test_vision_capability.py

Output for Claude Agent ACP:

Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability

Output for Gemini CLI:

Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability

Output for Unknown ACP Server:

Agent: ACPAgent
ACP Command: ['echo', 'test']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for unknown server

Interpretation: The fix works correctly:

agent.supports_vision() now exists and returns True for known vision-capable ACP providers (claude-agent-acp, gemini-cli)
FileEditorTool now includes the image-viewing affordance for these providers
Unknown ACP servers correctly fall back to False (safe default)
The sentinel LLM still returns False for vision_is_active(), but the ACPAgent override correctly resolves from the provider identity

Test 3: Verify Regular Agents Still Work

Purpose: Ensure the default supports_vision() implementation correctly delegates to llm.vision_is_active() for non-ACP agents.

Test with vision-capable model (gpt-4o):

Agent: Agent
LLM model: openai/gpt-4o
llm.vision_is_active(): True
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing for vision model

Test with non-vision model (gpt-3.5-turbo):

Agent: Agent
LLM model: openai/gpt-3.5-turbo
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for non-vision model

Interpretation: Regular agents maintain backward compatibility. The default implementation of supports_vision() correctly delegates to llm.vision_is_active(), preserving existing behavior for all non-ACP agents.

Test 4: Unit Tests

ACP Agent Tests:

uv run pytest tests/sdk/agent/test_acp_agent.py -v

Result: 158 passed in 6.82s

New tests cover:

Per-provider command detection (claude-agent-acp, gemini-cli, codex-acp)
Agent-name detection from InitializeResponse
Agent-name precedence over generic command
Unknown-server fallback to False
ACPAgent override vs sentinel LLM

FileEditorTool Tests:

uv run pytest tests/tools/file_editor/test_file_editor_tool.py -v

Result: 21 passed in 0.10s

New test test_file_editor_tool_image_viewing_line_for_acp_agent verifies end-to-end integration with ACPAgent.

Issues Found

None.

github-actions · 2026-04-18T20:12:46Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/agent
base.py	202	17	91%	182, 248, 325–327, 378–380, 404, 414, 422–423, 527, 564–565, 575–576
openhands-tools/openhands/tools/file_editor
definition.py	72	10	86%	103–104, 107, 129, 132, 135, 142, 144, 146, 148
TOTAL	23295	6753	71%

ACPAgent's sentinel ``LLM(model="acp-managed")`` is unknown to LiteLLM, so ``llm.vision_is_active()`` always returned False — causing FileEditorTool to hide the image-view affordance even when the wrapped ACP server forwards images to a vision-capable model. Add ``AgentBase.supports_vision()`` (defaults to ``llm.vision_is_active()`` for regular agents) and override it on ACPAgent to resolve from the ACP provider identity. All three providers we support declare ``prompt_capabilities.image=true`` and wire images to vision models: - claude-agent-acp → Claude Sonnet/Opus/Haiku 3.5+ - gemini-cli → Gemini 1.5/2.x Pro/Flash - codex-acp → GPT-5 family via Codex Unknown ACP servers fall back to False so we don't silently send image content that may be dropped. Part of #2471. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Only two mentions of "claude-code-acp" exist in-tree, both in comments about the v0.1.x stdout-log quirk — nobody installs or launches a package by that name today. Keep the map to actually-shipping servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Map values were all True and the helper's only caller was ACPAgent.supports_vision(); drop the wrapper and check substring membership inline. Replaces the provider→bool dict with a plain tuple of known vision-capable ACP server markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Now that ACPAgent.model_post_init() writes acp_model into llm.model (merged in #2881), llm.vision_is_active() gives an authoritative answer for explicitly-configured models via LiteLLM. Use it when the user has set acp_model — even if they pick a non-vision variant, that choice wins over the server-identity default. The provider-identity fallback only kicks in when acp_model is unset and we can't know which default model the server will pick; for all three providers we support, that default is vision-capable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

With ACPAgent.model_post_init() writing acp_model into llm.model, the base-class AgentBase.supports_vision() (which delegates to llm.vision_is_active()) already returns the right answer for any ACP agent configured with an explicit model — which is how all OpenHands ACP deployments run in practice. Remove the subclass override, the _VISION_CAPABLE_ACP_SERVERS marker tuple, and the provider-identity fallback tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

simonrosenberg · 2026-04-18T21:34:21Z

Closing as unnecessary.

After #2881 landed, ACPAgent.model_post_init() writes acp_model into self.llm.model before any tool runs. That makes the existing conv_state.agent.llm.vision_is_active() call in FileEditorTool.create already return the right answer via LiteLLM — no abstraction or override needed.

Verified manually against every current ACP model:

Model	`litellm.supports_vision`	`llm.vision_is_active()`
claude-sonnet-4-5	✅	✅
claude-opus-4-6	✅	✅
claude-haiku-4-5	✅	✅
gemini-2.5-pro	✅	✅
gemini-3-pro-preview	✅	✅
gpt-5 / gpt-5.1 / gpt-5.2-codex / gpt-5.4	✅	✅
gpt-3.5-turbo (non-vision)	❌	❌ (correct)
acp-managed (sentinel, acp_model=None)	❌	❌ (conservative)

The AgentBase.supports_vision() method added here would be a one-line delegation with no caller that benefits from it. Dropping.

all-hands-bot reviewed Apr 18, 2026

View reviewed changes

simonrosenberg self-assigned this Apr 18, 2026

simonrosenberg enabled auto-merge (squash) April 18, 2026 20:34

simonrosenberg force-pushed the fix/acp-vision-capability-check branch from feed166 to 2d2d088 Compare April 18, 2026 20:58

Debug Agent and others added 4 commits April 18, 2026 18:20

chore(file_editor): trim verbose comment on supports_vision call

1590e6f

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

simonrosenberg force-pushed the fix/acp-vision-capability-check branch from 2d2d088 to 1590e6f Compare April 18, 2026 21:20

Debug Agent and others added 2 commits April 18, 2026 18:23

simonrosenberg closed this Apr 18, 2026

auto-merge was automatically disabled April 18, 2026 21:34
Pull request was closed

simonrosenberg mentioned this pull request Apr 18, 2026

ACP agent cannot use LLM for auxiliary tasks (title generation, vision check, sub-agents, etc.) #2471

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879

fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879
simonrosenberg wants to merge 6 commits intomainfrom
fix/acp-vision-capability-check

simonrosenberg commented Apr 18, 2026 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Apr 18, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 18, 2026 •

edited

Loading

Uh oh!

all-hands-bot left a comment

Uh oh!

all-hands-bot left a comment

Uh oh!

github-actions bot commented Apr 18, 2026 •

edited

Loading

Uh oh!

simonrosenberg commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

simonrosenberg commented Apr 18, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Reproduce the Bug (Baseline on main branch)

Test 2: Verify the Fix (PR branch)

Test 3: Verify Regular Agents Still Work

Test 4: Unit Tests

Issues Found

Uh oh!

github-actions bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonrosenberg commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

simonrosenberg commented Apr 18, 2026 •

edited by github-actions bot

Loading

github-actions bot commented Apr 18, 2026 •

edited

Loading

github-actions bot commented Apr 18, 2026 •

edited

Loading

Test 1: Reproduce the Bug (Baseline on `main` branch)

github-actions bot commented Apr 18, 2026 •

edited

Loading