Skip to content

fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879

Closed
simonrosenberg wants to merge 6 commits intomainfrom
fix/acp-vision-capability-check
Closed

fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879
simonrosenberg wants to merge 6 commits intomainfrom
fix/acp-vision-capability-check

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

@simonrosenberg simonrosenberg commented Apr 18, 2026

Summary

When OpenHands runs an ACPAgent, the sentinel LLM(model="acp-managed") is unknown to LiteLLM, so agent.llm.vision_is_active() returns False. Tools that gate behaviour on vision support — notably FileEditorTool, which hides its image-view affordance — therefore never advertised image reading even when the wrapped ACP server forwards images to a vision-capable model (all three providers OpenHands supports do).

This PR adds the missing abstraction and relies on #2881 (already merged) to do the real work:

  1. AgentBase.supports_vision() — new method that delegates to self.llm.vision_is_active(). A clean hook for tools and for future agent subclasses.
  2. FileEditorTool.create — switches from conv_state.agent.llm.vision_is_active() to conv_state.agent.supports_vision().

No ACPAgent override is needed: #2881 already writes acp_model into self.llm.model in model_post_init(), so the base-class delegation lands on the real model name and LiteLLM resolves the capability correctly. A non-vision acp_model correctly reports no-vision.

Test plan

  • uv run pytest tests/sdk/agent/test_acp_agent.py tests/tools/file_editor/test_file_editor_tool.py — 169 passed
  • New tests: TestACPAgentSupportsVision covers both a vision-capable and a non-vision acp_model; test_file_editor_tool_image_viewing_line_for_acp_agent covers the end-to-end path.

Part of #2471.

🤖 Generated with Claude Code


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:992a671-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-992a671-python \
  ghcr.io/openhands/agent-server:992a671-python

All tags pushed for this build

ghcr.io/openhands/agent-server:992a671-golang-amd64
ghcr.io/openhands/agent-server:992a671-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:992a671-golang-arm64
ghcr.io/openhands/agent-server:992a671-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:992a671-java-amd64
ghcr.io/openhands/agent-server:992a671-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:992a671-java-arm64
ghcr.io/openhands/agent-server:992a671-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:992a671-python-amd64
ghcr.io/openhands/agent-server:992a671-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:992a671-python-arm64
ghcr.io/openhands/agent-server:992a671-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:992a671-golang
ghcr.io/openhands/agent-server:992a671-java
ghcr.io/openhands/agent-server:992a671-python

About Multi-Architecture Support

  • Each variant tag (e.g., 992a671-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 992a671-python-amd64) are also available if needed

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 18, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 18, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean, pragmatic solution to a real bug.

VERDICT: ✅ Worth merging after human validation of eval impact

KEY INSIGHT: Elegant use of the override pattern to resolve vision capability from ACP provider identity instead of the sentinel LLM model. The safe-by-default fallback (unknown servers → False) prevents silently advertising capabilities that may not exist.

This fixes a legitimate bug where FileEditorTool couldn't detect vision support for ACP agents. Code quality is excellent with comprehensive test coverage for all three providers, edge cases, and integration with FileEditorTool.

[EVAL RISK NOTE]
This changes the FileEditorTool description for vision-capable ACP agents, which could plausibly affect agent decision-making. While it's a bug fix (correcting capability advertisement rather than adding new behavior), I'm flagging it per the review guidelines for a human maintainer to validate there's no unexpected benchmark impact before merging.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Changes tool behavior (description advertises image viewing for vision-capable ACP agents). Well-tested and scoped as a bug fix, but the tool description change could plausibly affect agent tool selection or usage patterns. Recommend quick human review to confirm no unexpected eval impact, then merge.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified ACPAgent correctly derives vision capability from ACP provider identity instead of the unrecognized sentinel LLM.

Does this PR achieve its stated goal?

Yes. The PR successfully fixes the vision capability detection for ACPAgent. Previously, ACPAgent's sentinel LLM (model="acp-managed") was unknown to LiteLLM, causing llm.vision_is_active() to always return False — even when the wrapped ACP server (like claude-agent-acp or gemini-cli) forwards images to a vision-capable model. The fix adds AgentBase.supports_vision() with a default delegation to llm.vision_is_active(), and overrides it in ACPAgent to resolve from the ACP provider identity. FileEditorTool now correctly shows image-viewing capabilities for known vision-capable ACP providers while maintaining safe defaults for unknown servers and preserving backward compatibility for regular agents.

Phase Result
Environment Setup ✅ All dependencies installed, project builds successfully
CI & Tests ✅ 158 ACP agent tests pass, 21 FileEditorTool tests pass, 12 CI checks successful
Functional Verification ✅ Before/after testing confirms bug fix and correct behavior
Functional Verification

Test 1: Reproduce the Bug (Baseline on main branch)

Step 1 — Establish baseline (without the fix):

Checked out main branch and ran test script to verify vision capability detection:

git checkout main
uv run python /tmp/test_vision_capability.py

Output for Claude Agent ACP:

Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability

Output for Gemini CLI:

Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability

Interpretation: This confirms the bug exists. The sentinel LLM model "acp-managed" is unknown to LiteLLM, so vision_is_active() returns False. As a result, FileEditorTool hides the image-viewing affordance even though claude-agent-acp and gemini-cli both support vision through their underlying models.


Test 2: Verify the Fix (PR branch)

Step 2 — Apply the PR's changes:

Checked out the PR branch:

git checkout pr-2879

Step 3 — Re-run with the fix in place:

Ran the same test script:

uv run python /tmp/test_vision_capability.py

Output for Claude Agent ACP:

Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability

Output for Gemini CLI:

Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability

Output for Unknown ACP Server:

Agent: ACPAgent
ACP Command: ['echo', 'test']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for unknown server

Interpretation: The fix works correctly:

  • agent.supports_vision() now exists and returns True for known vision-capable ACP providers (claude-agent-acp, gemini-cli)
  • FileEditorTool now includes the image-viewing affordance for these providers
  • Unknown ACP servers correctly fall back to False (safe default)
  • The sentinel LLM still returns False for vision_is_active(), but the ACPAgent override correctly resolves from the provider identity

Test 3: Verify Regular Agents Still Work

Purpose: Ensure the default supports_vision() implementation correctly delegates to llm.vision_is_active() for non-ACP agents.

Test with vision-capable model (gpt-4o):

Agent: Agent
LLM model: openai/gpt-4o
llm.vision_is_active(): True
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing for vision model

Test with non-vision model (gpt-3.5-turbo):

Agent: Agent
LLM model: openai/gpt-3.5-turbo
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for non-vision model

Interpretation: Regular agents maintain backward compatibility. The default implementation of supports_vision() correctly delegates to llm.vision_is_active(), preserving existing behavior for all non-ACP agents.


Test 4: Unit Tests

ACP Agent Tests:

uv run pytest tests/sdk/agent/test_acp_agent.py -v

Result: 158 passed in 6.82s

New tests cover:

  • Per-provider command detection (claude-agent-acp, gemini-cli, codex-acp)
  • Agent-name detection from InitializeResponse
  • Agent-name precedence over generic command
  • Unknown-server fallback to False
  • ACPAgent override vs sentinel LLM

FileEditorTool Tests:

uv run pytest tests/tools/file_editor/test_file_editor_tool.py -v

Result: 21 passed in 0.10s

New test test_file_editor_tool_image_viewing_line_for_acp_agent verifies end-to-end integration with ACPAgent.

Issues Found

None.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 18, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   base.py2021791%182, 248, 325–327, 378–380, 404, 414, 422–423, 527, 564–565, 575–576
openhands-tools/openhands/tools/file_editor
   definition.py721086%103–104, 107, 129, 132, 135, 142, 144, 146, 148
TOTAL23295675371% 

@simonrosenberg simonrosenberg self-assigned this Apr 18, 2026
@simonrosenberg simonrosenberg enabled auto-merge (squash) April 18, 2026 20:34
@simonrosenberg simonrosenberg force-pushed the fix/acp-vision-capability-check branch from feed166 to 2d2d088 Compare April 18, 2026 20:58
Debug Agent and others added 4 commits April 18, 2026 18:20
ACPAgent's sentinel ``LLM(model="acp-managed")`` is unknown to LiteLLM,
so ``llm.vision_is_active()`` always returned False — causing
FileEditorTool to hide the image-view affordance even when the wrapped
ACP server forwards images to a vision-capable model.

Add ``AgentBase.supports_vision()`` (defaults to ``llm.vision_is_active()``
for regular agents) and override it on ACPAgent to resolve from the ACP
provider identity. All three providers we support declare
``prompt_capabilities.image=true`` and wire images to vision models:
- claude-agent-acp → Claude Sonnet/Opus/Haiku 3.5+
- gemini-cli      → Gemini 1.5/2.x Pro/Flash
- codex-acp       → GPT-5 family via Codex

Unknown ACP servers fall back to False so we don't silently send image
content that may be dropped.

Part of #2471.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Only two mentions of "claude-code-acp" exist in-tree, both in comments
about the v0.1.x stdout-log quirk — nobody installs or launches a
package by that name today. Keep the map to actually-shipping servers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Map values were all True and the helper's only caller was
ACPAgent.supports_vision(); drop the wrapper and check substring
membership inline. Replaces the provider→bool dict with a plain
tuple of known vision-capable ACP server markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg simonrosenberg force-pushed the fix/acp-vision-capability-check branch from 2d2d088 to 1590e6f Compare April 18, 2026 21:20
Debug Agent and others added 2 commits April 18, 2026 18:23
Now that ACPAgent.model_post_init() writes acp_model into llm.model
(merged in #2881), llm.vision_is_active() gives an authoritative
answer for explicitly-configured models via LiteLLM. Use it when the
user has set acp_model — even if they pick a non-vision variant,
that choice wins over the server-identity default.

The provider-identity fallback only kicks in when acp_model is unset
and we can't know which default model the server will pick; for all
three providers we support, that default is vision-capable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With ACPAgent.model_post_init() writing acp_model into llm.model,
the base-class AgentBase.supports_vision() (which delegates to
llm.vision_is_active()) already returns the right answer for any
ACP agent configured with an explicit model — which is how all
OpenHands ACP deployments run in practice.

Remove the subclass override, the _VISION_CAPABLE_ACP_SERVERS
marker tuple, and the provider-identity fallback tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simonrosenberg
Copy link
Copy Markdown
Collaborator Author

Closing as unnecessary.

After #2881 landed, ACPAgent.model_post_init() writes acp_model into self.llm.model before any tool runs. That makes the existing conv_state.agent.llm.vision_is_active() call in FileEditorTool.create already return the right answer via LiteLLM — no abstraction or override needed.

Verified manually against every current ACP model:

Model litellm.supports_vision llm.vision_is_active()
claude-sonnet-4-5
claude-opus-4-6
claude-haiku-4-5
gemini-2.5-pro
gemini-3-pro-preview
gpt-5 / gpt-5.1 / gpt-5.2-codex / gpt-5.4
gpt-3.5-turbo (non-vision) ❌ (correct)
acp-managed (sentinel, acp_model=None) ❌ (conservative)

The AgentBase.supports_vision() method added here would be a one-line delegation with no caller that benefits from it. Dropping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants