fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879
fix(acp): derive vision capability from ACP provider, not sentinel LLM#2879simonrosenberg wants to merge 6 commits intomainfrom
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean, pragmatic solution to a real bug.
VERDICT: ✅ Worth merging after human validation of eval impact
KEY INSIGHT: Elegant use of the override pattern to resolve vision capability from ACP provider identity instead of the sentinel LLM model. The safe-by-default fallback (unknown servers → False) prevents silently advertising capabilities that may not exist.
This fixes a legitimate bug where FileEditorTool couldn't detect vision support for ACP agents. Code quality is excellent with comprehensive test coverage for all three providers, edge cases, and integration with FileEditorTool.
[EVAL RISK NOTE]
This changes the FileEditorTool description for vision-capable ACP agents, which could plausibly affect agent decision-making. While it's a bug fix (correcting capability advertisement rather than adding new behavior), I'm flagging it per the review guidelines for a human maintainer to validate there's no unexpected benchmark impact before merging.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Changes tool behavior (description advertises image viewing for vision-capable ACP agents). Well-tested and scoped as a bug fix, but the tool description change could plausibly affect agent tool selection or usage patterns. Recommend quick human review to confirm no unexpected eval impact, then merge.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified ACPAgent correctly derives vision capability from ACP provider identity instead of the unrecognized sentinel LLM.
Does this PR achieve its stated goal?
Yes. The PR successfully fixes the vision capability detection for ACPAgent. Previously, ACPAgent's sentinel LLM (model="acp-managed") was unknown to LiteLLM, causing llm.vision_is_active() to always return False — even when the wrapped ACP server (like claude-agent-acp or gemini-cli) forwards images to a vision-capable model. The fix adds AgentBase.supports_vision() with a default delegation to llm.vision_is_active(), and overrides it in ACPAgent to resolve from the ACP provider identity. FileEditorTool now correctly shows image-viewing capabilities for known vision-capable ACP providers while maintaining safe defaults for unknown servers and preserving backward compatibility for regular agents.
| Phase | Result |
|---|---|
| Environment Setup | ✅ All dependencies installed, project builds successfully |
| CI & Tests | ✅ 158 ACP agent tests pass, 21 FileEditorTool tests pass, 12 CI checks successful |
| Functional Verification | ✅ Before/after testing confirms bug fix and correct behavior |
Functional Verification
Test 1: Reproduce the Bug (Baseline on main branch)
Step 1 — Establish baseline (without the fix):
Checked out main branch and ran test script to verify vision capability detection:
git checkout main
uv run python /tmp/test_vision_capability.pyOutput for Claude Agent ACP:
Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability
Output for Gemini CLI:
Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): [method does not exist on main branch]
FileEditorTool includes image viewing: False
❌ FAIL: FileEditorTool missing image viewing capability
Interpretation: This confirms the bug exists. The sentinel LLM model "acp-managed" is unknown to LiteLLM, so vision_is_active() returns False. As a result, FileEditorTool hides the image-viewing affordance even though claude-agent-acp and gemini-cli both support vision through their underlying models.
Test 2: Verify the Fix (PR branch)
Step 2 — Apply the PR's changes:
Checked out the PR branch:
git checkout pr-2879Step 3 — Re-run with the fix in place:
Ran the same test script:
uv run python /tmp/test_vision_capability.pyOutput for Claude Agent ACP:
Agent: ACPAgent
ACP Command: ['npx', '-y', '@agentclientprotocol/claude-agent-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability
Output for Gemini CLI:
Agent: ACPAgent
ACP Command: ['gemini-cli', '--experimental-acp']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing capability
Output for Unknown ACP Server:
Agent: ACPAgent
ACP Command: ['echo', 'test']
Sentinel LLM model: acp-managed
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for unknown server
Interpretation: The fix works correctly:
agent.supports_vision()now exists and returnsTruefor known vision-capable ACP providers (claude-agent-acp, gemini-cli)- FileEditorTool now includes the image-viewing affordance for these providers
- Unknown ACP servers correctly fall back to
False(safe default) - The sentinel LLM still returns
Falseforvision_is_active(), but the ACPAgent override correctly resolves from the provider identity
Test 3: Verify Regular Agents Still Work
Purpose: Ensure the default supports_vision() implementation correctly delegates to llm.vision_is_active() for non-ACP agents.
Test with vision-capable model (gpt-4o):
Agent: Agent
LLM model: openai/gpt-4o
llm.vision_is_active(): True
agent.supports_vision(): True
FileEditorTool includes image viewing: True
✅ PASS: FileEditorTool correctly shows image viewing for vision model
Test with non-vision model (gpt-3.5-turbo):
Agent: Agent
LLM model: openai/gpt-3.5-turbo
llm.vision_is_active(): False
agent.supports_vision(): False
FileEditorTool includes image viewing: False
✅ PASS: FileEditorTool correctly hides image viewing for non-vision model
Interpretation: Regular agents maintain backward compatibility. The default implementation of supports_vision() correctly delegates to llm.vision_is_active(), preserving existing behavior for all non-ACP agents.
Test 4: Unit Tests
ACP Agent Tests:
uv run pytest tests/sdk/agent/test_acp_agent.py -vResult: 158 passed in 6.82s
New tests cover:
- Per-provider command detection (claude-agent-acp, gemini-cli, codex-acp)
- Agent-name detection from InitializeResponse
- Agent-name precedence over generic command
- Unknown-server fallback to False
- ACPAgent override vs sentinel LLM
FileEditorTool Tests:
uv run pytest tests/tools/file_editor/test_file_editor_tool.py -vResult: 21 passed in 0.10s
New test test_file_editor_tool_image_viewing_line_for_acp_agent verifies end-to-end integration with ACPAgent.
Issues Found
None.
feed166 to
2d2d088
Compare
ACPAgent's sentinel ``LLM(model="acp-managed")`` is unknown to LiteLLM, so ``llm.vision_is_active()`` always returned False — causing FileEditorTool to hide the image-view affordance even when the wrapped ACP server forwards images to a vision-capable model. Add ``AgentBase.supports_vision()`` (defaults to ``llm.vision_is_active()`` for regular agents) and override it on ACPAgent to resolve from the ACP provider identity. All three providers we support declare ``prompt_capabilities.image=true`` and wire images to vision models: - claude-agent-acp → Claude Sonnet/Opus/Haiku 3.5+ - gemini-cli → Gemini 1.5/2.x Pro/Flash - codex-acp → GPT-5 family via Codex Unknown ACP servers fall back to False so we don't silently send image content that may be dropped. Part of #2471. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Only two mentions of "claude-code-acp" exist in-tree, both in comments about the v0.1.x stdout-log quirk — nobody installs or launches a package by that name today. Keep the map to actually-shipping servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Map values were all True and the helper's only caller was ACPAgent.supports_vision(); drop the wrapper and check substring membership inline. Replaces the provider→bool dict with a plain tuple of known vision-capable ACP server markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2d2d088 to
1590e6f
Compare
Now that ACPAgent.model_post_init() writes acp_model into llm.model (merged in #2881), llm.vision_is_active() gives an authoritative answer for explicitly-configured models via LiteLLM. Use it when the user has set acp_model — even if they pick a non-vision variant, that choice wins over the server-identity default. The provider-identity fallback only kicks in when acp_model is unset and we can't know which default model the server will pick; for all three providers we support, that default is vision-capable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With ACPAgent.model_post_init() writing acp_model into llm.model, the base-class AgentBase.supports_vision() (which delegates to llm.vision_is_active()) already returns the right answer for any ACP agent configured with an explicit model — which is how all OpenHands ACP deployments run in practice. Remove the subclass override, the _VISION_CAPABLE_ACP_SERVERS marker tuple, and the provider-identity fallback tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Closing as unnecessary. After #2881 landed, Verified manually against every current ACP model:
The |
Summary
When OpenHands runs an ACPAgent, the sentinel
LLM(model="acp-managed")is unknown to LiteLLM, soagent.llm.vision_is_active()returns False. Tools that gate behaviour on vision support — notablyFileEditorTool, which hides its image-view affordance — therefore never advertised image reading even when the wrapped ACP server forwards images to a vision-capable model (all three providers OpenHands supports do).This PR adds the missing abstraction and relies on #2881 (already merged) to do the real work:
AgentBase.supports_vision()— new method that delegates toself.llm.vision_is_active(). A clean hook for tools and for future agent subclasses.FileEditorTool.create— switches fromconv_state.agent.llm.vision_is_active()toconv_state.agent.supports_vision().No ACPAgent override is needed: #2881 already writes
acp_modelintoself.llm.modelinmodel_post_init(), so the base-class delegation lands on the real model name and LiteLLM resolves the capability correctly. A non-visionacp_modelcorrectly reports no-vision.Test plan
uv run pytest tests/sdk/agent/test_acp_agent.py tests/tools/file_editor/test_file_editor_tool.py— 169 passedTestACPAgentSupportsVisioncovers both a vision-capable and a non-visionacp_model;test_file_editor_tool_image_viewing_line_for_acp_agentcovers the end-to-end path.Part of #2471.
🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:992a671-pythonRun
All tags pushed for this build
About Multi-Architecture Support
992a671-python) is a multi-arch manifest supporting both amd64 and arm64992a671-python-amd64) are also available if needed