Skip to content

fix(acp): show the real model in logs and serialized state#2881

Merged
simonrosenberg merged 1 commit intomainfrom
fix/acp-sentinel-show-real-model
Apr 18, 2026
Merged

fix(acp): show the real model in logs and serialized state#2881
simonrosenberg merged 1 commit intomainfrom
fix/acp-sentinel-show-real-model

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

@simonrosenberg simonrosenberg commented Apr 18, 2026

Summary

Split the ACP sentinel LLM's two concerns. Today LLM(model="acp-managed") is both the placeholder shown everywhere AND the signal that tells title_utils "don't call this — it has no credentials." That means logs, serialized agent state, and any tool that reads agent.llm.model sees "acp-managed" even when a real acp_model is configured.

This PR:

  • Stamps the sentinel marker on llm.usage_id = "acp-managed" instead — a field that is internal, never shown to the ACP server, and independent of the model name.
  • Overwrites llm.model with self.acp_model in ACPAgent.model_post_init() when acp_model is set, so the rest of the system sees the actual model in logs and state dumps.
  • Switches title_utils to detect the sentinel via usage_id instead of model.

Zero behavior change when acp_model isn't set — llm.model stays "acp-managed".

Test plan

  • uv run pytest tests/sdk/agent/test_acp_agent.py — 158 passed
  • uv run pytest tests/agent_server/test_conversation_service.py::TestAutoTitle::test_autotitle_falls_back_for_acp_managed_llm — passes with the new usage_id-based signal
  • Added test_acp_model_propagated_to_llm_model and test_sentinel_usage_id_without_acp_model to lock in the contract.

Part of #2471.

🤖 Generated with Claude Code


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:55faae1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-55faae1-python \
  ghcr.io/openhands/agent-server:55faae1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:55faae1-golang-amd64
ghcr.io/openhands/agent-server:55faae1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:55faae1-golang-arm64
ghcr.io/openhands/agent-server:55faae1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:55faae1-java-amd64
ghcr.io/openhands/agent-server:55faae1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:55faae1-java-arm64
ghcr.io/openhands/agent-server:55faae1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:55faae1-python-amd64
ghcr.io/openhands/agent-server:55faae1-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:55faae1-python-arm64
ghcr.io/openhands/agent-server:55faae1-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:55faae1-golang
ghcr.io/openhands/agent-server:55faae1-java
ghcr.io/openhands/agent-server:55faae1-python

About Multi-Architecture Support

  • Each variant tag (e.g., 55faae1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 55faae1-python-amd64) are also available if needed

The ACPAgent sentinel LLM used model="acp-managed" as both the
placeholder name and the signal that told title_utils "don't call
this — it has no credentials". That conflated two concerns and meant
logs, serialized agent state, and any tooling that reads
agent.llm.model saw "acp-managed" even when a real acp_model was
configured.

Split them:

- The sentinel marker now lives on llm.usage_id="acp-managed", which
  is independent of the model name and never shown to the ACP server.
- ACPAgent.model_post_init() now overwrites llm.model with acp_model
  (when provided), so the rest of the system sees the actual model.
- title_utils detects the sentinel via usage_id instead of model.

Part of #2471.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@simonrosenberg simonrosenberg self-assigned this Apr 18, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   acp_agent.py5514092%294–296, 426–427, 460, 462, 466, 470, 478, 536–537, 542, 609, 750, 753–754, 771–772, 779, 784, 815, 825, 830, 841–844, 953–956, 960–964, 966, 1450–1451
openhands-sdk/openhands/sdk/conversation
   title_utils.py56296%35, 75
TOTAL23488568375% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean separation of concerns.

KEY INSIGHT: Moving the sentinel marker from model (user-visible) to usage_id (internal) elegantly solves the dual requirement: downstream code can still detect "don't call this LLM" while logs/state show the actual model name. This makes debugging significantly easier without changing any agent behavior.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

No changes to agent decision logic, prompts, or tool execution. Only affects which field holds the sentinel marker for title generation fallback. Well-tested with clear backward compatibility path.

VERDICT: ✅ Worth merging - Solves a real usability issue (confusing "acp-managed" in logs) with a simple, principled design.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Functionally verified that ACP agents now display the real model name in logs and serialized state while preserving sentinel detection via usage_id.

Does this PR achieve its stated goal?

Yes. The PR successfully splits the ACP sentinel LLM's two concerns. Previously, model="acp-managed" served dual purposes: (1) the placeholder shown everywhere and (2) the signal to title_utils not to call the LLM. Now the sentinel marker lives on usage_id="acp-managed" while llm.model shows the real model name when acp_model is configured. Verified via before/after testing, serialization round-trips, and title generation fallback behavior.

Phase Result
Environment Setup ✅ uv environment ready
CI & Tests ✅ 17/27 checks passing, 146/146 ACP tests pass
Functional Verification ✅ All behaviors verified with before/after comparison
Functional Verification

Test 1: Before/After Comparison — Model Display

Step 1 — Baseline on main branch (before fix):

Checked out main and created an ACPAgent with acp_model="gpt-4o":

agent = ACPAgent(
    acp_command=["echo", "test"],
    acp_model="gpt-4o",
    workspace=LocalWorkspace(working_dir="/tmp/test"),
    tools=[]
)

Output:

agent.acp_model: gpt-4o
agent.llm.model: acp-managed  # ← Shows sentinel, not the real model
agent.llm.usage_id: default

This confirms the bug: even though acp_model is set to "gpt-4o", the llm.model field still shows the sentinel "acp-managed". Logs and serialized state would show the wrong model.

Step 2 — Apply PR changes:

Checked out PR branch fix/acp-sentinel-show-real-model at commit 55faae19.

Step 3 — Re-run with the fix:

Created the same ACPAgent with acp_model="gpt-4o":

Output:

agent.acp_model: gpt-4o
agent.llm.model: gpt-4o        # ← Now shows the real model ✓
agent.llm.usage_id: acp-managed # ← Sentinel moved here ✓
agent.llm.metrics.model_name: gpt-4o

This confirms the fix works: llm.model now shows the real model name, while the sentinel marker lives on usage_id.


Test 2: Without acp_model — Sentinel Preserved

Created an ACPAgent without specifying acp_model:

agent = ACPAgent(
    acp_command=["echo", "test"],
    workspace=LocalWorkspace(working_dir="/tmp/test"),
    tools=[]
)

Output:

agent.acp_model: None
agent.llm.model: acp-managed     # ← Sentinel
agent.llm.usage_id: acp-managed  # ← Sentinel

✓ PASS: When no acp_model is configured, both fields show the sentinel as expected (zero behavior change).


Test 3: Serialization Round-Trip

Serialized and deserialized an ACPAgent with acp_model="gpt-4o":

serialized = agent.model_dump()
restored = ACPAgent.model_validate(serialized)

Results:

Serialized llm.model: gpt-4o
Serialized llm.usage_id: acp-managed
Restored llm.model: gpt-4o
Restored llm.usage_id: acp-managed

✓ PASS: Both fields are correctly preserved through serialization/deserialization. Persisted agent state will show the real model.


Test 4: title_utils Sentinel Detection

Verified that title_utils.generate_title_from_message() still correctly detects the ACP sentinel via usage_id (not model):

from openhands.sdk.conversation.title_utils import generate_title_from_message

agent = ACPAgent(
    acp_command=["echo", "test"],
    acp_model="gpt-4o",
    workspace=LocalWorkspace(working_dir="/tmp/test"),
    tools=[]
)

title = generate_title_from_message(
    "Fix the authentication bug in the login system that causes crashes",
    llm=agent.llm,
    max_length=50
)

Output:

Agent configuration:
  llm.model: gpt-4o
  llm.usage_id: acp-managed

Generated title: 'Fix the authentication bug in the login system ...'
Title length: 50

✓ PASS: The title was truncated to 50 characters, confirming that title_utils detected the sentinel via usage_id and fell back to truncation (did not attempt to call the LLM) even though llm.model now shows "gpt-4o".


Test 5: New Unit Tests

Ran the two new tests added by this PR:

uv run pytest tests/sdk/agent/test_acp_agent.py \
  -k "test_acp_model_propagated_to_llm_model or test_sentinel_usage_id_without_acp_model" \
  -v

Output:

tests/sdk/agent/test_acp_agent.py::TestACPAgentInstantiation::test_acp_model_propagated_to_llm_model PASSED
tests/sdk/agent/test_acp_agent.py::TestACPAgentInstantiation::test_sentinel_usage_id_without_acp_model PASSED
2 passed

✓ PASS: Both tests verify the new contract (real model in llm.model, sentinel in usage_id).


Test 6: Full Test Suite

Ran the complete ACP agent test suite:

uv run pytest tests/sdk/agent/test_acp_agent.py

Result: 146 passed in 6.73s

✓ PASS: All existing tests continue to pass with the new behavior.


Test 7: Autotitle Fallback Test

Ran the autotitle test that verifies ACP agents fall back to truncation:

uv run pytest tests/agent_server/test_conversation_service.py::TestAutoTitle::test_autotitle_falls_back_for_acp_managed_llm -v

Result: PASSED in 0.20s

✓ PASS: The test was correctly updated to check usage_id instead of model, and the fallback behavior works as expected.

Issues Found

None.

@simonrosenberg simonrosenberg merged commit 0937073 into main Apr 18, 2026
36 of 38 checks passed
@simonrosenberg simonrosenberg deleted the fix/acp-sentinel-show-real-model branch April 18, 2026 21:19
simonrosenberg pushed a commit that referenced this pull request Apr 18, 2026
Now that ACPAgent.model_post_init() writes acp_model into llm.model
(merged in #2881), llm.vision_is_active() gives an authoritative
answer for explicitly-configured models via LiteLLM. Use it when the
user has set acp_model — even if they pick a non-vision variant,
that choice wins over the server-identity default.

The provider-identity fallback only kicks in when acp_model is unset
and we can't know which default model the server will pick; for all
three providers we support, that default is vision-capable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants