fix: synchronize ACP telemetry and refresh remote final state#2460

Merged
simonrosenberg merged 3 commits into main from fix/issue-2375-acp-telemetry
Mar 18, 2026

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Mar 16, 2026

Fixes #2375

This implements the fix direction from the issue discussion:

  • move ACP telemetry writes onto a single synchronized path in ACPAgent
  • stop mutating metrics directly from session_update()
  • wait for the turn's UsageUpdate before recording cost/tokens/latency
  • refresh the authoritative remote conversation state before run() returns
  • keep event reconciliation for history completeness after the final state refresh
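
To make the first three bullets concrete, here is a minimal sketch of the pattern, not the actual ACPAgent code. The names `Usage`, `TurnTelemetry`, `TelemetrySketch`, `begin_turn`, and `finish_turn` are hypothetical, and the timeout value is an assumption. The notification handler only stores the usage payload and signals an `asyncio.Event`; the prompt path is the single place that writes metrics, and it waits for that signal first.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the payload carried by a UsageUpdate."""
    cost: float
    input_tokens: int
    output_tokens: int


@dataclass
class TurnTelemetry:
    """Per-session slot: filled by the notification path, read by the prompt path."""
    usage: Optional[Usage] = None
    ready: asyncio.Event = field(default_factory=asyncio.Event)


class TelemetrySketch:
    def __init__(self) -> None:
        self._turns: dict = {}  # session_id -> TurnTelemetry
        self.total_cost = 0.0
        self.total_tokens = 0

    def begin_turn(self, session_id: str) -> None:
        self._turns[session_id] = TurnTelemetry()

    def session_update(self, session_id: str, usage: Usage) -> None:
        # Notification path: stash and signal only, never mutate metrics here.
        turn = self._turns.get(session_id)
        if turn is not None:
            turn.usage = usage
            turn.ready.set()

    async def finish_turn(self, session_id: str, timeout: float = 2.0) -> bool:
        # Single write path: wait for this turn's usage, then record it once.
        turn = self._turns[session_id]
        try:
            await asyncio.wait_for(turn.ready.wait(), timeout)
        except asyncio.TimeoutError:
            return False  # degrade gracefully: nothing recorded for this turn
        finally:
            self._turns.pop(session_id, None)
        assert turn.usage is not None
        self.total_cost += turn.usage.cost
        self.total_tokens += turn.usage.input_tokens + turn.usage.output_tokens
        return True
```

Keying the slots by session ID keeps concurrent sessions from reading each other's usage, and returning `False` on timeout lets the caller carry on without telemetry instead of blocking the turn.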

Why:

The latest zero-cost ACP benchmark rows were caused by two separate correctness problems:

  1. ACP telemetry was split across notification handling and prompt response handling.
  2. RemoteConversation could return from REST fallback with stale cached state, leaving conversation_stats at zero even when the server had final stats.
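
The second problem can be illustrated with a small sketch, which is not the real RemoteConversation. `RemoteConversationSketch`, `fetch_state`, the `"finished"` status value, and the state layout are all assumptions made for illustration. The REST fallback only polls for completion, so `run()` must fetch the authoritative state once more before it returns.

```python
import time
from typing import Callable


class RemoteConversationSketch:
    """Hypothetical client that keeps a cached copy of server-side state."""

    def __init__(self, fetch_state: Callable[[], dict], poll_interval: float = 0.1) -> None:
        self._fetch_state = fetch_state  # stand-in for a REST GET of the conversation
        self._poll_interval = poll_interval
        # The cache starts out with zeroed stats, as in the bug report.
        self.cached_state: dict = {
            "status": "running",
            "conversation_stats": {"cost": 0.0},
        }

    def _poll_until_finished(self) -> None:
        # REST fallback: looks only at the status field and leaves the cache alone.
        while self._fetch_state()["status"] != "finished":
            time.sleep(self._poll_interval)

    def run(self) -> dict:
        self._poll_until_finished()
        # Refresh the cache from the server before returning, so callers
        # never read the stale zero-valued conversation_stats.
        self.cached_state = self._fetch_state()
        return self.cached_state
```

Without the final assignment in `run()`, the cache would still hold the zeroed stats it started with, which matches the symptom described above.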

Tests:

  • PYTHONPATH=/tmp/sdk-issue-2375/openhands-sdk${PYTHONPATH:+:$PYTHONPATH} pytest tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/remote/test_remote_conversation.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:544db68-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-544db68-python \
  ghcr.io/openhands/agent-server:544db68-python

All tags pushed for this build

ghcr.io/openhands/agent-server:544db68-golang-amd64
ghcr.io/openhands/agent-server:544db68-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:544db68-golang-arm64
ghcr.io/openhands/agent-server:544db68-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:544db68-java-amd64
ghcr.io/openhands/agent-server:544db68-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:544db68-java-arm64
ghcr.io/openhands/agent-server:544db68-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:544db68-python-amd64
ghcr.io/openhands/agent-server:544db68-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:544db68-python-arm64
ghcr.io/openhands/agent-server:544db68-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:544db68-golang
ghcr.io/openhands/agent-server:544db68-java
ghcr.io/openhands/agent-server:544db68-python

About Multi-Architecture Support

  • Each variant tag (e.g., 544db68-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 544db68-python-amd64) are also available if needed

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Core telemetry fix is solid, but bundled breaking changes need attention

Verdict: ❌ Needs documentation - The synchronization fix is correct, but two undocumented breaking changes (retry removal + hook regression) should be called out in the PR description or split into separate PRs.

Key Insight: Moving cost recording to a single synchronized path after UsageUpdate receipt is the right fix for zero-cost telemetry. The per-session tracking is cleaner than global state. However, removing ~150 lines of retry logic and changing hook behavior are significant changes that deserve explicit justification.

Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py Outdated
Collaborator

@all-hands-bot all-hands-bot left a comment

Additional feedback on test coverage:

Comment thread tests/sdk/agent/test_acp_agent.py
@simonrosenberg
Collaborator Author

Addressed the remaining review items:

  • restored ACP prompt retry behavior for transient connection errors
  • reverted the unrelated RemoteConversation hook behavior change so hooks remain server-side
  • added a timeout-path ACP telemetry test to verify graceful degradation when UsageUpdate does not arrive in time
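
For readers unfamiliar with the restored retry behavior, the general shape is sketched below. This is not the code in acp_agent.py: `prompt_with_retry`, its parameters, the exponential backoff, and the choice of `ConnectionError` as the transient error type are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def prompt_with_retry(
    send_prompt: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry a prompt call on transient connection errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await send_prompt()
        except ConnectionError:
            # Assumed transient. Any other exception type propagates immediately.
            if attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

Only the prompt call is retried. Telemetry for the turn would still be recorded once, after the attempt that succeeds.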

Verification:

  • PYTHONPATH=/tmp/sdk-issue-2375/openhands-sdk${PYTHONPATH:+:$PYTHONPATH} pytest tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/remote/test_remote_conversation.py
  • result: 109 passed

@github-actions
Contributor

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-sdk/openhands/sdk/agent/acp_agent.py | 420 | 74 | 82% | 183–185, 261–264, 266–267, 293, 295, 299, 305, 316–317, 322, 389, 503–504, 515, 520, 551, 561, 566, 577–580, 586–587, 590–592, 594, 596–597, 599, 601, 606, 615–616, 620–621, 625, 632–638, 648–653, 655, 664–666, 669–670, 676–680, 682, 684–685, 693, 1010–1011 |
| openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py | 619 | 100 | 83% | 77, 79, 150, 177, 190, 192–195, 205, 227–228, 233–236, 319, 329–331, 337, 378, 525–528, 530, 550–554, 559–562, 565, 581, 737–738, 742–743, 757, 781–782, 801, 812–813, 833–836, 838–839, 863–865, 868–872, 874–875, 879, 881–889, 891, 928, 1059, 1131–1132, 1136, 1141–1145, 1151–1157, 1170–1171, 1257, 1264, 1270–1271, 1349–1350 |
| TOTAL | 20276 | 5878 | 71% | |

Collaborator

@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟢 Good taste - Elegant fix that eliminates race conditions through proper synchronization

Verdict: ✅ Approved - Solid architecture, comprehensive tests, real problem solved

Key Insight: Moving cost recording to a single synchronized path after UsageUpdate receipt is exactly the right fix. The asyncio.Event-based synchronization is clean, the per-session tracking handles concurrent sessions correctly, and the final state refresh ensures accurate reporting. Well done.

@simonrosenberg
Collaborator Author

✅ Telemetry Fix Validated

Ran a SWE-bench evaluation (50 instances, ACP Claude Code agent) on this branch. Telemetry is now working correctly.

Validation Results

| Check | Result |
|---|---|
| Non-zero costs | ✅ All 50 instances |
| Cost-token consistency | ✅ Per-turn sums match totals |
| Pricing accuracy | ✅ 0.0% deviation from expected |
| Response IDs | ✅ All present |
| Cache tracking | ✅ Working |

Total cost tracked: $40.94 across 50 instances (mean $0.82/instance)

Methodology

  1. Triggered eval: swebench, eval_limit=50, agent_type=acp-claude, SDK branch fix/issue-2375-acp-telemetry
  2. Downloaded GCS artifacts and parsed output.jsonl
  3. Validated:
    • No zero-cost instances (pre-fix symptom)
    • Costs match Claude Sonnet 4.5 pricing formula exactly
    • Per-turn metrics sum to accumulated totals
    • All API calls have response IDs
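
Two of these checks (no zero-cost instances, per-turn sums matching totals) could be scripted roughly as follows. This is a sketch, not the script used for the run above. The field names `metrics`, `accumulated_cost`, and `costs` are an assumed layout for each `output.jsonl` row, and `validate_rows` is a hypothetical helper.

```python
import json
import math
from typing import Iterable, List


def validate_rows(lines: Iterable[str], tol: float = 1e-6) -> List[str]:
    """Return a list of problems found; an empty list means every row passed."""
    problems = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # ignore blank lines in the JSONL file
        metrics = json.loads(line).get("metrics", {})  # assumed field name
        total = metrics.get("accumulated_cost", 0.0)   # assumed field name
        per_turn = sum(c.get("cost", 0.0) for c in metrics.get("costs", []))
        if total <= 0:
            # Pre-fix symptom: the instance reports zero cost.
            problems.append(f"row {i}: zero cost")
        elif not math.isclose(per_turn, total, abs_tol=tol):
            problems.append(f"row {i}: per-turn sum {per_turn} != total {total}")
    return problems
```

Against a downloaded artifact it might be called as `validate_rows(open("output.jsonl"))`. The pricing-formula and response-ID checks would need the per-call token counts and IDs, which this sketch does not model.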

@simonrosenberg simonrosenberg merged commit 6b02df0 into main Mar 18, 2026
33 of 34 checks passed
@simonrosenberg simonrosenberg deleted the fix/issue-2375-acp-telemetry branch March 18, 2026 11:45


Development

Successfully merging this pull request may close these issues.

[Bug] ACP Cost Tracking Bug

2 participants