fix(tools): merge subagents metrics (DelegateTool) #2221

Merged
VascoSch92 merged 18 commits into main from vasco/issue-2180
Feb 27, 2026
@VascoSch92 VascoSch92 commented Feb 26, 2026

Summary

(ref #2180)

There was a small problem with the token count for sub-agent spawning: all sub-agents shared the same Metrics object, which was reset by the LLMRegistry and never merged into the parent agent's Metrics object.

The fix consists of two parts:

  • We now reset the Metrics object during LLM creation, ensuring every LLM has its own unique Metrics object.
  • At the end of the process, we merge these individual metrics back into the parent agent's Metrics object.
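
The two parts of the fix can be sketched with plain dataclasses. This is only an illustration under stated assumptions, not the SDK's code: `Metrics`, `SketchLLM`, `spawn_sub_llms`, and `merge_sub_metrics` are hypothetical stand-ins, and only the idea (a fresh metrics object per sub-agent LLM, then a merge into the parent's stats) comes from the description above.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Metrics:
    """Hypothetical stand-in for a per-LLM usage accumulator."""

    cost: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, cost: float, prompt_tokens: int, completion_tokens: int) -> None:
        self.cost += cost
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    def merge(self, other: "Metrics") -> None:
        self.add(other.cost, other.prompt_tokens, other.completion_tokens)


@dataclass
class SketchLLM:
    """Hypothetical LLM handle that owns a Metrics object."""

    usage_id: str
    metrics: Metrics = field(default_factory=Metrics)

    def reset_metrics(self) -> None:
        # Rebind to a fresh object so this instance stops sharing state.
        self.metrics = Metrics()


def spawn_sub_llms(parent: SketchLLM, names: list[str]) -> dict[str, SketchLLM]:
    """Part 1: one copy per sub-agent, each with its own Metrics."""
    sub_llms = {}
    for name in names:
        sub = copy.copy(parent)  # shallow copy: still points at parent.metrics
        sub.usage_id = f"delegate:{name}"
        sub.reset_metrics()  # break the shared reference
        sub_llms[name] = sub
    return sub_llms


def merge_sub_metrics(stats: dict[str, Metrics], sub_llms: dict[str, SketchLLM]) -> None:
    """Part 2: fold each sub-agent's metrics into the parent's stats."""
    for sub in sub_llms.values():
        stats.setdefault(sub.usage_id, Metrics()).merge(sub.metrics)


parent = SketchLLM(usage_id="agent")
parent.metrics.add(0.01, 100, 10)

subs = spawn_sub_llms(parent, ["lodging", "activities"])
subs["lodging"].metrics.add(0.02, 200, 20)
subs["activities"].metrics.add(0.03, 300, 30)

stats = {parent.usage_id: parent.metrics}
merge_sub_metrics(stats, subs)
total_prompt_tokens = sum(m.prompt_tokens for m in stats.values())
```

In this sketch the parent's own entry is untouched by sub-agent usage, and each sub-agent contributes exactly once to the combined total.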

Experiment

I used the following script:

"""Demo: per-agent metrics after delegation (issue #2180 fix)."""

import os
from pathlib import Path

from openhands.sdk import LLM, Agent, Conversation, Tool
from openhands.tools.delegate import DelegateTool, DelegationVisualizer


llm = LLM(
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=os.getenv("LLM_API_KEY"),
    base_url=os.environ.get("LLM_BASE_URL", None),
    usage_id="agent",
)

agent = Agent(llm=llm, tools=[Tool(name=DelegateTool.name)])
conversation = Conversation(
    agent=agent,
    workspace=Path.cwd(),
    visualizer=DelegationVisualizer(name="Main"),
)

conversation.send_message(
    "Spawn two sub-agents: 'lodging' and 'activities'. "
    "Ask lodging to list the top 3 London neighborhoods to stay in (keep it short and do NOT use internet access). "
    "Ask activities to list the top 3 must-see London attractions (keep it short and do NOT use internet access). "
    "Then write a one-paragraph summary combining both. "
    "Do NOT use internet access."
)
conversation.run()

# --- Print per-agent metrics breakdown ---
stats = conversation.conversation_stats
print("\n" + "=" * 50)
print("PER-AGENT METRICS BREAKDOWN")
print("=" * 50)
for usage_id, metrics in stats.usage_to_metrics.items():
    tokens = metrics.accumulated_token_usage
    print(f"\n  [{usage_id}]")
    print(f"    cost:              ${metrics.accumulated_cost:.4f}")
    print(f"    prompt tokens:     {tokens.prompt_tokens}")
    print(f"    completion tokens: {tokens.completion_tokens}")
    print(f"    LLM calls:         {len(metrics.token_usages)}")

combined = stats.get_combined_metrics()
tokens = combined.accumulated_token_usage
print("\n  [TOTAL]")
print(f"    cost:              ${combined.accumulated_cost:.4f}")
print(f"    prompt tokens:     {tokens.prompt_tokens}")
print(f"    completion tokens: {tokens.completion_tokens}")
print(f"    LLM calls:         {len(combined.token_usages)}")
print("=" * 50)

Output with the fix

==================================================
PER-AGENT METRICS BREAKDOWN
==================================================

  [agent]
    cost:              $0.0189
    prompt tokens:     15930
    completion tokens: 748
    LLM calls:         3

  [delegate:lodging]
    cost:              $0.0925
    prompt tokens:     22826
    completion tokens: 463
    LLM calls:         2

  [delegate:activities]
    cost:              $0.0925
    prompt tokens:     22826
    completion tokens: 463
    LLM calls:         2

  [TOTAL]
    cost:              $0.2039
    prompt tokens:     61582
    completion tokens: 1674
    LLM calls:         7
==================================================
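
As a sanity check on the output above, the per-agent rows should add up to the `[TOTAL]` row. The snippet below simply re-enters the printed numbers by hand and sums them; the dictionary layout and key names are assumptions made for this check and are not part of the SDK.

```python
import math

# Values copied from the "Output with the fix" listing above.
per_agent = {
    "agent": {"cost": 0.0189, "prompt": 15930, "completion": 748, "calls": 3},
    "delegate:lodging": {"cost": 0.0925, "prompt": 22826, "completion": 463, "calls": 2},
    "delegate:activities": {"cost": 0.0925, "prompt": 22826, "completion": 463, "calls": 2},
}
reported_total = {"cost": 0.2039, "prompt": 61582, "completion": 1674, "calls": 7}

# Sum each column across agents.
totals = {
    key: sum(row[key] for row in per_agent.values())
    for key in ("cost", "prompt", "completion", "calls")
}

# Integer columns must match exactly.
for key in ("prompt", "completion", "calls"):
    assert totals[key] == reported_total[key], key

# Costs are printed with four decimals, so allow for rounding.
assert math.isclose(totals["cost"], reported_total["cost"], abs_tol=2e-4)
```

All four columns agree, which is consistent with each sub-agent being counted once in the combined metrics.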

Output without the fix

==================================================
PER-AGENT METRICS BREAKDOWN
==================================================

  [agent]
    cost:              $0.0420
    prompt tokens:     38824
    completion tokens: 1573
    LLM calls:         5

  [TOTAL]
    cost:              $0.0420
    prompt tokens:     38824
    completion tokens: 1573
    LLM calls:         5
==================================================

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bf8b2fb-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bf8b2fb-python \
  ghcr.io/openhands/agent-server:bf8b2fb-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bf8b2fb-golang-amd64
ghcr.io/openhands/agent-server:bf8b2fb-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bf8b2fb-golang-arm64
ghcr.io/openhands/agent-server:bf8b2fb-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bf8b2fb-java-amd64
ghcr.io/openhands/agent-server:bf8b2fb-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bf8b2fb-java-arm64
ghcr.io/openhands/agent-server:bf8b2fb-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bf8b2fb-python-amd64
ghcr.io/openhands/agent-server:bf8b2fb-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:bf8b2fb-python-arm64
ghcr.io/openhands/agent-server:bf8b2fb-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:bf8b2fb-golang
ghcr.io/openhands/agent-server:bf8b2fb-java
ghcr.io/openhands/agent-server:bf8b2fb-python

About Multi-Architecture Support

  • Each variant tag (e.g., bf8b2fb-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., bf8b2fb-python-amd64) are also available if needed

VascoSch92 and others added 8 commits February 25, 2026 17:08
- Simplify builtin agent tests (no mocks, focus on registry contracts)

- Extend delegation example to demonstrate built-in explore/bash agents

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot all-hands-bot left a comment

Found a critical bug and a testing gap that need attention before merge.

Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated
Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated
Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated
all-hands-bot

This comment was marked as outdated.

VascoSch92 and others added 2 commits February 26, 2026 16:00
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
@all-hands-bot all-hands-bot left a comment

🟢 Good Taste - Clean Fix

Verdict: ✅ Worth merging

This is how bug fixes should be done:

  1. Root cause identified: Pydantic shallow-copies private attrs, so all sub-agents shared the same _metrics reference
  2. Minimal fix applied: Move LLM copy inside loop + call reset_metrics() to break the shared reference
  3. Proper integration: Merge sub-agent metrics back into parent stats after delegation
  4. Solid test coverage: 3 tests verify independent metrics, proper merging, and no double-counting

No complexity, no special cases, no breaking changes. All previous review concerns resolved.

Key Insight: Understanding data ownership (shallow vs deep copy) led to the right two-line fix rather than working around symptoms.
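
The shallow-versus-deep distinction mentioned in the review can be illustrated with the standard library alone. This is a stdlib analogue rather than Pydantic itself: `Counter` and `Client` are hypothetical classes, and `copy.copy` stands in for whichever shallow-copy path the real model takes.

```python
import copy


class Counter:
    """Hypothetical mutable metrics holder."""

    def __init__(self) -> None:
        self.calls = 0


class Client:
    """Hypothetical object with a mutable private attribute."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._metrics = Counter()

    def record_call(self) -> None:
        self._metrics.calls += 1


base = Client("agent")

# Shallow copies duplicate the outer object but reuse the same Counter.
shallow_a = copy.copy(base)
shallow_b = copy.copy(base)
shallow_a.record_call()
shallow_b.record_call()
shared_calls = base._metrics.calls  # both calls landed on the shared Counter

# A deep copy gets its own Counter, but starts from the current counts.
deep = copy.deepcopy(base)
deep.record_call()
```

Note that the deep copy breaks the alias yet inherits the counts accumulated so far, which is one reason an explicit reset after copying is a natural fit when each sub-agent should start from zero.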

@VascoSch92 VascoSch92 marked this pull request as ready for review February 26, 2026 15:10
@all-hands-bot all-hands-bot left a comment

🟢 Clean Fix - LGTM

Taste Rating: 🟢 Good taste - Straightforward bug fix with proper testing.

What I Like:

  • Root cause correctly identified: Pydantic shallow-copies private attrs, causing shared Metrics references
  • Minimal fix: Move LLM copy inside loop + reset_metrics() to break the shared reference
  • Proper integration: Sub-agent metrics merged back into parent stats after delegation
  • Solid tests: Verify independent metrics, proper merging, and no double-counting. Tests use real objects, not just mocks.

Verdict: ✅ Ready to merge. This solves a real problem (incorrect token counts for sub-agents) with a proportional solution and comprehensive test coverage.

github-actions bot commented Feb 26, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|-------|------|-------|---------|
| openhands-tools/openhands/tools/delegate/impl.py | 126 | 107 | 15% | 29, 31–32, 41–42, 46, 52–53, 56–59, 61, 73–74, 78–80, 84–85, 92–94, 103–104, 114–118, 120, 124, 128–129, 131–132, 137–139, 141, 147, 150, 155, 160, 164, 169–171, 188–189, 196–198, 207, 209–211, 214–219, 221, 228–229, 231–232, 235–238, 240–241, 245–248, 251–253, 258–259, 262–263, 268–272, 277, 279–283, 285, 288–290, 292–293, 296, 298, 303–305 |
| TOTAL | 19109 | 9682 | 49% | |

@VascoSch92 VascoSch92 changed the title fix(tools): merge subagents metrics fix(tools): merge subagents metrics (DelegateTool) Feb 26, 2026
@VascoSch92 VascoSch92 changed the base branch from vasco/issue-2051 to main February 27, 2026 12:16
github-actions bot commented Feb 27, 2026

API breakage checks (Griffe)

Result: Passed

Action log

github-actions bot commented Feb 27, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@VascoSch92 VascoSch92 enabled auto-merge (squash) February 27, 2026 12:27
@VascoSch92 VascoSch92 merged commit a691cda into main Feb 27, 2026
21 checks passed
@VascoSch92 VascoSch92 deleted the vasco/issue-2180 branch February 27, 2026 12:32