fix(tools): merge subagents metrics (DelegateTool) by VascoSch92 · Pull Request #2221 · OpenHands/software-agent-sdk

VascoSch92 · 2026-02-26T14:29:51Z

Summary

Token counts were wrong when spawning sub-agents: all sub-agents shared the same Metrics object, which was reset by the LLMRegistry and never merged into the parent agent's Metrics object.

The fix consists of two parts:

We now reset the Metrics object during LLM creation, ensuring every LLM has its own unique Metrics object.
At the end of the process, we merge these individual metrics back into the parent agent's Metrics object.
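The two parts can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyMetrics`, `ToyLLM`, `spawn_sub_llms`, and `merge_into_stats` are hypothetical stand-ins, not the SDK's classes or the code changed in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetrics:
    """Hypothetical stand-in for a per-LLM metrics object."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add_usage(self, prompt: int, completion: int) -> None:
        self.prompt_tokens += prompt
        self.completion_tokens += completion


@dataclass
class ToyLLM:
    """Hypothetical stand-in for an LLM that owns a metrics object."""

    usage_id: str
    metrics: ToyMetrics = field(default_factory=ToyMetrics)


def spawn_sub_llms(names: list[str]) -> dict[str, ToyLLM]:
    # Part 1: every sub-agent LLM is created with its own fresh metrics object.
    return {name: ToyLLM(usage_id=f"delegate:{name}") for name in names}


def merge_into_stats(stats: dict[str, ToyMetrics], subs: dict[str, ToyLLM]) -> None:
    # Part 2: after delegation, register each sub-agent's metrics in the parent's stats.
    for llm in subs.values():
        stats[llm.usage_id] = llm.metrics


parent = ToyLLM(usage_id="agent")
parent.metrics.add_usage(100, 10)

subs = spawn_sub_llms(["lodging", "activities"])
subs["lodging"].metrics.add_usage(50, 5)
subs["activities"].metrics.add_usage(70, 7)

stats = {parent.usage_id: parent.metrics}
merge_into_stats(stats, subs)

total_prompt = sum(m.prompt_tokens for m in stats.values())
total_completion = sum(m.completion_tokens for m in stats.values())
print(sorted(stats), total_prompt, total_completion)
```

Because each toy LLM owns its metrics, usage recorded by one sub-agent never shows up under another, and the combined total is just the sum over all usage ids.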

Experiment

I used the following script:

"""Demo: per-agent metrics after delegation (issue #2180 fix)."""

import os
from pathlib import Path

from openhands.sdk import LLM, Agent, Conversation, Tool
from openhands.tools.delegate import DelegateTool, DelegationVisualizer


llm = LLM(
    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
    api_key=os.getenv("LLM_API_KEY"),
    base_url=os.environ.get("LLM_BASE_URL", None),
    usage_id="agent",
)

agent = Agent(llm=llm, tools=[Tool(name=DelegateTool.name)])
conversation = Conversation(
    agent=agent,
    workspace=Path.cwd(),
    visualizer=DelegationVisualizer(name="Main"),
)

conversation.send_message(
    "Spawn two sub-agents: 'lodging' and 'activities'. "
    "Ask lodging to list the top 3 London neighborhoods to stay in (keep it short and do NOT use internet access). "
    "Ask activities to list the top 3 must-see London attractions (keep it short and do NOT use internet access). "
    "Then write a one-paragraph summary combining both. "
    "Do NOT use internet access."
)
conversation.run()

# --- Print per-agent metrics breakdown ---
stats = conversation.conversation_stats
print("\n" + "=" * 50)
print("PER-AGENT METRICS BREAKDOWN")
print("=" * 50)
for usage_id, metrics in stats.usage_to_metrics.items():
    tokens = metrics.accumulated_token_usage
    print(f"\n  [{usage_id}]")
    print(f"    cost:              ${metrics.accumulated_cost:.4f}")
    print(f"    prompt tokens:     {tokens.prompt_tokens}")
    print(f"    completion tokens: {tokens.completion_tokens}")
    print(f"    LLM calls:         {len(metrics.token_usages)}")

combined = stats.get_combined_metrics()
tokens = combined.accumulated_token_usage
print("\n  [TOTAL]")
print(f"    cost:              ${combined.accumulated_cost:.4f}")
print(f"    prompt tokens:     {tokens.prompt_tokens}")
print(f"    completion tokens: {tokens.completion_tokens}")
print(f"    LLM calls:         {len(combined.token_usages)}")
print("=" * 50)

Output with the fix

==================================================
PER-AGENT METRICS BREAKDOWN
==================================================

  [agent]
    cost:              $0.0189
    prompt tokens:     15930
    completion tokens: 748
    LLM calls:         3

  [delegate:lodging]
    cost:              $0.0925
    prompt tokens:     22826
    completion tokens: 463
    LLM calls:         2

  [delegate:activities]
    cost:              $0.0925
    prompt tokens:     22826
    completion tokens: 463
    LLM calls:         2

  [TOTAL]
    cost:              $0.2039
    prompt tokens:     61582
    completion tokens: 1674
    LLM calls:         7
==================================================
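As a quick consistency check, the per-agent rows above should add up to the [TOTAL] row. The snippet below only re-adds the figures printed in this output; the dictionary layout is an illustrative assumption.

```python
from decimal import Decimal

# (cost, prompt tokens, completion tokens, LLM calls), copied from the output above
per_agent = {
    "agent": (Decimal("0.0189"), 15930, 748, 3),
    "delegate:lodging": (Decimal("0.0925"), 22826, 463, 2),
    "delegate:activities": (Decimal("0.0925"), 22826, 463, 2),
}

total_cost = sum(row[0] for row in per_agent.values())
total_prompt = sum(row[1] for row in per_agent.values())
total_completion = sum(row[2] for row in per_agent.values())
total_calls = sum(row[3] for row in per_agent.values())

print(total_cost, total_prompt, total_completion, total_calls)
# → 0.2039 61582 1674 7
```

The sums match the [TOTAL] row, which is what one would expect when each sub-agent's metrics are merged exactly once.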

Output without the fix

==================================================
PER-AGENT METRICS BREAKDOWN
==================================================

  [agent]
    cost:              $0.0420
    prompt tokens:     38824
    completion tokens: 1573
    LLM calls:         5

  [TOTAL]
    cost:              $0.0420
    prompt tokens:     38824
    completion tokens: 1573
    LLM calls:         5
==================================================

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the GitHub CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bf8b2fb-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bf8b2fb-python \
  ghcr.io/openhands/agent-server:bf8b2fb-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bf8b2fb-golang-amd64
ghcr.io/openhands/agent-server:bf8b2fb-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bf8b2fb-golang-arm64
ghcr.io/openhands/agent-server:bf8b2fb-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bf8b2fb-java-amd64
ghcr.io/openhands/agent-server:bf8b2fb-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bf8b2fb-java-arm64
ghcr.io/openhands/agent-server:bf8b2fb-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bf8b2fb-python-amd64
ghcr.io/openhands/agent-server:bf8b2fb-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:bf8b2fb-python-arm64
ghcr.io/openhands/agent-server:bf8b2fb-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:bf8b2fb-golang
ghcr.io/openhands/agent-server:bf8b2fb-java
ghcr.io/openhands/agent-server:bf8b2fb-python

About Multi-Architecture Support

Each variant tag (e.g., bf8b2fb-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., bf8b2fb-python-amd64) are also available if needed

all-hands-bot

Found a critical bug and a testing gap that need attention before merge.

all-hands-bot

🟢 Good Taste - Clean Fix

Verdict: ✅ Worth merging

This is how bug fixes should be done:

Root cause identified: Pydantic shallow-copies private attrs, so all sub-agents shared the same _metrics reference
Minimal fix applied: Move LLM copy inside loop + call reset_metrics() to break the shared reference
Proper integration: Merge sub-agent metrics back into parent stats after delegation
Solid test coverage: 3 tests verify independent metrics, proper merging, and no double-counting

No complexity, no special cases, no breaking changes. All previous review concerns resolved.

Key Insight: Understanding data ownership (shallow vs deep copy) led to the right two-line fix rather than working around symptoms.
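The shared-reference problem described in this review can be illustrated with the standard library alone. The sketch below uses `copy.copy` as an analogy for a shallow copy. `ToyMetrics` and `ToyLLM` are hypothetical classes, and only the method name `reset_metrics()` is taken from the review text. It is not the SDK's implementation.

```python
import copy


class ToyMetrics:
    """Hypothetical metrics holder that counts LLM calls."""

    def __init__(self) -> None:
        self.calls = 0


class ToyLLM:
    """Hypothetical LLM with a private metrics attribute."""

    def __init__(self) -> None:
        self._metrics = ToyMetrics()

    def reset_metrics(self) -> None:
        # Replace the metrics object, breaking any reference shared with a copy.
        self._metrics = ToyMetrics()


base = ToyLLM()

# Shallow copies only: every copy points at the same ToyMetrics instance.
shared = [copy.copy(base) for _ in range(2)]
shared[0]._metrics.calls += 1
shared_same_object = shared[0]._metrics is shared[1]._metrics

# Copy inside the loop and reset: each copy gets its own ToyMetrics instance.
independent = []
for _ in range(2):
    llm = copy.copy(base)
    llm.reset_metrics()
    independent.append(llm)
independent[0]._metrics.calls += 1

print(shared_same_object, shared[1]._metrics.calls, independent[1]._metrics.calls)
# → True 1 0
```

In the shared case a call recorded by one copy is visible on the other. After `reset_metrics()` the counters are independent.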

all-hands-bot

🟢 Clean Fix - LGTM

Taste Rating: 🟢 Good taste - Straightforward bug fix with proper testing.

What I Like:

Root cause correctly identified: Pydantic shallow-copies private attrs, causing shared Metrics references
Minimal fix: Move LLM copy inside loop + reset_metrics() to break the shared reference
Proper integration: Sub-agent metrics merged back into parent stats after delegation
Solid tests: Verify independent metrics, proper merging, and no double-counting. Tests use real objects, not just mocks.

Verdict: ✅ Ready to merge. This solves a real problem (incorrect token counts for sub-agents) with a proportional solution and comprehensive test coverage.

github-actions · 2026-02-26T15:15:28Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/delegate
impl.py	126	107	15%	29, 31–32, 41–42, 46, 52–53, 56–59, 61, 73–74, 78–80, 84–85, 92–94, 103–104, 114–118, 120, 124, 128–129, 131–132, 137–139, 141, 147, 150, 155, 160, 164, 169–171, 188–189, 196–198, 207, 209–211, 214–219, 221, 228–229, 231–232, 235–238, 240–241, 245–248, 251–253, 258–259, 262–263, 268–272, 277, 279–283, 285, 288–290, 292–293, 296, 298, 303–305
TOTAL	19109	9682	49%

github-actions · 2026-02-27T12:23:38Z

API breakage checks (Griffe)

Result: Passed

github-actions · 2026-02-27T12:23:49Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

VascoSch92 and others added 8 commits February 25, 2026 17:08

add built-in agents

7037d55

add model key

d833cce

update

ac534fd

test: streamline builtin agent coverage

a9d8e83

- Simplify builtin agent tests (no mocks, focus on registry contracts)
- Extend delegation example to demonstrate built-in explore/bash agents

Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into vasco/issue-2051

9d18814

fix example and add logging

89d1c8d

merge subagents metrics

1ab6e39

update comment

6144e09

VascoSch92 requested a review from all-hands-bot February 26, 2026 14:30

all-hands-bot reviewed Feb 26, 2026

Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated

Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated

Comment thread openhands-tools/openhands/tools/delegate/impl.py Outdated

fix afer feedback

8eb6caa

VascoSch92 requested a review from all-hands-bot February 26, 2026 14:46

This comment was marked as outdated.

VascoSch92 and others added 2 commits February 26, 2026 16:00

Update openhands-tools/openhands/tools/delegate/impl.py

6ab75a0

Co-authored-by: OpenHands Bot <contact@all-hands.dev>

fix after feedback

43d008a

VascoSch92 requested a review from all-hands-bot February 26, 2026 15:08

all-hands-bot approved these changes Feb 26, 2026

VascoSch92 marked this pull request as ready for review February 26, 2026 15:10

all-hands-bot approved these changes Feb 26, 2026

VascoSch92 changed the title from "fix(tools): merge subagents metrics" to "fix(tools): merge subagents metrics (DelegateTool)" Feb 26, 2026

VascoSch92 mentioned this pull request Feb 26, 2026

fix(tools): merge subagents metrics (TaskToolSet) #2222

Merged

5 tasks

VascoSch92 changed the base branch from vasco/issue-2051 to main February 27, 2026 12:16

VascoSch92 and others added 6 commits February 27, 2026 13:20

Revert "fix example and add logging"

e51e3a6

This reverts commit 89d1c8d.

Revert "test: streamline builtin agent coverage"

21168d8

This reverts commit a9d8e83.

Revert "update"

f260e8a

This reverts commit ac534fd.

Revert "add model key"

da65626

This reverts commit d833cce.

Revert "add built-in agents"

017c94d

This reverts commit 7037d55.

Merge branch 'main' into vasco/issue-2180

41349e6

fix after rebasing

30a33ee

VascoSch92 enabled auto-merge (squash) February 27, 2026 12:27

VascoSch92 merged commit a691cda into main Feb 27, 2026
21 checks passed

VascoSch92 deleted the vasco/issue-2180 branch February 27, 2026 12:32

VascoSch92 merged 18 commits into main from vasco/issue-2180

3 participants
