
fix(tool): race condition in dynamic Action wrapper class creation #2224

Merged
VascoSch92 merged 23 commits into main from vasco/back-tool-registration
Feb 27, 2026


@VascoSch92
Contributor

@VascoSch92 VascoSch92 commented Feb 26, 2026

Summary

Fix #2199

  • Add threading.Lock to create_action_type_with_risk() and _create_action_type_with_summary() to prevent concurrent threads from creating duplicate wrapper classes with the same name
  • Without the lock, two subagent threads could both see a cache miss, both call type(), and register two distinct class objects under the same __name__, causing _get_checked_concrete_subclasses(Action) to raise ValueError("Duplicate class definition ...")
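A minimal sketch of the check-create-store pattern described above, assuming a module-level dict cache guarded by a single threading.Lock. The names `_wrapper_cache`, `_wrapper_lock`, `DemoAction`, and `create_wrapper` are hypothetical stand-ins for illustration, not the SDK's actual code:

```python
import threading

# Hypothetical stand-ins for the module-level cache and lock (assumed names).
_wrapper_cache: dict[type, type] = {}
_wrapper_lock = threading.Lock()


class DemoAction:
    """Placeholder base class; the real Action in the SDK is a different type."""


def create_wrapper(action_type: type) -> type:
    """Return the cached wrapper class for action_type, creating it at most once."""
    # The lookup, the type() call, and the store all happen under one lock,
    # so a second thread cannot see a cache miss while the first is creating.
    with _wrapper_lock:
        wrapper = _wrapper_cache.get(action_type)
        if wrapper is None:
            wrapper = type(f"{action_type.__name__}WithRisk", (action_type,), {})
            _wrapper_cache[action_type] = wrapper
        return wrapper
```

Every caller, on any thread, then receives the same class object for a given `action_type`, so no two classes share one `__name__`.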

Tests

  • Added concurrent threading tests in tests/sdk/tool/test_tool.py that spawn 8 threads hitting the same wrapper function simultaneously

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4bc57c3-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4bc57c3-python \
  ghcr.io/openhands/agent-server:4bc57c3-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4bc57c3-golang-amd64
ghcr.io/openhands/agent-server:4bc57c3-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4bc57c3-golang-arm64
ghcr.io/openhands/agent-server:4bc57c3-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4bc57c3-java-amd64
ghcr.io/openhands/agent-server:4bc57c3-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4bc57c3-java-arm64
ghcr.io/openhands/agent-server:4bc57c3-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4bc57c3-python-amd64
ghcr.io/openhands/agent-server:4bc57c3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4bc57c3-python-arm64
ghcr.io/openhands/agent-server:4bc57c3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4bc57c3-golang
ghcr.io/openhands/agent-server:4bc57c3-java
ghcr.io/openhands/agent-server:4bc57c3-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4bc57c3-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4bc57c3-python-amd64) are also available if needed

VascoSch92 and others added 12 commits February 25, 2026 17:08
- Simplify builtin agent tests (no mocks, focus on registry contracts)

- Extend delegation example to demonstrate built-in explore/bash agents

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
Collaborator

@all-hands-bot all-hands-bot left a comment


Solid fix for the TOCTOU race in dynamic action type creation. The lock usage is textbook correct and the tests properly reproduce the race condition.

Minor: PR title appears incomplete ("fix(tool): but") and description has placeholder text - might want to update those before merging.

@github-actions
Contributor

github-actions bot commented Feb 26, 2026

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/tool
   tool.py | 150 | 3 | 98% | 225, 317, 323
TOTAL | 19000 | 5681 | 70% |

@VascoSch92
Contributor Author

@OpenHands Can you double-check that the test really reproduces the bug and that the fix is good?

@openhands-ai

openhands-ai bot commented Feb 26, 2026

I'm on it! VascoSch92 can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Feb 26, 2026

Summary

I have completed a comprehensive verification of the race condition fix in PR #2224. Here's what I found:

Bug Reproduction Verified

Fix Validation Complete

  • The fix correctly adds threading.Lock protection around the module-level caches in both create_action_type_with_risk and _create_action_type_with_summary
  • All tests pass: 2 new race condition tests + 107 broader tool tests
  • The fix is minimal, thread-safe, and maintains backward compatibility

No Additional Changes Needed

  • The existing changes are concise and directly address the race condition
  • No extraneous modifications were found
  • The implementation follows good engineering practices for concurrent access

The PR is ready as-is - the tests effectively reproduce the bug and the fix properly resolves the race condition without introducing regressions.

@VascoSch92 VascoSch92 changed the title fix(tool): but fix(tool): race condition in dynamic Action wrapper class creation Feb 26, 2026
@VascoSch92 VascoSch92 marked this pull request as ready for review February 26, 2026 17:28
Collaborator

@all-hands-bot all-hands-bot left a comment


🟢 Good taste - Textbook TOCTOU race fix. Lock scope is correct, tests use real concurrency with Barrier to force the race, and the solution is as simple as it gets. This is exactly how you fix a race condition: protect the entire check-create-store sequence with a single lock. No deadlock risk, minimal performance impact, solves the real problem. LGTM.

@enyst
Collaborator

enyst commented Feb 26, 2026

@OpenHands Do a /codereview-roasted on this PR.

@openhands-ai

openhands-ai bot commented Feb 26, 2026

I'm on it! enyst can track my progress at all-hands.dev

Collaborator

@enyst enyst left a comment


Thank you!

@openhands-ai

This comment was marked as duplicate.

@enyst
Collaborator

enyst commented Feb 27, 2026

(OpenHands-GPT-5.2)

🟡 Taste rating: Acceptable (but messy)

This PR does fix a real race, but it’s also doing a bunch of unrelated stuff (built-in subagents, delegation metrics semantics, example churn). That’s not “good taste”: it makes review harder and increases the chance you ship an unrelated regression under the cover of a concurrency fix.


[CRITICAL ISSUES] (things that can bite real users)

  • [openhands-sdk/openhands/sdk/subagent/registry.py, Lines 150–156] Tool “validation” that doesn’t validate

    • You check tool_name not in registered_tools, log an info line, and then still build Tool(name=tool_name) anyway.
    • This is the worst of both worlds:
      • It doesn’t prevent typos (so the real failure still happens later, further from the source).
      • It spams logs in any flow where tools aren’t registered yet (which your own message explicitly anticipates).
    • Fix suggestion: either (a) fail fast with a clear error (best for correctness), or (b) remove this check entirely. If you must keep it, at least make it debug and include the agent name so it’s actionable.
  • [openhands-sdk/openhands/sdk/subagent/builtins/explore.md, Lines 9–31] “Read-only” agent with a write-capable tool

    • You label explore as “read-only” but give it the raw terminal tool. That’s not a constraint, it’s a vibe.
    • If the intent is “policy-only”, fine—but then don’t market it as read-only in a way that implies enforcement.
    • Fix suggestion: reword to “read-only by instruction/policy” (or introduce a genuinely constrained toolset).

[IMPROVEMENT OPPORTUNITIES] (good taste / simplification)

  • [openhands-sdk/openhands/sdk/tool/tool.py, Lines 44–46, 481–536] Global lock is OK, but it’s coarse and manual

    • Yes, the lock fixes the TOCTOU cache miss. But you now serialize all wrapper creation (risk + summary) behind one global lock.
    • Also: you’re hand-rolling caching+locking when Python already has battle-tested options.
    • Fix suggestion: consider replacing both dicts with @functools.lru_cache(maxsize=None) / @functools.cache on the wrapper constructors. Those use internal locking and make the code smaller and harder to get wrong. If you keep the lock, consider RLock to avoid future deadlocks if anyone ever nests these calls.
  • [openhands-tools/openhands/tools/delegate/impl.py, Lines 221–248] Cross-thread dict writes rely on the GIL

    • results[...] = ... and errors[...] = ... happen from multiple threads with no lock.
    • In CPython today this “usually works” because of the GIL, but it’s not a great pattern to normalize in an SDK, and it makes refactors riskier.
    • Fix suggestion: either protect writes with a threading.Lock, or push results into a queue.Queue and merge after join.
  • [tests/sdk/subagent/test_builtin_agents.py, Lines 54–61] Brittle test asserting exact default tool list

    • assert agent_tool_names["default"] == [...] will break the moment you add/remove a default tool for legitimate reasons.
    • Fix suggestion: assert key invariants (e.g., {"terminal","file_editor","task_tracker"}.issubset(...)) unless you explicitly want the test suite to enforce that default tool list is frozen.

[TESTING GAPS / FLAKINESS RISKS]

  • [tests/sdk/tool/test_tool.py, Lines 199–268] Concurrency test may not actually reproduce the bug reliably
    • Barrier + 8 threads helps, but this is still scheduler-dependent: on some machines/load, the “broken” implementation might pass, making this more of a “regression guard” than a reproduction.
    • Also: the duplicated docstring is sloppy, and the “Ref:” URL looks bogus.
    • Fix suggestion:
      • Loop the race attempt (e.g., 100–1000 iterations) to increase failure probability on the buggy version.
      • Clean up the double docstring and link.
      • Use a thread-safe collector (Queue) for results to avoid teaching “shared list append is fine” as an example.

VERDICT

Worth merging for the race fix, but the PR is doing too much at once and includes a couple of “looks like validation but isn’t” design choices (tool registration logging) plus policy-vs-enforcement confusion (“read-only” explore with terminal).

KEY INSIGHT: The locking fix is fine; the bigger risk here is scope creep and introducing behavioral/logging changes that aren’t directly related to the race and are harder to reason about.

@enyst
Collaborator

enyst commented Feb 27, 2026

  • You label explore as “read-only” but give it the raw terminal tool. That’s not a constraint, it’s a vibe.

😅

@VascoSch92
Contributor Author

@enyst Actually, I think there might be an issue with the review you requested. The reviewer is referring to changes that aren't present in this PR. For example, your comment above relates to the built-in agent explorer, but that agent wasn't added in this PR. I suspect the reviewer had access to the wrong diff.

@VascoSch92 VascoSch92 changed the base branch from vasco/issue-2180 to main February 27, 2026 12:28
@github-actions
Contributor

github-actions bot commented Feb 27, 2026

API breakage checks (Griffe)

Result: Passed

Action log

@github-actions
Contributor

github-actions bot commented Feb 27, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@VascoSch92 VascoSch92 merged commit e24df60 into main Feb 27, 2026
26 checks passed
@VascoSch92 VascoSch92 deleted the vasco/back-tool-registration branch February 27, 2026 12:44


Development

Successfully merging this pull request may close these issues.

bug(sdk): Subagent spawn fails with Pydantic ValidationError: Duplicate class definition for BrowserCloseTabActionWithRiskWithSummary

4 participants