fix(tool): race condition in dynamic Action wrapper class creation by VascoSch92 · Pull Request #2224 · OpenHands/software-agent-sdk

VascoSch92 · 2026-02-26T15:58:33Z

Summary

Fix #2199

- Add threading.Lock to create_action_type_with_risk() and _create_action_type_with_summary() to prevent concurrent threads from creating duplicate wrapper classes with the same name.
- Without the lock, two subagent threads could both see a cache miss, both call type(), and register two distinct class objects under the same __name__, causing _get_checked_concrete_subclasses(Action) to raise ValueError("Duplicate class definition ...").
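
The check-create-store sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's actual code: the stand-in `Action` class, the cache name `_risk_wrapper_cache`, the lock name `_wrapper_lock`, and the placeholder `security_risk` attribute are hypothetical. Only the function name and the use of `threading.Lock` come from the description.

```python
import threading


class Action:
    """Stand-in for the SDK's Action base class (hypothetical, for illustration)."""


# Hypothetical module-level cache and lock; the real names may differ.
_risk_wrapper_cache: dict[type, type] = {}
_wrapper_lock = threading.Lock()


def create_action_type_with_risk(action_type: type) -> type:
    """Return the cached wrapper for action_type, creating it at most once."""
    # Hold the lock across check, create, and store so that no second thread
    # can observe a cache miss while the first is still building the class.
    with _wrapper_lock:
        cached = _risk_wrapper_cache.get(action_type)
        if cached is not None:
            return cached
        wrapper = type(
            f"{action_type.__name__}WithRisk",
            (action_type,),
            {"security_risk": "UNKNOWN"},  # placeholder attribute, not the real field
        )
        _risk_wrapper_cache[action_type] = wrapper
        return wrapper
```

Because the lookup and the `type()` call sit inside the same `with` block, a second caller either waits for the lock or finds the class already cached.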

Tests

Added concurrent threading tests in tests/sdk/tool/test_tool.py that spawn 8 threads hitting the same wrapper function simultaneously
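
A test of that shape might look like the sketch below. The names `make_wrapper`, `hammer`, `_cache`, and `_lock` are hypothetical, and the real tests in tests/sdk/tool/test_tool.py may be organized differently. The `threading.Barrier` holds every thread until all eight are ready, so the calls start as close together as the scheduler allows.

```python
import threading

_cache: dict[str, type] = {}
_lock = threading.Lock()


def make_wrapper(name: str) -> type:
    """Hypothetical lock-protected wrapper factory under test."""
    with _lock:
        if name not in _cache:
            _cache[name] = type(name, (), {})
        return _cache[name]


def hammer(factory, name: str, n_threads: int = 8) -> list:
    """Call factory(name) from n_threads threads released at the same moment."""
    barrier = threading.Barrier(n_threads)
    results: list = [None] * n_threads

    def worker(i: int) -> None:
        barrier.wait()  # line every thread up before the call
        results[i] = factory(name)  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def test_concurrent_creation_yields_one_class() -> None:
    classes = hammer(make_wrapper, "DemoActionWithRisk")
    assert len(classes) == 8
    # Every thread must have received the very same class object.
    assert len({id(c) for c in classes}) == 1
```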

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4bc57c3-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4bc57c3-python \
  ghcr.io/openhands/agent-server:4bc57c3-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4bc57c3-golang-amd64
ghcr.io/openhands/agent-server:4bc57c3-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4bc57c3-golang-arm64
ghcr.io/openhands/agent-server:4bc57c3-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4bc57c3-java-amd64
ghcr.io/openhands/agent-server:4bc57c3-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4bc57c3-java-arm64
ghcr.io/openhands/agent-server:4bc57c3-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4bc57c3-python-amd64
ghcr.io/openhands/agent-server:4bc57c3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4bc57c3-python-arm64
ghcr.io/openhands/agent-server:4bc57c3-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4bc57c3-golang
ghcr.io/openhands/agent-server:4bc57c3-java
ghcr.io/openhands/agent-server:4bc57c3-python

About Multi-Architecture Support

Each variant tag (e.g., 4bc57c3-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 4bc57c3-python-amd64) are also available if needed

all-hands-bot

Solid fix for the TOCTOU race in dynamic action type creation. The lock usage is textbook correct and the tests properly reproduce the race condition.

Minor: PR title appears incomplete ("fix(tool): but") and description has placeholder text - might want to update those before merging.

github-actions · 2026-02-26T16:01:50Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/tool
tool.py	150	3	98%	225, 317, 323
TOTAL	19000	5681	70%

VascoSch92 · 2026-02-26T16:03:46Z

@OpenHands Can you double-check that the tests really reproduce the bug and that the fix is good?

openhands-ai · 2026-02-26T16:04:18Z

I'm on it! VascoSch92 can track my progress at all-hands.dev

openhands-ai · 2026-02-26T16:08:39Z

Summary

I have completed a comprehensive verification of the race condition fix in PR #2224. Here's what I found:

✅ Bug Reproduction Verified

The tests in test_duplicate_action_class_bug.py properly reproduce the TOCTOU race condition described in issue #2199 ("bug(sdk): Subagent spawn fails with Pydantic ValidationError: Duplicate class definition for BrowserCloseTabActionWithRiskWithSummary")
My manual reproduction confirmed that duplicate class definitions can occur without the fix, triggering ValueError("Duplicate class definition...")

✅ Fix Validation Complete

The fix correctly adds threading.Lock protection around the module-level caches in both create_action_type_with_risk and _create_action_type_with_summary
All tests pass: 2 new race condition tests + 107 broader tool tests
The fix is minimal, thread-safe, and maintains backward compatibility

✅ No Additional Changes Needed

The existing changes are concise and directly address the race condition
No extraneous modifications were found
The implementation follows good engineering practices for concurrent access

The PR is ready as-is - the tests effectively reproduce the bug and the fix properly resolves the race condition without introducing regressions.

all-hands-bot

🟢 Good taste - Textbook TOCTOU race fix. Lock scope is correct, tests use real concurrency with Barrier to force the race, and the solution is as simple as it gets. This is exactly how you fix a race condition: protect the entire check-create-store sequence with a single lock. No deadlock risk, minimal performance impact, solves the real problem. LGTM.

enyst · 2026-02-26T23:25:14Z

@OpenHands Do a /codereview-roasted on this PR.

openhands-ai · 2026-02-26T23:25:40Z

I'm on it! enyst can track my progress at all-hands.dev

enyst

Thank you!

enyst · 2026-02-27T00:49:34Z

(OpenHands-GPT-5.2)

🟡 Taste rating: Acceptable (but messy)

This PR does fix a real race, but it’s also doing a bunch of unrelated stuff (built-in subagents, delegation metrics semantics, example churn). That’s not “good taste”: it makes review harder and increases the chance you ship an unrelated regression under the cover of a concurrency fix.

[CRITICAL ISSUES] (things that can bite real users)

[openhands-sdk/openhands/sdk/subagent/registry.py, Lines 150–156] Tool “validation” that doesn’t validate
- You check tool_name not in registered_tools, log an info line, and then still build Tool(name=tool_name) anyway.
- This is the worst of both worlds:
  - It doesn’t prevent typos (so the real failure still happens later, further from the source).
  - It spams logs in any flow where tools aren’t registered yet (which your own message explicitly anticipates).
- Fix suggestion: either (a) fail fast with a clear error (best for correctness), or (b) remove this check entirely. If you must keep it, at least make it debug and include the agent name so it’s actionable.
[openhands-sdk/openhands/sdk/subagent/builtins/explore.md, Lines 9–31] “Read-only” agent with a write-capable tool
- You label explore as “read-only” but give it the raw terminal tool. That’s not a constraint, it’s a vibe.
- If the intent is “policy-only”, fine—but then don’t market it as read-only in a way that implies enforcement.
- Fix suggestion: reword to “read-only by instruction/policy” (or introduce a genuinely constrained toolset).

[IMPROVEMENT OPPORTUNITIES] (good taste / simplification)

[openhands-sdk/openhands/sdk/tool/tool.py, Lines 44–46, 481–536] Global lock is OK, but it’s coarse and manual
- Yes, the lock fixes the TOCTOU cache miss. But you now serialize all wrapper creation (risk + summary) behind one global lock.
- Also: you’re hand-rolling caching+locking when Python already has battle-tested options.
- Fix suggestion: consider replacing both dicts with @functools.lru_cache(maxsize=None) / @functools.cache on the wrapper constructors. Those use internal locking and make the code smaller and harder to get wrong. If you keep the lock, consider RLock to avoid future deadlocks if anyone ever nests these calls.
[openhands-tools/openhands/tools/delegate/impl.py, Lines 221–248] Cross-thread dict writes rely on the GIL
- results[...] = ... and errors[...] = ... happen from multiple threads with no lock.
- In CPython today this “usually works” because of the GIL, but it’s not a great pattern to normalize in an SDK, and it makes refactors riskier.
- Fix suggestion: either protect writes with a threading.Lock, or push results into a queue.Queue and merge after join.
[tests/sdk/subagent/test_builtin_agents.py, Lines 54–61] Brittle test asserting exact default tool list
- assert agent_tool_names["default"] == [...] will break the moment you add/remove a default tool for legitimate reasons.
- Fix suggestion: assert key invariants (e.g., {"terminal","file_editor","task_tracker"}.issubset(...)) unless you explicitly want the test suite to enforce that default tool list is frozen.
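
One caveat on the `functools.cache` suggestion for tool.py: the Python documentation states that the cache itself is thread-safe, but the wrapped function can still be called more than once if a second thread calls before the first call has completed and been cached. The sketch below tries to force that overlap with a two-party `Barrier` placed inside a hypothetical factory `make_wrapper`. It is an illustration assuming a standard CPython build, not code from this PR.

```python
import functools
import threading

calls: list = []
# Both callers must be inside the function body at once for wait() to return;
# the timeout turns a serialized cache into an error instead of a hang.
inside = threading.Barrier(2, timeout=5)


@functools.cache
def make_wrapper(name: str) -> type:
    """Hypothetical class factory; records each real invocation."""
    calls.append(name)
    inside.wait()
    return type(name, (), {})


def run_two_callers() -> list:
    """Call make_wrapper with the same argument from two threads."""
    results: list = [None, None]

    def worker(i: int) -> None:
        results[i] = make_wrapper("DemoActionWithRisk")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


first, second = run_two_callers()
print(len(calls), first is second)
```

If the function body ran twice, `len(calls)` is 2 and the two callers hold distinct class objects. A cache decorator alone would then not rule out the duplicate-definition error, whereas a lock held across the miss path would.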

[TESTING GAPS / FLAKINESS RISKS]

[tests/sdk/tool/test_tool.py, Lines 199–268] Concurrency test may not actually reproduce the bug reliably
- Barrier + 8 threads helps, but this is still scheduler-dependent: on some machines/load, the “broken” implementation might pass, making this more of a “regression guard” than a reproduction.
- Also: the duplicated docstring is sloppy, and the “Ref:” URL looks bogus.
- Fix suggestion:
  - Loop the race attempt (e.g., 100–1000 iterations) to increase failure probability on the buggy version.
  - Clean up the double docstring and link.
  - Use a thread-safe collector (Queue) for results to avoid teaching “shared list append is fine” as an example.
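
The looping and `Queue` suggestions above could be combined roughly as in this sketch. All names (`run_race_once`, `count_duplicate_rounds`, `make_locked_factory`) are hypothetical, and a fresh class name per iteration is an assumption made here so that every round starts from a cache miss.

```python
import queue
import threading


def run_race_once(factory, name: str, n_threads: int = 8) -> list:
    """Run one barrier-synchronized round and return every class the threads got."""
    barrier = threading.Barrier(n_threads)
    collected: queue.Queue = queue.Queue()

    def worker() -> None:
        barrier.wait()
        collected.put(factory(name))  # Queue.put is safe across threads

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Keep the objects themselves alive so identity comparison stays valid.
    return [collected.get() for _ in range(n_threads)]


def count_duplicate_rounds(factory, iterations: int = 200) -> int:
    """Repeat the race with a new name each time; count rounds with more than one class."""
    bad_rounds = 0
    for i in range(iterations):
        classes = run_race_once(factory, f"DemoAction{i}WithRisk")
        if len({id(c) for c in classes}) > 1:
            bad_rounds += 1
    return bad_rounds


def make_locked_factory():
    """Build a lock-protected factory with its own private cache."""
    cache: dict[str, type] = {}
    lock = threading.Lock()

    def factory(name: str) -> type:
        with lock:
            if name not in cache:
                cache[name] = type(name, (), {})
            return cache[name]

    return factory
```

With a locked factory, `count_duplicate_rounds` should return 0. Run against an unlocked variant, a nonzero count would indicate the race was hit, although a zero count there would still not prove its absence.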

VERDICT

✅ Worth merging for the race fix, but the PR is doing too much at once and includes a couple of “looks like validation but isn’t” design choices (tool registration logging) plus policy-vs-enforcement confusion (“read-only” explore with terminal).

KEY INSIGHT: The locking fix is fine; the bigger risk here is scope creep and introducing behavioral/logging changes that aren’t directly related to the race and are harder to reason about.

enyst · 2026-02-27T00:50:45Z

You label explore as “read-only” but give it the raw terminal tool. That’s not a constraint, it’s a vibe.

😅

VascoSch92 · 2026-02-27T08:20:36Z

@enyst Actually, I think there might be an issue with the review you requested. The reviewer is referring to changes that aren't present in this PR. For example, your comment above relates to the built-in agent explorer, but that agent wasn't added in this PR. I suspect the reviewer had access to the wrong diff.

github-actions · 2026-02-27T12:30:44Z

API breakage checks (Griffe)

Result: Passed

github-actions · 2026-02-27T12:31:01Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

VascoSch92 and others added 12 commits February 25, 2026 17:08

add built-in agents

7037d55

add model key

d833cce

update

ac534fd

test: streamline builtin agent coverage

a9d8e83

- Simplify builtin agent tests (no mocks, focus on registry contracts) - Extend delegation example to demonstrate built-in explore/bash agents Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into vasco/issue-2051

9d18814

fix example and add logging

89d1c8d

merge subagents metrics

1ab6e39

update comment

6144e09

fix afer feedback

8eb6caa

Update openhands-tools/openhands/tools/delegate/impl.py

6ab75a0

Co-authored-by: OpenHands Bot <contact@all-hands.dev>

fix after feedback

43d008a

fix bug race

e497a9e

VascoSch92 requested a review from all-hands-bot February 26, 2026 15:58

all-hands-bot approved these changes Feb 26, 2026

rewrite tests better

dd298a3

VascoSch92 changed the title ~~fix(tool): but~~ fix(tool): race condition in dynamic Action wrapper class creation Feb 26, 2026

VascoSch92 marked this pull request as ready for review February 26, 2026 17:28

Merge branch 'vasco/issue-2180' into vasco/back-tool-registration

2573eaf

all-hands-bot approved these changes Feb 26, 2026

enyst approved these changes Feb 26, 2026

VascoSch92 changed the base branch from vasco/issue-2180 to main February 27, 2026 12:28

VascoSch92 added 6 commits February 27, 2026 13:30

Revert "fix after feedback"

e9686c6

This reverts commit 43d008a.

Revert "Update openhands-tools/openhands/tools/delegate/impl.py"

8242bf7

This reverts commit 6ab75a0.

Revert "fix afer feedback"

954c4d4

This reverts commit 8eb6caa.

Revert "update comment"

f79da23

This reverts commit 6144e09.

Revert "merge subagents metrics"

df2f579

This reverts commit 1ab6e39.

Revert "fix example and add logging"

fd76df8

This reverts commit 89d1c8d.

VascoSch92 and others added 3 commits February 27, 2026 13:33

revert

4f9cbfa

revert

cf3487f

Merge branch 'main' into vasco/back-tool-registration

bf2424c

VascoSch92 merged commit e24df60 into main Feb 27, 2026
26 checks passed

VascoSch92 deleted the vasco/back-tool-registration branch February 27, 2026 12:44

enyst mentioned this pull request Feb 27, 2026

Document all system prompt sections (sdk + OpenHands + OpenHands-CLI) #1965

Open

4 participants