feat(sdk/agent): Parallel Tool Call Execution #2390
Conversation
… tool execution

Add infrastructure for executing multiple tool calls concurrently with a configurable global concurrency limit.

Classes:
- ToolExecutorSemaphore: Process-global singleton that limits concurrent tool executions across all agents and sub-agents. Configured via the OPENHANDS_TOOL_CONCURRENCY_LIMIT environment variable (default: 8).
- ParallelToolExecutor: Executes batches of tool calls concurrently using ThreadPoolExecutor, with concurrency controlled by the semaphore.

Key design decisions:
- Single layer of concurrency control via environment variable
- Singleton pattern using __new__ for ToolExecutorSemaphore
- ThreadPoolExecutor for I/O-bound tool execution
- Results returned in original order regardless of completion order

Related to #2350

Co-authored-by: openhands <openhands@all-hands.dev>
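The two classes described in the commit message could be sketched roughly as follows. This is a hypothetical sketch, not the SDK's actual implementation: class names follow the commit message, but the method signatures and the `semaphore` attribute are illustrative.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class ToolExecutorSemaphore:
    """Process-global singleton bounding concurrent tool executions."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Singleton via __new__, as described in the commit message.
        with cls._lock:
            if cls._instance is None:
                limit = int(os.environ.get("OPENHANDS_TOOL_CONCURRENCY_LIMIT", "8"))
                inst = super().__new__(cls)
                inst.semaphore = threading.Semaphore(limit)
                cls._instance = inst
        return cls._instance


class ParallelToolExecutor:
    """Run a batch of callables concurrently, returning results in input order."""

    def __init__(self):
        self._gate = ToolExecutorSemaphore()

    def _run_one(self, fn):
        # The global semaphore caps concurrency across all agents/sub-agents.
        with self._gate.semaphore:
            return fn()

    def execute(self, calls):
        # executor.map preserves input order regardless of completion order.
        with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
            return list(pool.map(self._run_one, calls))
```

Because the semaphore is process-global, the cap applies even when several agents each create their own `ParallelToolExecutor`.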
Python API breakage checks — ✅ PASSED
REST API breakage checks (OpenAPI) — ✅ PASSED
Coverage Report
all-hands-bot
left a comment
🟡 Taste Rating: Acceptable - Requires Eval Verification
Core architecture is excellent. Making _execute_action_event side-effect-free (returns events instead of emitting directly) is exactly the right design — this eliminates the need for locks and makes the special case (parallel execution) become a normal case. Per-agent thread pools elegantly prevent deadlocks without complex detection logic.
The code is clean, tests are comprehensive and test real behavior (not mocks), and default concurrency=1 preserves backward compatibility.
However, this PR changes core agent execution flow (tool calling, event emission, state management). Even with the backward-compatible default, the execution path has been refactored significantly. Per repository policy, PRs that change agent behavior require lightweight eval verification before merge.
KEY INSIGHT
The refactoring turns concurrency from a special case requiring complex coordination into a normal case with side-effect-free functions. This is "good taste" — the right abstraction eliminates the complexity rather than managing it with locks and conditionals.
VERDICT
✅ Code quality is solid — approve from a technical perspective
Move _emit_batch and _handle_finish logic from Agent into _ActionBatch as emit() and finalize() methods. Agent-specific logic (iterative refinement check, mark-finished callback) is injected via callables, keeping _ActionBatch decoupled from the Agent class. This simplifies Agent._execute_actions to a clean prepare → emit → finalize pipeline and gives _ActionBatch full ownership of the batch lifecycle. Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands Do a /codereview-roasted on this PR.
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
🔴 Needs improvement
[CRITICAL ISSUES]
- [openhands-sdk/openhands/sdk/agent/parallel_executor.py, Lines 97-103] Breaking change disguised as the fallback path: `TOOL_CONCURRENCY_LIMIT=1` still routes any multi-tool batch through `ThreadPoolExecutor(max_workers=1)`. That is not the old behavior. It changes thread affinity, and because results are buffered until the batch finishes, it also changes when observations hit the conversation. I reproduced this locally with a tiny tool: both calls ran on `ThreadPoolExecutor-*`, not `MainThread`, and the second call saw zero prior `ObservationEvent`s. So the PR description's "fully backward-compatible" claim is false. Fix: keep the old `for action in action_events: execute + emit` path when the limit is 1, and only use the batch executor when the limit is actually > 1.
- [openhands-sdk/openhands/sdk/agent/agent.py, Lines 389-396] Sequential semantics were silently changed: `_ActionBatch.prepare()` executes the whole batch before `batch.emit()`, so later tools in the same batch no longer see earlier observations in `conversation.state.events`. Even with concurrency effectively "off", you've changed execution from `run tool -> emit observation -> run next tool` into `run everything -> emit later`. That's a real semantic regression for tools/hooks that inspect conversation state mid-batch. Fix: preserve incremental emission in the sequential path; don't reuse the buffered parallel path as the fallback.
- [openhands-sdk/openhands/sdk/agent/agent.py, Lines 389-393] Unsafe by construction for the stock tool set: once `TOOL_CONCURRENCY_LIMIT > 1`, this code blindly parallelizes whatever tool calls the model produced. But the default tool set includes shared mutable executors like terminal sessions, browser sessions, and file-editor history. A warning in a docstring is not a concurrency model; it just means the env var can turn into garbled terminal I/O or corrupted edits. Fix: parallelize only tools that explicitly declare themselves safe, or serialize known shared-resource tools.
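The fix asked for in the first point amounts to keeping two distinct code paths rather than funneling everything through the batch executor. A minimal sketch, with `execute_one` and `emit` as stand-ins for the SDK's real methods:

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(action_events, limit, execute_one, emit):
    """Dispatch a batch of tool calls, preserving old semantics at limit 1."""
    if limit <= 1:
        # Old sequential semantics: each observation is emitted before the
        # next tool runs, so later tools see earlier results in state,
        # and everything stays on the calling thread.
        for action in action_events:
            emit(execute_one(action))
        return
    # Parallel path: results are buffered by pool.map (in input order)
    # and emitted after each completes, on the calling thread.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        for obs in pool.map(execute_one, action_events):
            emit(obs)
```

The point of the branch is that `limit == 1` is not "parallel with one worker"; it is a different contract about thread affinity and emission timing.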
[TESTING GAPS]
- [tests/sdk/agent/test_parallel_execution_integration.py, Lines 188-213] The default-limit regression isn't actually tested: this test only checks ordering. It doesn't assert same-thread execution or incremental state visibility, so it passes even though the fallback already changed behavior.
- [tests/sdk/agent/test_parallel_execution_integration.py] Only synthetic tools are exercised: the tests never hit the real risky executors (`terminal`, `file_editor`, `browser`), so they won't catch the races this PR introduces.
VERDICT:
❌ Needs rework: the concurrency primitive is reasonable, but the fallback path is not backward-compatible, and the parallel path has no safety gate for the tools people actually run.
KEY INSIGHT:
The real bug here is that the code collapsed “sequential semantics” and “parallel scheduling” into one path. Those are not the same thing, and when limit=1 they must stay identical to the old behavior.
Because this PR changes agent tool-execution behavior, I'd still want a human maintainer to make the final merge call after lightweight evals even after the above is addressed.
This comment was marked as duplicate.
OpenHands-GPT-5.4 here. I re-read my earlier review with fresh eyes, section by section, and I think I overstated parts of it. Here's my updated take.
OpenHands-GPT-5.4 follow-up: you're right, and I need to correct my re-review. I made a reasoning mistake there. This PR exists precisely to handle the multi-tool-call case better. If multi-tool responses are common in real usage, then the changed fallback semantics are not some corner case: they hit the exact workload this feature is supposed to serve. So I want to sharpen the conclusion.
I would restate the concern more cleanly as:
I still think my earlier re-review was right to soften one thing: synthetic tools are fine for testing the concurrency primitive itself. But I over-corrected on severity. If multi-tool responses are common in production, then the fallback-semantics issue is materially more important than I gave it credit for. I also think this strengthens, not weakens, the concern about parallelizing shared-state tools. If the intended real-world usage is "LLM emits several tool calls and we want them parallel," then correctness for the default tool ecosystem is part of the core feature story, not just a docs footnote.
|
HUMAN: Just to note, I do think multiple tool calls are a normal occurrence. I don't have numbers, but they happen often enough in practice. We have the eval trajectories; we could compute the number of batches with the same llm_response_id.
@enyst yes this could be interesting.
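The trajectory analysis suggested above could be sketched as follows. This is hypothetical: it assumes events are available as dicts with `kind` and `llm_response_id` fields, as in the trace report later in this thread.

```python
from collections import Counter


def multi_call_batches(events):
    """Return {llm_response_id: n} for LLM turns that emitted >1 tool call.

    Groups ActionEvents by the response that produced them; a count above 1
    means the model asked for multiple tool calls in a single turn.
    """
    per_turn = Counter(
        e["llm_response_id"] for e in events if e.get("kind") == "ActionEvent"
    )
    return {rid: n for rid, n in per_turn.items() if n > 1}
```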
@enyst I answered your points.
The previous implementation wasn't 100% backward compatible, but it was a simple fix. I have updated the code and added a test case to ensure full compatibility.
I’m not sure I follow this point. If multiple tool calls are being executed in parallel, they shouldn't need to depend on each other's outputs.
This is a valid point. However, I avoided adding that specific logic here to prevent the PR from becoming even larger than it already is. My approach was to first implement parallel tool calling and then address the prevention of dependent tool calls in a subsequent update. This aligns with what we originally discussed in the issue, and is why the current default remains the standard behavior.
I have added integration tests to cover scenarios where num_workers = 1 and the request contains multiple tool calls. This should show that we have backward compatibility.
Makes sense. I changed it as suggested by OH.
I created an example where we spawn subagents that perform code searches. This happens in parallel, and at the end there is also a parallel report to confirm the calls were actually executed in parallel. If it is too verbose for an example, I can also remove it. I just wanted to make sure we are correctly parallelizing the calls. The report looks like:
Ha! There’s no |
If you want I can force the example to have |
😇 Just out of curiosity |
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 34.9s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 17.9s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.7s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 31.1s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 14.9s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 27.2s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 29.3s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 13.0s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 21.5s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 47s | $0.19 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.9s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 24.7s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.0s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.6s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 11.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 17.8s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 54.5s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 52.1s | $0.05 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 51.5s | $0.06 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 24.0s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 31.2s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 33.7s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 20.9s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ❌ FAIL Timed out after 600 seconds | 10m 0s | -- |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 16.4s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 9m 30s | $0.67 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 13.3s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store.py | ✅ PASS | 4.3s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 27.3s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.4s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 27.6s | $0.10 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 26.8s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 12s | $0.06 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 7.8s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 8.1s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 31s | $0.24 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 35.3s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 38s | $0.05 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 50.6s | $0.00 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 33s | $0.05 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 27.7s | $0.02 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 37s | $0.02 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 58.9s | $0.12 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 20.9s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 44.0s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 17.3s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 17.9s | $0.02 |
❌ Some tests failed
Total: 47 | Passed: 46 | Failed: 1 | Total Cost: $2.25
Failed examples:
- examples/01_standalone_sdk/31_iterative_refinement.py: Timed out after 600 seconds
Reminder: The main focus of this PR is simply to add the infrastructure for parallel tool calls. There will be additional PRs to ensure everything is thread-safe. For this reason, we have set the default to sequential tool calls.

@enyst I investigated your concerns and here is what is going on.

Current state: TerminalTool is not thread-safe. All commands go through a single PTY. Two concurrent commands on the same PTY interleave their bytes, corrupting output.

When it matters: Only when tool_concurrency_limit > 1 and the LLM emits multiple terminal calls in the same response. Subagents are fine: each gets its own session.

What's safe today: Parallel batches with different tool types (e.g., 3 delegate calls, or terminal + file editor). This is the common case.

Possible fixes (simplest to most capable):
What I propose:
For comparison, here are the parallel tool calls from my CC sessions:
Note that Read is parallelizable (because of FileEditor view).
Aha! Thank you. 😅 Your proposals sound good to me, thanks for satisfying my dumb little curiosity. |
xingyaoww
left a comment
With this context, #2390 (comment)
This PR LGTM!
Immediately after this PR: I will submit a PR with a terminal lock. This is a minimal change with no behavioral impact. We will lose some parallelism, but we'll be safer, and we still gain performance (and save tokens) for mixed tool batches.
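A terminal lock of the kind described could look roughly like this. This is an illustrative sketch, not the actual follow-up PR: the tool names and the `call_tool` wrapper are assumptions.

```python
import threading

# Tools that share a single PTY (or similar mutable session) must not
# interleave; everything else may run concurrently.
_SERIALIZED_TOOLS = {"terminal"}
_tool_locks = {name: threading.Lock() for name in _SERIALIZED_TOOLS}


def call_tool(name, fn, *args, **kwargs):
    """Run a tool call, serializing only the known shared-resource tools."""
    lock = _tool_locks.get(name)
    if lock is None:
        return fn(*args, **kwargs)  # safe to run concurrently
    with lock:  # one terminal command at a time
        return fn(*args, **kwargs)
```

The appeal of this approach is that it loses parallelism only within one tool type, so mixed batches (terminal + file editor + delegate) still overlap.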
Shall we create an issue here in case we forget about it?
I have a Linear ticket to remind myself to create all the related issues to continue the parallel tool calls dev. Moreover, I also have to create documentation, as I introduced a new example.
Rename FailingAction/FailingObservation to ParallelFailingAction/ParallelFailingObservation to avoid name collisions with the existing test classes in tests/sdk/conversation/local/test_rerun_actions.py. When pytest-xdist runs tests in parallel, both files get loaded in the same process, causing the Action/Observation class registry to detect duplicate class definitions and raise ValidationErrors. Co-authored-by: openhands <openhands@all-hands.dev>
Summary
(ref #2350)
Add ParallelToolExecutor to enable concurrent tool execution within agent steps, controlled by the TOOL_CONCURRENCY_LIMIT environment variable (default: 1, fully backward-compatible).
Motivation
When an LLM returns multiple tool calls in a single response (e.g., "read these 3 files" or "run these 4 independent searches"), the current agent executes them sequentially. For I/O-bound tools — file reads, HTTP requests, MCP server calls, database queries — this leaves significant performance on the table. Parallel execution turns N × latency into ~1 × latency for independent operations.
Concrete scenarios where this helps:
What this does NOT help with: CPU-bound tools limited by the GIL, or tools with shared mutable state that aren't thread-safe.
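The "N × latency into ~1 × latency" claim can be illustrated with a toy I/O-bound workload. This is illustrative only: `fetch` stands in for a tool call that spends its time waiting on the network or disk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(i, latency=0.3):
    """Simulated I/O-bound tool call with a fixed latency."""
    time.sleep(latency)  # stand-in for network / disk wait
    return f"result-{i}"


def run_parallel(n, workers):
    """Run n independent calls on a thread pool; return (results, elapsed)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, range(n)))
    return results, time.monotonic() - start
```

With 4 independent calls and 4 workers, wall time stays close to one call's latency rather than four, because the threads overlap their waiting.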
Design
- Side effects (state updates, event emission) happen on the main thread after parallel work completes.
- Unexpected exceptions (RuntimeError, AssertionError, etc.) are logged at ERROR with full traceback to aid debugging.
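The exception-handling behavior could be sketched like this. It is a sketch only: `safe_execute` and the tuple result shape are illustrative, not the SDK's real event types.

```python
import logging

logger = logging.getLogger(__name__)


def safe_execute(action_id, fn):
    """Run one tool call; convert unexpected exceptions into error results.

    The exception is logged at ERROR with the full traceback (exc_info=True)
    so one failing call doesn't kill the rest of the batch.
    """
    try:
        return ("ok", fn())
    except Exception as exc:  # RuntimeError, AssertionError, ...
        logger.error("tool call %s failed", action_id, exc_info=True)
        return ("error", repr(exc))
```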
Thread safety warning
When TOOL_CONCURRENCY_LIMIT > 1, tools run in parallel threads sharing the same conversation object. Tools are not thread-safe by default. Callers opting into parallelism must ensure their tools are safe for concurrent execution (no shared mutable filesystem state, no concurrent conversation mutations).
Evaluation
I ran an evaluation with SWE-bench to ensure that the default behavior is the one we already have in the repo [ref]
Report from trace investigation of OpenHands CLI:
No parallel tool calls detected -- the feature is cleanly disabled. Here's the full breakdown:

Trace Format
- Events alternate between ActionEvent (tool call) and ObservationEvent (tool result)
- Tools used: terminal (1150), file_editor (588), think (58), finish (25)
- 1,821 action events matched exactly 1,821 observation events across all 25 traces

Parallel Tool Call Check: CLEAN
- Zero shared llm_response_id across events (each LLM turn produced exactly 1 tool call)
- Perfect action-observation interleaving -- no consecutive actions or observations
- No tool_calls arrays, no parallel batching of any kind
- All 25 conversations completed normally with a finish action

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
```
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bda3d3c-python
```
Run
All tags pushed for this build
About Multi-Architecture Support
• The default tag (bda3d3c-python) is a multi-arch manifest supporting both amd64 and arm64
• Platform-specific tags (e.g., bda3d3c-python-amd64) are also available if needed