@CarlosCuevas CarlosCuevas commented Dec 4, 2025

TL;DR

Adds a write lock to SubprocessCLITransport to prevent concurrent writes from parallel subagents.


Overview

When multiple subagents run in parallel and invoke MCP tools, the CLI sends concurrent control_request messages. Each handler then tries to write its response back to the subprocess stdin at the same time. On the Trio backend, TextSendStream does not allow overlapping send() calls from different tasks (a task-level restriction, not a thread-safety issue), so the concurrent writes raise BusyResourceError.

This PR adds an anyio.Lock around all write operations (write(), end_input(), and the stdin-closing part of close()). The lock serializes concurrent writes so they happen one at a time. The _ready flag is now set inside the lock during close() to prevent a TOCTOU race where write() checks _ready, then close() sets it and closes the stream before write() actually sends data.
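
As a rough illustration of the shape of this change, the sketch below routes write(), end_input(), and close() through one shared lock. It is not the SDK code: RecordingStream, LockedTransport, and demo are hypothetical names, asyncio.Lock stands in for anyio.Lock so the example needs only the standard library, and setting the stream to None after closing it is an assumption made for the example.

```python
import asyncio


class RecordingStream:
    """Hypothetical stand-in for the subprocess stdin stream."""

    def __init__(self):
        self.sent = []
        self.closed = False

    async def send(self, data):
        await asyncio.sleep(0)  # suspension point, as with real I/O
        self.sent.append(data)

    async def aclose(self):
        self.closed = True


class LockedTransport:
    """Sketch: every operation that touches stdin takes the same lock."""

    def __init__(self, stream):
        self._stdin_stream = stream
        self._ready = True
        self._write_lock = asyncio.Lock()  # the PR itself uses anyio.Lock

    async def write(self, data):
        async with self._write_lock:
            # Check and send happen under the lock, so close() cannot interleave.
            if not self._ready or self._stdin_stream is None:
                raise ConnectionError("transport is not ready for writing")
            await self._stdin_stream.send(data)

    async def end_input(self):
        async with self._write_lock:
            if self._stdin_stream is not None:
                await self._stdin_stream.aclose()
                self._stdin_stream = None  # assumption made for this sketch

    async def close(self):
        async with self._write_lock:
            self._ready = False  # cleared while holding the lock
            if self._stdin_stream is not None:
                await self._stdin_stream.aclose()
                self._stdin_stream = None


async def demo():
    stream = RecordingStream()
    transport = LockedTransport(stream)
    # Three concurrent writers, then close stdin and try one late write.
    await asyncio.gather(*(transport.write(f"msg-{i}\n") for i in range(3)))
    await transport.end_input()
    try:
        await transport.write("late\n")
        rejected = False
    except ConnectionError:
        rejected = True
    return stream.sent, rejected
```

Running `asyncio.run(demo())` should return the three messages plus a flag showing that the write issued after end_input() was rejected with a clean error rather than reaching a closed stream.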


Call Flow

flowchart TD
    A["write()<br/>subprocess_cli.py:505"] --> B["acquire _write_lock<br/>subprocess_cli.py:507"]
    B --> C["check _ready & stream<br/>subprocess_cli.py:509"]
    C --> D["_stdin_stream.send()<br/>subprocess_cli.py:523"]
    
    E["close()<br/>subprocess_cli.py:458"] --> F["acquire _write_lock<br/>subprocess_cli.py:478"]
    F --> G["set _ready = False<br/>subprocess_cli.py:479"]
    G --> H["close _stdin_stream<br/>subprocess_cli.py:481"]
    
    I["end_input()<br/>subprocess_cli.py:531"] --> J["acquire _write_lock<br/>subprocess_cli.py:533"]
    J --> K["close _stdin_stream<br/>subprocess_cli.py:535"]

             else _DEFAULT_MAX_BUFFER_SIZE
         )
         self._temp_files: list[str] = []  # Track temporary files for cleanup
+        self._write_lock: anyio.Lock = anyio.Lock()

1. Write Lock Initialization

The lock serializes concurrent writes to stdin. When parallel subagents invoke MCP tools, multiple handlers try to write responses at the same time. On the Trio backend, TextSendStream.send() rejects overlapping calls from different tasks and raises BusyResourceError.


🤖 Generated with Claude Code

Comment on lines 505 to +544
     async def write(self, data: str) -> None:
         """Write raw data to the transport."""
-        # Check if ready (like TypeScript)
-        if not self._ready or not self._stdin_stream:
-            raise CLIConnectionError("ProcessTransport is not ready for writing")
-
-        # Check if process is still alive (like TypeScript)
-        if self._process and self._process.returncode is not None:
-            raise CLIConnectionError(
-                f"Cannot write to terminated process (exit code: {self._process.returncode})"
-            )
+        async with self._write_lock:
+            # All checks inside lock to prevent TOCTOU races with close()/end_input()
+            if not self._ready or not self._stdin_stream:
+                raise CLIConnectionError("ProcessTransport is not ready for writing")
+
+            if self._process and self._process.returncode is not None:
+                raise CLIConnectionError(
+                    f"Cannot write to terminated process (exit code: {self._process.returncode})"
+                )

-        # Check for exit errors (like TypeScript)
-        if self._exit_error:
-            raise CLIConnectionError(
-                f"Cannot write to process that exited with error: {self._exit_error}"
-            ) from self._exit_error
+            if self._exit_error:
+                raise CLIConnectionError(
+                    f"Cannot write to process that exited with error: {self._exit_error}"
+                ) from self._exit_error

-        try:
-            await self._stdin_stream.send(data)
-        except Exception as e:
-            self._ready = False  # Mark as not ready (like TypeScript)
-            self._exit_error = CLIConnectionError(
-                f"Failed to write to process stdin: {e}"
-            )
-            raise self._exit_error from e
+            try:
+                await self._stdin_stream.send(data)
+            except Exception as e:
+                self._ready = False
+                self._exit_error = CLIConnectionError(
+                    f"Failed to write to process stdin: {e}"
+                )
+                raise self._exit_error from e

2. Write Method with Lock

The entire write() method now runs inside the lock. This prevents two issues:

  1. Concurrent sends: Multiple coroutines can't call send() at the same time
  2. TOCTOU race: The _ready check and send() are now atomic. Previously, close() could set _ready=False and close the stream between checking _ready and calling send().

Related: See comment 1 for lock initialization.
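
To make the TOCTOU window concrete, here is a small sketch that forces the interleaving described above: write() passes its readiness check and suspends inside send(), then close() runs. All names (NoLock, Stream, Transport, race) are hypothetical, asyncio.Lock stands in for anyio.Lock, and the stream is a simplified model that only fails when it has been closed.

```python
import asyncio


class NoLock:
    """Hypothetical no-op async context manager, modelling the pre-fix code."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False


class Stream:
    """Hypothetical stream: send() fails once the stream has been closed."""

    def __init__(self):
        self.closed = False
        self.sent = []

    async def send(self, data):
        await asyncio.sleep(0)  # suspension point between the check and the write
        if self.closed:
            raise RuntimeError("send on closed stream")
        self.sent.append(data)

    async def aclose(self):
        self.closed = True


class Transport:
    def __init__(self, lock):
        self.stream = Stream()
        self.ready = True
        self.lock = lock

    async def write(self, data):
        async with self.lock:
            if not self.ready:
                raise ConnectionError("not ready for writing")
            await self.stream.send(data)

    async def close(self):
        async with self.lock:
            self.ready = False
            await self.stream.aclose()


async def race(use_lock):
    transport = Transport(asyncio.Lock() if use_lock else NoLock())
    writer = asyncio.create_task(transport.write("hello\n"))
    await asyncio.sleep(0)  # let write() pass its check and suspend in send()
    await transport.close()
    try:
        await writer
        return "sent"
    except RuntimeError as exc:
        return f"raced: {exc}"
```

In this model, `race(use_lock=False)` lets close() slip in between the check and the send, so the write fails on a closed stream. With `use_lock=True`, close() has to wait for the in-flight write to finish, so the check and the send behave as one step.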


🤖 Generated with Claude Code

assert network["httpProxyPort"] == 8080
assert network["socksProxyPort"] == 8081

def test_concurrent_writes_are_serialized(self):

3. Concurrent Write Tests

Two tests verify the lock works correctly:

  1. test_concurrent_writes_are_serialized: Spawns 10 concurrent writes with the lock enabled. All should succeed.
  2. test_concurrent_writes_fail_without_lock: Replaces the lock with a no-op. This shows that the race condition exists without the fix: the concurrent writes fail with "another task is already" errors.

Both tests use a real subprocess with TextSendStream to match production behavior. They run on the Trio backend where the bug surfaces.
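
The idea behind the two tests can be sketched without a subprocess. The version below is a simplified model, not the PR's test code: BusyStream is a hypothetical stream that rejects overlapping send() calls, NoOpLock is a hypothetical replacement for the lock, and asyncio is used in place of the Trio backend so the sketch runs with the standard library alone.

```python
import asyncio


class BusyStream:
    """Hypothetical stream that rejects overlapping send() calls."""

    def __init__(self):
        self.busy = False
        self.sent = []

    async def send(self, data):
        if self.busy:
            raise RuntimeError("another task is already sending")
        self.busy = True
        try:
            await asyncio.sleep(0)  # suspension point, as with real I/O
            self.sent.append(data)
        finally:
            self.busy = False


class NoOpLock:
    """Hypothetical lock replacement that provides no mutual exclusion."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False


async def write_concurrently(lock_factory, count=10):
    """Run `count` concurrent writes; return (number sent, list of errors)."""
    stream = BusyStream()
    lock = lock_factory()  # created inside the running event loop

    async def write(i):
        async with lock:
            await stream.send(f"msg-{i}\n")

    results = await asyncio.gather(
        *(write(i) for i in range(count)), return_exceptions=True
    )
    errors = [r for r in results if isinstance(r, Exception)]
    return len(stream.sent), errors
```

Calling `asyncio.run(write_concurrently(asyncio.Lock))` should deliver all ten messages with no errors, while `asyncio.run(write_concurrently(NoOpLock))` should surface "another task is already" errors from the overlapping sends.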


🤖 Generated with Claude Code

@CarlosCuevas CarlosCuevas marked this pull request as ready for review December 4, 2025 16:49
When parallel subagents invoke MCP tools, the CLI sends multiple
concurrent control_request messages. Without synchronization, handlers
race to write responses back, causing trio.BusyResourceError.

This adds an anyio.Lock to serialize writes to stdin, and moves the
_ready flag inside the lock to prevent TOCTOU races with close().

:house: Remote-Dev: homespace
Replace trio.lowlevel.FdStream (Unix-only) with anyio.open_process()
which works on both Unix and Windows. The tests now use a real
subprocess with the same stream setup as production code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>


:house: Remote-Dev: homespace
@CarlosCuevas CarlosCuevas force-pushed the carlos/parallel_subagents_tool_bug branch from a5c0fbb to 42bae65 Compare December 4, 2025 20:24
@ashwin-ant ashwin-ant merged commit 2d67166 into main Dec 4, 2025
49 of 51 checks passed
@ashwin-ant ashwin-ant deleted the carlos/parallel_subagents_tool_bug branch December 4, 2025 22:27
ashwin-ant added a commit that referenced this pull request Dec 5, 2025
Include the three bug fixes that were merged but not documented:
- #388: Faster CLI error propagation
- #385: Pydantic 2.12+ compatibility
- #391: Concurrent subagent write lock

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>