
Conversation


@CarlosCuevas (Collaborator) commented Nov 25, 2025

TL;DR

Adds anyio.Lock to SubprocessCLITransport.write() to fix BusyResourceError when parallel subagents invoke MCP tools concurrently.


Overview

When multiple subagents run in parallel, they each receive control_request messages from the CLI (e.g., for MCP tool calls). These requests are handled concurrently via _tg.start_soon() in query.py:190, and each handler eventually calls transport.write() to send the response.
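
In rough outline, the dispatch looks like this (a sketch with illustrative names; transport.read_messages() and handle_control_request are stand-ins, not the actual query.py API):

    import anyio

    async def route_messages(transport, handle_control_request) -> None:
        """Sketch of the concurrent dispatch described above (names illustrative)."""
        async with anyio.create_task_group() as tg:
            async for message in transport.read_messages():  # hypothetical reader
                if message.get("type") == "control_request":
                    # Each handler runs in its own task; several can be in flight
                    # at once, and each eventually calls transport.write().
                    tg.start_soon(handle_control_request, message)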

The problem: trio's FdStream (the underlying stdin writer) does not allow concurrent sends. Without synchronization, this causes trio.BusyResourceError: another task is using this stream for send.
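
The failure is easy to reproduce outside the SDK. A minimal sketch on the trio backend (the stdin-consuming child process here mirrors the test further down):

    import sys
    from subprocess import PIPE

    import anyio

    async def main() -> None:
        # Any child that keeps stdin open will do; this one just consumes it.
        process = await anyio.open_process(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=PIPE, stdout=PIPE, stderr=PIPE,
        )
        try:
            async def send(i: int) -> None:
                await process.stdin.send(f"message {i}\n".encode())

            # Unsynchronized concurrent sends on one stream: on the trio
            # backend this raises BusyResourceError rather than interleaving.
            async with anyio.create_task_group() as tg:
                for i in range(10):
                    tg.start_soon(send, i)
        finally:
            process.terminate()
            await process.wait()

    anyio.run(main, backend="trio")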

This fix adds a _write_lock to serialize all writes to the transport. The lock also protects close() and end_input() to prevent TOCTOU races where a write could start just as the stream is being closed.


Call Flow

flowchart TD
    A["CLI sends control_request<br/>query.py:185"] --> B["start_soon(_handle_control_request)<br/>query.py:190"]
    B --> C["Handler 1<br/>query.py:213"]
    B --> D["Handler 2<br/>query.py:213"]
    B --> E["Handler N...<br/>query.py:213"]
    C --> F["transport.write()<br/>subprocess_cli.py:449"]
    D --> F
    E --> F
    F --> G["async with _write_lock<br/>subprocess_cli.py:451"]
    G --> H["stdin_stream.send()<br/>subprocess_cli.py:467"]

Without the lock at step G, concurrent calls to H would race and crash.

    else _DEFAULT_MAX_BUFFER_SIZE
)
self._temp_files: list[str] = []  # Track temporary files for cleanup
self._write_lock: anyio.Lock = anyio.Lock()

1. Write Lock Declaration [Core logic]

Initializes the anyio.Lock that serializes all transport writes. When parallel subagents handle concurrent control_request messages, each calls transport.write() to send responses. trio's underlying FdStream doesn't allow concurrent sends—this lock prevents the BusyResourceError crash.

Related: See comment 2 for where the lock is acquired.


🤖 Generated with Claude Code

Comment on lines +451 to +529
async with self._write_lock:
    # All checks inside lock to prevent TOCTOU races with close()/end_input()
    if not self._ready or not self._stdin_stream:
        raise CLIConnectionError("ProcessTransport is not ready for writing")

    if self._process and self._process.returncode is not None:
        raise CLIConnectionError(
            f"Cannot write to terminated process (exit code: {self._process.returncode})"
        )

    # Check for exit errors (like TypeScript)
    if self._exit_error:
        raise CLIConnectionError(
            f"Cannot write to process that exited with error: {self._exit_error}"
        ) from self._exit_error

    try:
        await self._stdin_stream.send(data)
    except Exception as e:
        self._ready = False  # Mark as not ready (like TypeScript)
        self._exit_error = CLIConnectionError(
            f"Failed to write to process stdin: {e}"
        )
        raise self._exit_error from e

2. write() Serialization [Core logic]

The core fix: all write logic runs inside async with self._write_lock. This includes:

  • State checks (_ready, _stdin_stream, returncode, _exit_error)
  • The actual stdin_stream.send() call

Moving checks inside the lock prevents TOCTOU races—without this, a write could pass all checks, then close() could clear _stdin_stream, and the write would fail on a None stream.
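
The race it closes, as a timeline (illustrative, not actual SDK code):

    # Without the lock, this interleaving is possible:
    #
    #   writer task                            closer task
    #   -----------                            -----------
    #   checks pass (_ready, _stdin_stream)
    #                                          self._ready = False
    #                                          await self._stdin_stream.aclose()
    #                                          self._stdin_stream = None
    #   await self._stdin_stream.send(data)    # fails: stream is None/closed
    #
    # With the lock, close() cannot run between the writer's checks and its
    # send(); a late write() instead sees _ready == False and raises the
    # clean CLIConnectionError.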


🤖 Generated with Claude Code

Comment on lines +421 to +483
# Close stdin stream (acquire lock to prevent race with concurrent writes)
async with self._write_lock:
    self._ready = False  # Set inside lock to prevent TOCTOU with write()
    if self._stdin_stream:
        with suppress(Exception):
            await self._stdin_stream.aclose()
        self._stdin_stream = None

3. close() TOCTOU Prevention [Core logic]

Acquires _write_lock before closing stdin and setting _ready = False. This coordinates with write():

  1. If write() holds the lock, close() waits until the write finishes
  2. Once close() acquires the lock, it sets _ready = False inside the critical section
  3. Any subsequent write() will see _ready = False when it acquires the lock

Without this coordination, close() could clear the stream while write() was mid-send.
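
The PR description says end_input() is protected the same way; presumably it mirrors close() for the stdin stream alone (a sketch under that assumption, not the actual diff):

    from contextlib import suppress

    async def end_input(self) -> None:
        """Close stdin to signal end of input (sketch; mirrors close())."""
        async with self._write_lock:
            self._ready = False  # assumed: block writes arriving after this point
            if self._stdin_stream:
                with suppress(Exception):
                    await self._stdin_stream.aclose()
                self._stdin_stream = None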


🤖 Generated with Claude Code

Comment on lines +504 to +706
def test_concurrent_writes_are_serialized(self):
    """Test that concurrent write() calls are serialized by the lock.

    When parallel subagents invoke MCP tools, they trigger concurrent write()
    calls. Without the _write_lock, trio raises BusyResourceError.

    Uses a real subprocess with the same stream setup as production:
    process.stdin -> TextSendStream
    """

    async def _test():
        import sys
        from subprocess import PIPE

        from anyio.streams.text import TextSendStream

        # Create a real subprocess that consumes stdin (cross-platform)
        process = await anyio.open_process(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=PIPE,
            stdout=PIPE,
            stderr=PIPE,
        )

        try:
            transport = SubprocessCLITransport(
                prompt="test",
                options=ClaudeAgentOptions(cli_path="/usr/bin/claude"),
            )

            # Same setup as production: TextSendStream wrapping process.stdin
            transport._ready = True
            transport._process = MagicMock(returncode=None)
            transport._stdin_stream = TextSendStream(process.stdin)

            # Spawn concurrent writes - the lock should serialize them
            num_writes = 10
            errors: list[Exception] = []

            async def do_write(i: int):
                try:
                    await transport.write(f'{{"msg": {i}}}\n')
                except Exception as e:
                    errors.append(e)

            async with anyio.create_task_group() as tg:
                for i in range(num_writes):
                    tg.start_soon(do_write, i)

            # All writes should succeed - the lock serializes them
            assert len(errors) == 0, f"Got errors: {errors}"
        finally:
            process.terminate()
            await process.wait()

    anyio.run(_test, backend="trio")

4. Concurrency Tests [Test]

Two complementary tests verify the fix:

Positive test (this method): Spawns 10 concurrent write() calls against a real subprocess with TextSendStream—the same stream type used in production. All writes should succeed because the lock serializes them.

Negative test (test_concurrent_writes_fail_without_lock): Replaces the lock with a no-op, proving the race condition exists. Without the lock, trio raises BusyResourceError: another task is using this stream for send.

Both tests use a real process rather than mocks to ensure the concurrency behavior matches production.
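
The no-op substitute only needs to satisfy the async context manager protocol; a sketch of what the negative test presumably swaps in:

    class _NoopLock:
        """Async context manager that provides no mutual exclusion."""

        async def __aenter__(self) -> None:
            return None

        async def __aexit__(self, *exc_info: object) -> bool:
            return False  # never suppress exceptions

    # transport._write_lock = _NoopLock() then reuses the positive test's
    # scenario; the unserialized sends race and trio raises BusyResourceError.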


🤖 Generated with Claude Code

When parallel subagents invoke MCP tools, the CLI sends multiple
concurrent control_request messages. Without synchronization, handlers
race to write responses back, causing trio.BusyResourceError.

This adds an anyio.Lock to serialize writes to stdin, and moves the
_ready flag inside the lock to prevent TOCTOU races with close().

Replace trio.lowlevel.FdStream (Unix-only) with anyio.open_process()
which works on both Unix and Windows. The tests now use a real
subprocess with the same stream setup as production code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>


@CarlosCuevas force-pushed the carlos/parallel_subagents_tool_bug branch from a41991f to a3edee4 on December 2, 2025 at 14:19