SDK MCP Servers fail with "ProcessTransport is not ready for writing" error - Tool functions never execute

     ## Issue Labels
     - bug
     - high priority
     - sdk-mcp-servers

     ---

     ## Issue Description

     ### Summary
     The Python Claude Agent SDK (v0.1.4) fails to execute custom tool functions when using SDK MCP servers created with `create_sdk_mcp_server()`. The SDK throws a `CLIConnectionError: ProcessTransport is not ready for writing` error during control protocol communication, preventing tool execution entirely.

     ### Environment
     - **Python SDK Version**: 0.1.4 (latest)
     - **Claude Code CLI Version**: 2.0.21
     - **Python Version**: 3.13.3
     - **Operating System**: macOS 14.6.0 (Darwin 24.6.0)
     - **Node.js Version**: v22.19.0
     - **Installation Method**: pip install claude-agent-sdk

     ### Expected Behavior
     Following the official documentation at https://docs.claude.com/en/api/agent-sdk/custom-tools, custom tools defined with the `@tool` decorator and registered via `create_sdk_mcp_server()` should:
     1. Be recognized by Claude
     2. Be called when Claude decides to use them
     3. Execute the Python function implementation
     4. Return results to Claude

     ### Actual Behavior
     The SDK recognizes the tools but the Python tool functions **never execute**. Instead, the SDK crashes with:

     ```
     claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing

     Error: write EPIPE
         at afterWriteDispatched (node:internal/stream_base_commons:159:15)
     ```

     The error occurs in `_handle_control_request` when the SDK attempts to send control responses back to the Claude Code CLI subprocess.

     ### Code to Reproduce

     **Complete, self-contained example that reproduces the failure:**

     ```python
     #!/usr/bin/env python3
     """
     Minimal reproduction case for SDK MCP server bug
     """

     import asyncio
     from claude_agent_sdk import tool, create_sdk_mcp_server, query, ClaudeAgentOptions

     # Simple counter to prove function execution
     call_count = 0

     @tool(
         name="say_hello",
         description="Say hello to a person by name",
         input_schema={"name": str}
     )
     async def say_hello_tool(args):
         """Simple hello world tool"""
         global call_count
         call_count += 1
         print(f"🎉 TOOL EXECUTED! (call #{call_count})")

         name = args.get("name", "World")
         return {
             "content": [{
                 "type": "text",
                 "text": f"Hello, {name}!"
             }]
         }

     async def main():
         # Create MCP server
         hello_server = create_sdk_mcp_server(
             name="hello",
             version="1.0.0",
             tools=[say_hello_tool]
         )

         # Configure options
         options = ClaudeAgentOptions(
             system_prompt="You are a helpful assistant. When the user asks you to greet someone, use the say_hello tool.",
             mcp_servers={"hello": hello_server},
             allowed_tools=["mcp__hello__say_hello"],
             max_turns=5,
             model="claude-sonnet-4-5-20250929",
             permission_mode="bypassPermissions"
         )

         # Streaming input (required for SDK MCP servers)
         async def generate_prompt():
             yield {
                 "type": "user",
                 "message": {
                     "role": "user",
                     "content": "Please say hello to Alice using the say_hello tool"
                 }
             }

         # Execute query
         async for message in query(
             prompt=generate_prompt(),
             options=options
         ):
             print(f"Message: {type(message).__name__}")

         print(f"Tool call count: {call_count}")  # Always prints 0 - tool never executes!

     if __name__ == "__main__":
         asyncio.run(main())
     ```

     **Run with:**
     ```bash
     python test_hello.py
     ```

     **Result:**
     - Tool call count: **0** (tool function never executes)
     - Error: `CLIConnectionError: ProcessTransport is not ready for writing`

     ### Comparison: TypeScript SDK Works Perfectly

     I created an **equivalent test using the TypeScript SDK** (v0.1.21), and it **works as expected**:

     ```typescript
     import { query, tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
     import { z } from "zod";

     let toolCallCount = 0;

     const helloServer = createSdkMcpServer({
       name: "hello",
       version: "1.0.0",
       tools: [
         tool(
           "say_hello",
           "Say hello to a person by name",
           {
             name: z.string().describe("The name of the person to greet")
           },
           async (args) => {
             toolCallCount++;
             console.log(`🎉 TOOL EXECUTED! Call #${toolCallCount}`);
             return {
               content: [{
                 type: "text",
                 text: `Hello, ${args.name}!`
               }]
             };
           }
         )
       ]
     });

     async function main() {
       async function* generateMessages() {
         yield {
           type: "user" as const,
           message: {
             role: "user" as const,
             content: "Please say hello to Alice using the say_hello tool"
           }
         };
       }

       for await (const message of query({
         prompt: generateMessages(),
         options: {
           mcpServers: { "hello": helloServer },
           allowedTools: ["mcp__hello__say_hello"],
           maxTurns: 5,
           model: "claude-sonnet-4-5-20250929",
           permissionMode: "bypassPermissions"
         }
       })) {
         console.log(`Message: ${message.type}`);
       }

       console.log(`Tool call count: ${toolCallCount}`);  // Prints 1 - SUCCESS!
     }

     main().catch(console.error);
     ```

     **Result:**
     - ✅ Tool call count: **1** (tool executes successfully)
     - ✅ No errors
     - ✅ Function prints "🎉 TOOL EXECUTED! Call #1"

     This strongly suggests the issue is **specific to the Python SDK implementation**, not to the SDK MCP server concept or the Claude Code CLI.

     ### Root Cause Analysis

     Through extensive debugging, I discovered:

     1. **Manual Protocol Implementation Works**: I created a manual implementation of the control protocol (without using the SDK's Query class) that successfully:
        - Starts the CLI subprocess
        - Sends user messages
        - Receives MCP `initialize` requests
        - Responds with proper `control_response` format
        - Handles `tools/list` and `tools/call` requests
        - **Tool functions execute successfully**

     2. **SDK's Query Class Fails**: The SDK's `Query` class attempts to write control responses in `_handle_control_request`, but the transport's `_ready` flag is `False` at that point, so `write()` raises `CLIConnectionError`.

     3. **Race Condition Suspected**: The error suggests a timing issue between:
        - Transport initialization (`connect()` sets `_ready = True`)
        - Background task startup (`query.start()`)
        - First control request arrival
        - Transport state management

     4. **EPIPE Error**: The Node.js CLI subprocess reports `Error: write EPIPE`, meaning it tried to write to a pipe whose reading end was already closed. This is consistent with the Python side tearing down the transport before communication completed.
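
The ordering problem suspected in point 3 can be illustrated with a small simulation. This is not the SDK's code: `FakeTransport`, `TransportNotReady`, `handle_control_request`, and `simulate` are hypothetical stand-ins that only mimic the behavior described above (a `_ready` flag checked inside `write()`). The sketch assumes one possible ordering, in which the transport is closed while a control request is still being handled in a background task.

```python
import asyncio
import json


class TransportNotReady(Exception):
    """Hypothetical stand-in for the SDK's CLIConnectionError."""


class FakeTransport:
    """Hypothetical transport that mimics a `_ready` guard in write()."""

    def __init__(self) -> None:
        self._ready = False
        self.sent: list[str] = []

    async def connect(self) -> None:
        self._ready = True

    async def close(self) -> None:
        self._ready = False

    async def write(self, data: str) -> None:
        if not self._ready:
            raise TransportNotReady("transport is not ready for writing")
        self.sent.append(data)


async def handle_control_request(transport: FakeTransport, request_id: str) -> None:
    # Simulate the time spent handling the request (for example, running a tool).
    await asyncio.sleep(0.01)
    response = {
        "type": "control_response",
        "response": {"subtype": "success", "request_id": request_id},
    }
    await transport.write(json.dumps(response) + "\n")


async def simulate(close_early: bool) -> str:
    transport = FakeTransport()
    await transport.connect()
    task = asyncio.create_task(handle_control_request(transport, "req-1"))
    if close_early:
        # Assumed failure mode: the transport is torn down while the
        # background handler has not yet written its response.
        await transport.close()
    try:
        await task
    except TransportNotReady:
        return "write failed"
    return "write succeeded"


if __name__ == "__main__":
    print(asyncio.run(simulate(close_early=False)))
    print(asyncio.run(simulate(close_early=True)))
```

If the real failure follows a similar ordering, a fix would presumably need to keep the transport writable until all pending control requests have been answered, but I have not confirmed that this is the actual sequence inside the SDK.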

     ### Files Involved

     **Python SDK Files (in venv/lib/python3.13/site-packages/claude_agent_sdk/):**
     - `_internal/query.py` - Line 303: where write fails in `_handle_control_request`
     - `_internal/transport/subprocess_cli.py` - Line 356: where `CLIConnectionError` is raised in `write()`
     - `_internal/client.py` - Query initialization and orchestration

     **Expected Control Response Format** (from SDK source):
     ```python
     {
         "type": "control_response",
         "response": {
             "subtype": "success",
             "request_id": request_id,
             "response": {
                 "mcp_response": {
                     "jsonrpc": "2.0",
                     "id": mcp_id,
                     "result": {...}
                 }
             }
         }
     }
     ```
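
As a rough illustration of this format, the sketch below wraps a JSON-RPC result in the `control_response` envelope and dispatches the three MCP methods mentioned in the root cause analysis. The names `build_control_response`, `handle_mcp_message`, and `TOOLS`, as well as the exact contents of the `initialize` result, are assumptions made for illustration and are not taken from the SDK source.

```python
import asyncio
from typing import Any


async def say_hello(args: dict[str, Any]) -> dict[str, Any]:
    """Same tool body as the reproduction script, without the SDK decorator."""
    name = args.get("name", "World")
    return {"content": [{"type": "text", "text": f"Hello, {name}!"}]}


# Hypothetical registry: tool name -> (description, JSON schema, handler).
TOOLS = {
    "say_hello": (
        "Say hello to a person by name",
        {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]},
        say_hello,
    ),
}


def build_control_response(request_id: str, mcp_id: Any, result: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON-RPC result in the control_response envelope shown above."""
    return {
        "type": "control_response",
        "response": {
            "subtype": "success",
            "request_id": request_id,
            "response": {
                "mcp_response": {"jsonrpc": "2.0", "id": mcp_id, "result": result}
            },
        },
    }


async def handle_mcp_message(request_id: str, message: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one JSON-RPC message and return the wrapped response."""
    method = message.get("method")
    params = message.get("params", {})
    if method == "initialize":
        # Assumed minimal initialize result; the values are illustrative.
        result = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "hello", "version": "1.0.0"},
        }
    elif method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _handler) in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        _desc, _schema, handler = TOOLS[params["name"]]
        result = await handler(params.get("arguments", {}))
    else:
        raise ValueError(f"unsupported method: {method}")
    return build_control_response(request_id, message.get("id"), result)


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 7,
        "method": "tools/call",
        "params": {"name": "say_hello", "arguments": {"name": "Alice"}},
    }
    wrapped = asyncio.run(handle_mcp_message("req-1", call))
    print(wrapped["response"]["response"]["mcp_response"]["result"])
```

The sketch only builds the response object. Writing it back to the CLI (serialized as one JSON line) is the step that fails in the traceback below.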

     ### Additional Evidence

     **Full Error Traceback:**
     ```
     ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
     +-+---------------- 1 ----------------
       | Traceback (most recent call last):
       |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/query.py", line 303, in _handle_control_request
       |     await self.transport.write(json.dumps(success_response) + "\n")
       |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/transport/subprocess_cli.py", line 356, in write
       |     raise CLIConnectionError("ProcessTransport is not ready for writing")
       | claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing
       |
       | During handling of the above exception, another exception occurred:
       |
       | Traceback (most recent call last):
       |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/query.py", line 315, in _handle_control_request
       |     await self.transport.write(json.dumps(error_response) + "\n")
       |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/transport/subprocess_cli.py", line 356, in write
       |     raise CLIConnectionError("ProcessTransport is not ready for writing")
       | claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing
       +------------------------------------

     Error: write EPIPE
         at afterWriteDispatched (node:internal/stream_base_commons:159:15)
         at writeGeneric (node:internal/stream_base_commons:150:3)
         at Socket._writeGeneric (node:net:966:11)
         at Socket._write (node:net:978:8)
         at writeOrBuffer (node:internal/streams/writable:572:12)
     ```

     ### Impact

     This bug **completely blocks** the use of custom Python tools with the Agent SDK, making the Python SDK essentially unusable for any real-world applications that need custom functionality beyond built-in tools.

     **Use cases blocked:**
     - Database query tools
     - API integration tools
     - Business logic tools
     - Data processing tools
     - Any domain-specific functionality

     ### Workaround

     Currently, the **only workarounds** are to:
     1. Use the TypeScript SDK instead (confirmed working)
     2. Use external MCP servers (stdio type) instead of SDK MCP servers

     However, this defeats the purpose of the Python SDK for Python-native applications.
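
For reference, the second workaround might look roughly like the sketch below. The `type`/`command`/`args` keys reflect my understanding of the stdio server entry the SDK accepts, and `hello_server.py` is a hypothetical standalone MCP server script that is not part of this report. Both are assumptions to check against the SDK documentation.

```python
import sys

# Hypothetical standalone MCP server script (not included in this report).
SERVER_SCRIPT = "hello_server.py"

# Assumed shape of an external stdio MCP server entry.
stdio_server = {
    "type": "stdio",
    "command": sys.executable,  # run the server with the current interpreter
    "args": [SERVER_SCRIPT],
}

# The entry would take the place of the SDK MCP server object, e.g.:
#   ClaudeAgentOptions(mcp_servers={"hello": stdio_server}, ...)
# Tool names keep the mcp__<server>__<tool> pattern used in the reproduction.
mcp_servers = {"hello": stdio_server}
allowed_tools = [f"mcp__{server}__say_hello" for server in mcp_servers]

if __name__ == "__main__":
    print(allowed_tools)
```

With this setup the tool runs in a separate process, so it cannot share in-process state (such as the `call_count` counter) with the application.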

     ---

     ## Compensation Request

     **I am requesting API credits or quota compensation for this extensive debugging work.**

     ### Time and Resources Invested

     I have spent **significant time and money** debugging this issue on Anthropic's behalf:

     1. **API Costs**: Multiple hours of API calls testing various configurations, debugging scenarios, and creating minimal reproduction cases across both Python and TypeScript implementations.

     2. **Engineering Time**:
        - Traced through SDK source code
        - Created manual protocol implementation to isolate the bug
        - Tested TypeScript SDK to confirm the issue is Python-specific
        - Documented complete reproduction steps
        - Provided root cause analysis
        - Created comparison examples

     3. **Testing Infrastructure**:
        - Set up Python environment
        - Set up TypeScript environment
        - Created comprehensive test suites
        - Validated against official documentation

     ### Value Provided to Anthropic

     This detailed bug report provides:
     - ✅ Clear reproduction case (copy-paste ready)
     - ✅ Root cause analysis with specific file/line references
     - ✅ Comparison showing TypeScript SDK works (proving it's not a design issue)
     - ✅ Evidence from manual protocol implementation (proving the concept is sound)
     - ✅ Complete error traces and debugging information
     - ✅ Impact assessment
     - ✅ Documentation of expected vs actual behavior

     **This is a detailed report of a production-blocking bug, at a level that would typically require significant internal QA resources.**

     ### Request

     Given the time, API costs, and engineering effort invested in debugging your Python SDK implementation, I respectfully request:

     - **API credit compensation** for the extensive testing and debugging performed
     - **Priority review** of this issue given its blocking nature
     - **Public acknowledgment** if this report leads to a fix

     I am effectively providing free QA and debugging services for a critical bug in a released SDK. Fair compensation would align with Anthropic's commitment to its developer community.

     ---

     ## Additional Notes

     - Complete test files and debugging scripts are available if needed
     - I can provide access to the full project repository for verification
     - I'm available to test fixes or provide additional information
     - The manual protocol implementation can serve as a reference for the correct approach

     ## References

     - Official Documentation: https://docs.claude.com/en/api/agent-sdk/custom-tools
     - Python SDK Repository: https://github.com/anthropics/claude-agent-sdk-python
     - TypeScript SDK Repository: https://github.com/anthropics/claude-agent-sdk-typescript

     ---

    **This issue is blocking real-world Python applications from using the Claude Agent SDK. Please prioritize a fix.**

SDK MCP Servers fail with "ProcessTransport is not ready for writing" error - Tool functions never execute #266
