
SDK MCP Servers fail with "ProcessTransport is not ready for writing" error - Tool functions never execute #266

@tavostefani


 ## Issue Labels
 - bug
 - high priority
 - sdk-mcp-servers

 ---

 ## Issue Description

 ### Summary
 The Python Claude Agent SDK (v0.1.4) fails to execute custom tool functions when using SDK MCP servers created with `create_sdk_mcp_server()`. The SDK throws a `CLIConnectionError: ProcessTransport is not ready for writing` error during control protocol communication, preventing tool execution entirely.

 ### Environment
 - **Python SDK Version**: 0.1.4 (latest)
 - **Claude Code CLI Version**: 2.0.21
 - **Python Version**: 3.13.3
 - **Operating System**: macOS 14.6.0 (Darwin 24.6.0)
 - **Node.js Version**: v22.19.0
 - **Installation Method**: pip install claude-agent-sdk

 ### Expected Behavior
 Following the official documentation at https://docs.claude.com/en/api/agent-sdk/custom-tools, custom tools defined with the `@tool` decorator and registered via `create_sdk_mcp_server()` should:
 1. Be recognized by Claude
 2. Be called when Claude decides to use them
 3. Execute the Python function implementation
 4. Return results to Claude

 ### Actual Behavior
 The SDK recognizes the tools, but the Python tool functions **never execute**. Instead, the SDK crashes with:

 ```
 claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing

 Error: write EPIPE
     at afterWriteDispatched (node:internal/stream_base_commons:159:15)
 ```

 The error occurs in `_handle_control_request` when the SDK attempts to send control responses back to the Claude Code CLI subprocess.

 ### Code to Reproduce

 **Complete, runnable example that fails:**

 ```python
 #!/usr/bin/env python3
 """
 Minimal reproduction case for SDK MCP server bug
 """

 import asyncio
 from claude_agent_sdk import tool, create_sdk_mcp_server, query, ClaudeAgentOptions

 # Simple counter to prove function execution
 call_count = 0

 @tool(
     name="say_hello",
     description="Say hello to a person by name",
     input_schema={"name": str}
 )
 async def say_hello_tool(args):
     """Simple hello world tool"""
     global call_count
     call_count += 1
     print(f"🎉 TOOL EXECUTED! (call #{call_count})")

     name = args.get("name", "World")
     return {
         "content": [{
             "type": "text",
             "text": f"Hello, {name}!"
         }]
     }

 async def main():
     # Create MCP server
     hello_server = create_sdk_mcp_server(
         name="hello",
         version="1.0.0",
         tools=[say_hello_tool]
     )

     # Configure options
     options = ClaudeAgentOptions(
         system_prompt="You are a helpful assistant. When the user asks you to greet someone, use the say_hello tool.",
         mcp_servers={"hello": hello_server},
         allowed_tools=["mcp__hello__say_hello"],
         max_turns=5,
         model="claude-sonnet-4-5-20250929",
         permission_mode="bypassPermissions"
     )

     # Streaming input (required for SDK MCP servers)
     async def generate_prompt():
         yield {
             "type": "user",
             "message": {
                 "role": "user",
                 "content": "Please say hello to Alice using the say_hello tool"
             }
         }

     # Execute query
     async for message in query(
         prompt=generate_prompt(),
         options=options
     ):
         print(f"Message: {type(message).__name__}")

     print(f"Tool call count: {call_count}")  # Always prints 0 - tool never executes!

 if __name__ == "__main__":
     asyncio.run(main())
 ```

 **Run with** (after saving the script above as `test_hello.py`):
 ```bash
 python test_hello.py
 ```

 **Result:**
 - Tool call count: **0** (tool function never executes)
 - Error: `CLIConnectionError: ProcessTransport is not ready for writing`
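The traceback points at the transport rather than at tool naming, but the `allowed_tools` entry in the reproduction relies on the `mcp__<server>__<tool>` naming convention from the custom-tools documentation. A minimal sketch, using a hypothetical helper that is not part of the SDK, derives the allow-list from the same constants used at registration so a name mismatch can be ruled out as a variable:

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Return the name an SDK MCP tool is exposed under: mcp__<server>__<tool>.

    `server` is assumed to be the key used in the mcp_servers mapping (in the
    reproduction it also matches the name passed to create_sdk_mcp_server()).
    """
    return f"mcp__{server}__{tool}"


# Hypothetical constants shared between tool registration and the allow-list.
SERVER_KEY = "hello"
TOOL_NAME = "say_hello"

allowed_tools = [mcp_tool_name(SERVER_KEY, TOOL_NAME)]
print(allowed_tools)  # ['mcp__hello__say_hello']
```

The result matches the hard-coded `"mcp__hello__say_hello"` in the reproduction.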

 ### Comparison: TypeScript SDK Works Perfectly

 I created an **equivalent test using the TypeScript SDK** (v0.1.21; it omits the system prompt but is otherwise the same), and it **works flawlessly**:

 ```typescript
 import { query, tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
 import { z } from "zod";

 let toolCallCount = 0;

 const helloServer = createSdkMcpServer({
   name: "hello",
   version: "1.0.0",
   tools: [
     tool(
       "say_hello",
       "Say hello to a person by name",
       {
         name: z.string().describe("The name of the person to greet")
       },
       async (args) => {
         toolCallCount++;
         console.log(`🎉 TOOL EXECUTED! Call #${toolCallCount}`);
         return {
           content: [{
             type: "text",
             text: `Hello, ${args.name}!`
           }]
         };
       }
     )
   ]
 });

 async function main() {
   async function* generateMessages() {
     yield {
       type: "user" as const,
       message: {
         role: "user" as const,
         content: "Please say hello to Alice using the say_hello tool"
       }
     };
   }

   for await (const message of query({
     prompt: generateMessages(),
     options: {
       mcpServers: { "hello": helloServer },
       allowedTools: ["mcp__hello__say_hello"],
       maxTurns: 5,
       model: "claude-sonnet-4-5-20250929",
       permissionMode: "bypassPermissions"
     }
   })) {
     console.log(`Message: ${message.type}`);
   }

   console.log(`Tool call count: ${toolCallCount}`);  // Prints 1 - SUCCESS!
 }

 main().catch(console.error);
 ```

 **Result:**
 - ✅ Tool call count: **1** (tool executes successfully)
 - ✅ No errors
 - ✅ Function prints "🎉 TOOL EXECUTED! Call #1"

 This indicates that the issue is **specific to the Python SDK implementation**, not to the concept or to the Claude Code CLI.

 ### Root Cause Analysis

 Through extensive debugging, I discovered:

 1. **Manual Protocol Implementation Works**: I created a manual implementation of the control protocol (without using the SDK's Query class) that successfully:
    - Starts the CLI subprocess
    - Sends user messages
    - Receives MCP `initialize` requests
    - Responds with proper `control_response` format
    - Handles `tools/list` and `tools/call` requests
    - **Tool functions execute successfully**

 2. **SDK's Query Class Fails**: In the SDK's `Query` class, `_handle_control_request` attempts to write control responses while the transport's `_ready` flag is `False`, so the write fails.

 3. **Race Condition Suspected**: The error suggests a timing issue between:
    - Transport initialization (`connect()` sets `_ready = True`)
    - Background task startup (`query.start()`)
    - First control request arrival
    - Transport state management

 4. **EPIPE Error**: The Node.js CLI subprocess reports `Error: write EPIPE`, which means the CLI tried to write to a pipe whose Python end had already been closed before communication completed.
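One ordering that would produce both symptoms (the refused write and the tool never running), offered as an unconfirmed hypothesis rather than a reading of the SDK source, is that input to the CLI is closed as soon as the single-message prompt generator is exhausted, before the CLI's first control request has been answered. The self-contained model below uses a hypothetical `FakeTransport` class and `simulate()` coroutine (neither is SDK code) to show that ordering:

```python
import asyncio
import json


class FakeTransport:
    """Hypothetical stand-in for the SDK's subprocess transport (not an SDK
    class): writes succeed only while stdin is still open."""

    def __init__(self) -> None:
        self.ready = True
        self.sent: list[str] = []

    async def write(self, data: str) -> None:
        if not self.ready:
            # The real SDK raises CLIConnectionError; ConnectionError keeps this self-contained.
            raise ConnectionError("ProcessTransport is not ready for writing")
        self.sent.append(data)

    async def end_input(self) -> None:
        # Closing stdin: nothing written after this point can reach the CLI.
        self.ready = False


async def single_message_prompt():
    # Same shape as generate_prompt() in the reproduction: one message, then done.
    yield {"type": "user", "message": {"role": "user", "content": "Please say hello to Alice"}}


async def simulate() -> str:
    transport = FakeTransport()

    # Hypothesized step 1: stream the prompt and close input once it is exhausted.
    async for msg in single_message_prompt():
        await transport.write(json.dumps(msg) + "\n")
    await transport.end_input()

    # Hypothesized step 2: the CLI's first control request (e.g. MCP initialize)
    # is answered only afterwards, so the (abbreviated) reply cannot be written.
    await asyncio.sleep(0)
    reply = {"type": "control_response",
             "response": {"subtype": "success", "request_id": "req_1"}}
    try:
        await transport.write(json.dumps(reply) + "\n")
        return "reply delivered"
    except ConnectionError as exc:
        return str(exc)


print(asyncio.run(simulate()))  # ProcessTransport is not ready for writing
```

If this hypothesis holds, the fix would lie in when input is closed rather than in the control response format.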

 ### Files Involved

 **Python SDK Files (in venv/lib/python3.13/site-packages/claude_agent_sdk/):**
 - `_internal/query.py` - Line 303: where write fails in `_handle_control_request`
 - `_internal/transport/subprocess_cli.py` - Line 356: where `CLIConnectionError` is raised in `write()`
 - `_internal/client.py` - Query initialization and orchestration

 **Expected Control Response Format** (from SDK source):
 ```python
 {
     "type": "control_response",
     "response": {
         "subtype": "success",
         "request_id": request_id,
         "response": {
             "mcp_response": {
                 "jsonrpc": "2.0",
                 "id": mcp_id,
                 "result": {...}
             }
         }
     }
 }
 ```
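The envelope above, together with minimal handling of the `initialize`, `tools/list` and `tools/call` requests listed under Root Cause Analysis, might be sketched as follows. This is neither the SDK's code nor the manual implementation described in this report: the `TOOLS` registry, `handle_mcp()` and the `protocolVersion` value are assumptions for illustration, and parsing of the outer `control_request` sent by the CLI is deliberately left out.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# Hypothetical tool registry entry: (description, JSON Schema, async handler).
ToolHandler = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]
Tool = tuple[str, dict[str, Any], ToolHandler]


def control_response(request_id: str, mcp_id: Any, result: dict[str, Any]) -> str:
    """Wrap a JSON-RPC result in the control_response envelope shown above,
    serialized as a single newline-terminated line for the CLI's stdin."""
    envelope = {
        "type": "control_response",
        "response": {
            "subtype": "success",
            "request_id": request_id,
            "response": {
                "mcp_response": {"jsonrpc": "2.0", "id": mcp_id, "result": result}
            },
        },
    }
    return json.dumps(envelope) + "\n"


async def handle_mcp(request_id: str, message: dict[str, Any], tools: dict[str, Tool]) -> str:
    """Answer one MCP JSON-RPC request forwarded by the CLI (notifications not handled)."""
    method = message.get("method")
    if method == "initialize":
        # Assumed minimal initialize result; the protocol version is an assumption.
        result = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "hello", "version": "1.0.0"},
        }
    elif method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _handler) in tools.items()
            ]
        }
    elif method == "tools/call":
        params = message.get("params", {})
        _desc, _schema, handler = tools[params["name"]]
        # The equivalent of this call is what never happens in the failing reproduction.
        result = await handler(params.get("arguments", {}))
    else:
        raise ValueError(f"unsupported MCP method: {method!r}")
    return control_response(request_id, message.get("id"), result)


async def say_hello(args: dict[str, Any]) -> dict[str, Any]:
    return {"content": [{"type": "text", "text": f"Hello, {args.get('name', 'World')}!"}]}


TOOLS: dict[str, Tool] = {
    "say_hello": (
        "Say hello to a person by name",
        {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]},
        say_hello,
    )
}

call = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
        "params": {"name": "say_hello", "arguments": {"name": "Alice"}}}
line = asyncio.run(handle_mcp("req_2", call, TOOLS))
print(line, end="")
```

Answering `tools/list` and `tools/call` in-process like this is what lets the `say_hello` body run without a separate MCP server process.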

 ### Additional Evidence

 **Full Error Traceback:**
 ```
 ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
 +-+---------------- 1 ----------------
   | Traceback (most recent call last):
   |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/query.py", line 303, in _handle_control_request
   |     await self.transport.write(json.dumps(success_response) + "\n")
   |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/transport/subprocess_cli.py", line 356, in write
   |     raise CLIConnectionError("ProcessTransport is not ready for writing")
   | claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing
   |
   | During handling of the above exception, another exception occurred:
   |
   | Traceback (most recent call last):
   |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/query.py", line 315, in _handle_control_request
   |     await self.transport.write(json.dumps(error_response) + "\n")
   |   File "venv/lib/python3.13/site-packages/claude_agent_sdk/_internal/transport/subprocess_cli.py", line 356, in write
   |     raise CLIConnectionError("ProcessTransport is not ready for writing")
   | claude_agent_sdk._errors.CLIConnectionError: ProcessTransport is not ready for writing
   +------------------------------------

 Error: write EPIPE
     at afterWriteDispatched (node:internal/stream_base_commons:159:15)
     at writeGeneric (node:internal/stream_base_commons:150:3)
     at Socket._writeGeneric (node:net:966:11)
     at Socket._write (node:net:978:8)
     at writeOrBuffer (node:internal/streams/writable:572:12)
 ```

 ### Impact

 This bug **completely blocks** the use of custom Python tools with the Agent SDK, making the Python SDK essentially unusable for any real-world applications that need custom functionality beyond built-in tools.

 **Use cases blocked:**
 - Database query tools
 - API integration tools
 - Business logic tools
 - Data processing tools
 - Any domain-specific functionality

 ### Workaround

 Currently, the **only workarounds** are to:
 1. Use the TypeScript SDK instead (confirmed working), or
 2. Use external MCP servers (stdio type) instead of SDK MCP servers.

 However, this defeats the purpose of the Python SDK for Python-native applications.
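One further mitigation that may be worth trying, untested here and resting on the unconfirmed assumption that the SDK stops accepting control responses once the prompt iterator is exhausted, is to keep the prompt generator open until the run has finished. `held_open_prompt()` below is a hypothetical helper, not an SDK API, and the demo drives it with a local consumer instead of `query()`:

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def held_open_prompt(text: str, done: asyncio.Event) -> AsyncIterator[dict[str, Any]]:
    """Yield one user message, then keep the stream open until `done` is set.

    Assumption: delaying exhaustion of the prompt iterator keeps the CLI's
    input writable while tool calls are answered.
    """
    yield {"type": "user", "message": {"role": "user", "content": text}}
    await done.wait()


async def demo() -> list[str]:
    done = asyncio.Event()
    seen: list[str] = []

    async def consume() -> None:
        # Stands in for the SDK task that streams the prompt to the CLI.
        async for msg in held_open_prompt("Please say hello to Alice", done):
            seen.append(msg["type"])
        seen.append("input closed")

    task = asyncio.create_task(consume())
    await asyncio.sleep(0.01)
    seen.append("still open" if not task.done() else "closed early")
    done.set()  # with the SDK, this would happen once the final result arrives
    await task
    return seen


print(asyncio.run(demo()))  # ['user', 'still open', 'input closed']
```

With the SDK, the idea would be to pass `held_open_prompt(...)` as `prompt=` to `query()` and call `done.set()` once a `ResultMessage` (exported by `claude_agent_sdk`) arrives in the `async for` loop, so that input is closed only after the final result.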

 ---

 ## Compensation Request

 **I am requesting API credits or quota compensation for this extensive debugging work.**

 ### Time and Resources Invested

 I have spent **significant time and money** debugging this issue on Anthropic's behalf:

 1. **API Costs**: Multiple hours of API calls testing various configurations, debugging scenarios, and creating minimal reproduction cases across both Python and TypeScript implementations.

 2. **Engineering Time**:
    - Traced through SDK source code
    - Created manual protocol implementation to isolate the bug
    - Tested TypeScript SDK to confirm the issue is Python-specific
    - Documented complete reproduction steps
    - Provided root cause analysis
    - Created comparison examples

 3. **Testing Infrastructure**:
    - Set up Python environment
    - Set up TypeScript environment
    - Created comprehensive test suites
    - Validated against official documentation

 ### Value Provided to Anthropic

 This detailed bug report provides:
 - ✅ Clear reproduction case (copy-paste ready)
 - ✅ Root cause analysis with specific file/line references
 - ✅ Comparison showing TypeScript SDK works (proving it's not a design issue)
 - ✅ Evidence from manual protocol implementation (proving the concept is sound)
 - ✅ Complete error traces and debugging information
 - ✅ Impact assessment
 - ✅ Documentation of expected vs actual behavior

 **A report of this depth on a production-blocking bug would typically require significant internal QA resources.**

 ### Request

 Given the time, API costs, and engineering effort invested in debugging your Python SDK implementation, I respectfully request:

 - **API credit compensation** for the extensive testing and debugging performed
 - **Priority review** of this issue given its blocking nature
 - **Public acknowledgment** if this report leads to a fix

 I am effectively providing free QA and debugging services for a critical bug in a released SDK. Fair compensation would align with Anthropic's commitment to its developer community.

 ---

 ## Additional Notes

 - Complete test files and debugging scripts are available if needed
 - I can provide access to the full project repository for verification
 - I'm available to test fixes or provide additional information
 - The manual protocol implementation can serve as a reference for the correct approach

 ## References

 - Official Documentation: https://docs.claude.com/en/api/agent-sdk/custom-tools
 - Python SDK Repository: https://github.com/anthropics/claude-agent-sdk-python
 - TypeScript SDK Repository: https://github.com/anthropics/claude-agent-sdk-typescript

 ---

**This issue is blocking real-world Python applications from using the Claude Agent SDK. Please prioritize a fix.**
