@dicksontsai

Summary

Fixes a critical deadlock issue that occurs when MCP servers produce verbose stderr output. The SDK would hang indefinitely when the stderr pipe buffer filled up.

The Problem

The deadlock occurred due to sequential reading of subprocess streams:

  1. SDK reads stdout to completion before it reads stderr
  2. When the stderr pipe buffer fills (64KB on Linux, 16KB on macOS), the subprocess blocks on its next stderr write
  3. The blocked subprocess can no longer write to stdout, while the parent keeps waiting for stdout → DEADLOCK 🔒
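
The blocking behaviour in the steps above can be reproduced with a minimal sketch like the following. It is not the SDK's code: it uses the standard-library `subprocess` module rather than anyio, and the helper name `child_finishes`, the 1 MiB payload, and the timeout are assumptions chosen for illustration. The child writes far more than a pipe buffer's worth to stderr before touching stdout, and the parent never drains stderr.

```python
import subprocess
import sys

# Hypothetical child: writes ~1 MiB to stderr, then a single line to stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('x' * (1 << 20))\n"
    "sys.stderr.flush()\n"
    "print('done')\n"
)


def child_finishes(stderr_target, timeout=2.0):
    """Return True if the child exits within `timeout` seconds.

    The parent does not read stderr while waiting, which mirrors the
    "read stdout first" ordering described above.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=stderr_target,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # The child is stuck writing to the full stderr pipe.
        proc.kill()
        return False
    finally:
        # Drain and close whatever pipes exist so nothing is leaked.
        proc.communicate()
```

With `stderr_target=subprocess.PIPE` the child should stall once the pipe buffer is full and the call returns `False`; with `subprocess.DEVNULL` nothing backs up and the child exits normally.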

The Solution

Redirect stderr to a temporary file instead of a pipe:

  • A regular file never blocks the writer on a full pipe buffer, so stderr output can no longer stall the subprocess
  • Temp file can grow as needed (no 64KB limit)
  • Still capture stderr for error reporting (last 100 lines)
  • Works consistently across all async backends

Implementation Details

  • stderr=tempfile.NamedTemporaryFile() instead of stderr=PIPE
  • Use deque(maxlen=100) to keep only recent stderr lines in memory
  • Temp file is automatically cleaned up on disconnect
  • Add [stderr truncated, showing last 100 lines] message when buffer is full
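
A rough sketch of how these pieces could fit together is shown below. It is an illustration under stated assumptions, not the merged implementation: it uses the standard-library `subprocess` module instead of the SDK's anyio-based transport, the function name `run_with_stderr_file` and the constant `MAX_STDERR_LINES` are hypothetical, and placing the truncation marker at the top of the returned lines is a guess.

```python
import subprocess
import sys
import tempfile
from collections import deque

MAX_STDERR_LINES = 100  # the "last 100 lines" from the description


def run_with_stderr_file(cmd):
    """Run `cmd` with stderr sent to a temp file.

    Returns (stdout, stderr_tail_lines, returncode).
    """
    # The temp file is deleted when the `with` block exits.
    with tempfile.NamedTemporaryFile(mode="w+", errors="replace") as err_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=err_file,  # a file, not a pipe: writes never block on a reader
            text=True,
        )
        stdout, _ = proc.communicate()

        # Read the file back, keeping only the most recent lines in memory.
        err_file.seek(0)
        total = 0
        tail = deque(maxlen=MAX_STDERR_LINES)
        for line in err_file:
            total += 1
            tail.append(line.rstrip("\n"))

    lines = list(tail)
    if total > MAX_STDERR_LINES:
        # Assumed placement of the marker mentioned in the description.
        lines.insert(0, "[stderr truncated, showing last 100 lines]")
    return stdout, lines, proc.returncode
```

Because `deque(maxlen=...)` discards the oldest entry on each append once it is full, memory use stays bounded no matter how large the temp file grows.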

Testing

  • Verified no deadlock with 150+ lines of stderr output
  • Confirmed stderr is still captured for error reporting
  • All existing tests pass
  • Works with asyncio, trio, and other anyio backends

Impact

  • Fixes consistent hangs in production with MCP servers
  • No functional regression - stderr handling is preserved
  • Simpler than concurrent reading alternatives
  • More robust than pipe-based solutions
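
For comparison, one possible shape of a "concurrent reading" alternative is sketched below: stderr stays a pipe, and a background thread drains it while the main thread reads stdout. This is only an assumption about what such an alternative might look like (the PR does not show one), written with the standard-library `subprocess` and `threading` modules; `run_draining_stderr` is a hypothetical name.

```python
import subprocess
import sys
import threading
from collections import deque


def run_draining_stderr(cmd, max_lines=100):
    """Run `cmd`, draining stderr in a thread while stdout is read.

    Returns (stdout, stderr_tail_lines, returncode).
    """
    tail = deque(maxlen=max_lines)

    def drain(stream):
        # Keep consuming stderr so the pipe buffer never fills up.
        for line in stream:
            tail.append(line.rstrip("\n"))

    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    ) as proc:
        reader = threading.Thread(target=drain, args=(proc.stderr,), daemon=True)
        reader.start()
        stdout = proc.stdout.read()  # safe: stderr is drained concurrently
        reader.join()
    # Leaving the `with` block closes the pipes and waits for the child.
    return stdout, list(tail), proc.returncode
```

This avoids the temp file but needs an extra thread (or task) per subprocess and careful shutdown ordering, which is the added complexity the temp-file approach sidesteps.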

Fixes the issue reported in Slack where the SDK would hang indefinitely when receiving messages from MCP servers with verbose logging.

🤖 Generated with Claude Code

…p file

The SDK was experiencing deadlocks when MCP servers produced verbose stderr
output. The issue occurred because:
1. The SDK read stdout and stderr sequentially (stdout first, then stderr)
2. When the stderr pipe buffer filled up (64KB on Linux, 16KB on macOS), the
   subprocess would block trying to write more to stderr
3. Since the subprocess was blocked, it couldn't write to stdout
4. The parent process was waiting for stdout, creating a deadlock

This fix redirects stderr to a temporary file instead of a pipe, which:
- Eliminates the pipe buffer limitation (files have no such restriction)
- Prevents any possibility of deadlock
- Still captures stderr for error reporting (keeping last 100 lines)
- Works consistently across all async backends (asyncio, trio, etc.)

The temp file is automatically cleaned up when the subprocess ends, and we
use a circular buffer (deque) to keep only the last 100 lines in memory for
error reporting purposes.

Fixes the deadlock issue reported in Slack where the SDK would hang
indefinitely when receiving messages from MCP servers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@dicksontsai requested a review from blois on July 31, 2025 18:03
@dicksontsai merged commit fbda510 into main on Jul 31, 2025
6 checks passed
rushilpatel0 pushed a commit to codegen-sh/claude-code-sdk-python that referenced this pull request Aug 17, 2025
…ropics#103)

Co-authored-by: Claude <[email protected]>
Signed-off-by: Rushil Patel <[email protected]>