
fix: use unbounded message buffer to prevent deadlock on multi-turn queries (#558)#572

Open
naga-k wants to merge 2 commits into anthropics:main from naga-k:fix/558-message-buffer-deadlock


naga-k commented Feb 14, 2026

Problem

When using ClaudeSDKClient for multi-turn conversations, the second query() call hangs indefinitely if the first query invoked subagent Tasks.

Root Cause

The _read_messages() loop in Query handles both control protocol routing (init acknowledgments, tool result requests) and regular message buffering in a single async loop. The message buffer is bounded at 100 slots via anyio.create_memory_object_stream(max_buffer_size=100).

When receive_response() stops consuming at a ResultMessage, unconsumed messages (e.g., task_notification from subagent completion) remain in the buffer. If enough accumulate to fill the 100-slot buffer, _read_messages() blocks on await self._message_send.send(message).

This creates a deadlock chain:

  1. _read_messages() blocked on buffer send → can't read stdout
  2. CLI subprocess stdout pipe fills → CLI blocks on write
  3. CLI blocked on that write → it stops reading stdin, so the next query is never processed
  4. receive_response() hangs forever

Fix

Change max_buffer_size=100 to max_buffer_size=math.inf (unbounded). This is safe because:

  • anyio.create_memory_object_stream explicitly supports math.inf for unbounded buffers
  • Messages are still consumed by receive_response(), so memory stays bounded in practice
  • The unbounded buffer just prevents the routing loop from stalling
  • No behavioral change for normal single-turn usage

Changes

  • 14 lines of production code (query.py): import math + unbounded buffer + explanatory comment
  • ~200 lines of tests (test_message_buffer_deadlock.py): 3 tests that reproduce the deadlock with the old bounded buffer and verify the fix resolves it

Testing

  • All 159 tests pass (156 existing + 3 new)
  • test_bounded_buffer_blocks_control_message_routing — patches Query with max_buffer_size=100, sends 110 messages with no consumer, and shows the control request times out (deadlock reproduced)
  • test_unbounded_buffer_allows_control_message_routing — same scenario with math.inf, control request succeeds (fix verified)
  • test_query_class_uses_unbounded_buffer — asserts the config is math.inf
  • Verified locally with real CLI: multi-turn conversations with tool use complete without hanging

Fixes #558

…ueries

The _read_messages() loop handles both control protocol routing and
regular message buffering in a single async loop. With a bounded buffer
(max_buffer_size=100), when receive_response() stops consuming at a
ResultMessage, the buffer fills up and _read_messages() blocks on
send(). This prevents it from reading any transport data — including
control messages the CLI subprocess needs answered to process new
queries. The result is a deadlock where subsequent queries hang
indefinitely.

Fix: use math.inf for an unbounded buffer. Messages are still consumed
by receive_response(), so memory stays bounded in practice. The
unbounded buffer just prevents the routing loop from stalling.

Fixes anthropics#558
naga-k marked this pull request as ready for review February 14, 2026 04:36

Development

Successfully merging this pull request may close these issues.

Second query hangs after first query invokes background Task

1 participant