Note that this was fully written by OpenAI Codex
## Summary

When using `claude-agent-sdk` with SDK MCP servers (`mcp_servers={..., "type": "sdk"}`), tool calls can become unavailable or fail in scenarios where subagents keep running "in the background" after the parent response completes. In the transcript this often appears as the model emitting a plain-text `<function_calls><invoke ...></invoke></function_calls>` block instead of a real `tool_use`/`tool_result` pair, i.e. it behaves as if the MCP tool is missing.

This seems correlated with the SDK's internal message buffering/backpressure: if the application stops consuming messages after the parent `ResultMessage` (e.g. it uses `receive_response()` and returns), later streaming output from background subagents can fill the SDK's internal queue and block the transport reader, which in turn blocks the control protocol needed for SDK MCP bridging (`mcp_message` control requests).
## Environment

- claude-agent-sdk: 0.1.17
- Claude Code CLI: 2.0.70 (from stream-json transcript)
- Python: 3.12.3
- mcp: 1.21.1
- anyio: 4.11.0
- `include_partial_messages: True` (in our usage; increases message volume)
## What we see in practice

Working (foreground): real tool call:

- the assistant emits a `tool_use` block: `mcp__action_manager__persist_character_design`
- then a `tool_result` is delivered back

Failing (background): tool call "hallucinated" as plain text:

- the assistant message is just a `text` block that contains `<function_calls><invoke name="mcp__...">...`
- no `tool_use`/`tool_result` blocks appear, but the assistant text claims success

This matches the behavior when the model does not actually have the tool schema available or cannot complete the tool call.
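The contrast between the two transcript shapes can be sketched as follows. The dicts mirror the Anthropic content-block format (`tool_use` blocks carry `type`/`name`/`input`); the ID and arguments are purely illustrative:

```python
# A real tool call arrives as a structured tool_use content block...
real_tool_call = {
    "type": "tool_use",
    "id": "toolu_01...",  # illustrative ID
    "name": "mcp__action_manager__persist_character_design",
    "input": {"character_id": "c-123"},  # hypothetical arguments
}

# ...whereas the failing case is an ordinary text block whose *string*
# merely looks like a tool invocation, so nothing ever executes it.
hallucinated_call = {
    "type": "text",
    "text": '<function_calls><invoke name="mcp__action_manager__persist_'
            'character_design">...</invoke></function_calls>',
}

def is_real_tool_call(block: dict) -> bool:
    """Distinguish an executable tool call from plain text that mimics one."""
    return block.get("type") == "tool_use" and "name" in block

print(is_real_tool_call(real_tool_call))      # True
print(is_real_tool_call(hallucinated_call))   # False
```

Only the first shape produces a `tool_result`; the second is what our transcripts show in the failing background case.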
## Reproduction sketch (minimal)

I don't have a single deterministic prompt-only repro yet, but the pattern is:

- Configure `ClaudeSDKClient` with an SDK MCP server: `mcp_servers={"action_manager": create_sdk_mcp_server(...)}`.
- Ensure the model uses a subagent/background mechanism that can produce output after the parent `ResultMessage` (e.g., Task/subagent jobs that continue running while the parent returns).
- In application code, send `client.query(...)` and then consume messages only until the first `ResultMessage` (e.g., `async for m in client.receive_response(): ...`, which terminates at `ResultMessage`). Don't keep draining `client.receive_messages()` afterwards.
- If the CLI keeps producing additional events/messages (especially with `include_partial_messages=True`), the SDK's internal queue can fill, causing backpressure and breaking the ability to service later control requests (including SDK MCP).
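The starvation mechanism can be demonstrated in isolation. This toy model uses a bounded `asyncio.Queue` in place of the SDK's anyio memory stream; the names (`reader`, the `"control:"` prefix) are illustrative, not SDK API. One coroutine plays the role of the stdout reader that both forwards transcript messages and services control requests; once the consumer stops and the buffer fills, it blocks before ever reaching the control request:

```python
import asyncio
from contextlib import suppress

async def reader(queue: asyncio.Queue, stdout_lines: list, handled: list) -> None:
    """Stand-in for the SDK's _read_messages(): one coroutine both forwards
    transcript messages and services control requests from CLI stdout."""
    for line in stdout_lines:
        if line.startswith("control:"):
            handled.append(line)      # e.g. an mcp_message control request
        else:
            await queue.put(line)     # blocks once the bounded buffer is full

async def main() -> list:
    # Tiny buffer standing in for anyio's max_buffer_size=100.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    handled: list = []
    # Transcript messages keep arriving, then a control request arrives.
    stdout_lines = ["msg1", "msg2", "msg3", "control:mcp_message"]
    task = asyncio.create_task(reader(queue, stdout_lines, handled))
    # The application consumed up to its ResultMessage and stopped draining.
    await asyncio.sleep(0.1)
    # The reader is stuck on put("msg3"); the control request was never serviced.
    assert not task.done() and handled == []
    task.cancel()
    with suppress(asyncio.CancelledError):
        await task
    return handled

print(asyncio.run(main()))  # [] -- the mcp_message control request starved
```

With a consumer draining the queue, the same reader reaches and handles the control request immediately; the failure mode depends entirely on the application ceasing to consume.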
## Expected behavior

Even if the application uses `receive_response()` (and therefore stops consuming after the parent `ResultMessage`), SDK MCP tool availability and tool execution should remain reliable for any background/subagent work that is still ongoing within the same session.

At minimum, the SDK should not deadlock or starve the control protocol when the app temporarily isn't consuming transcript messages.
## Actual behavior

After the parent response completes, background/subagent work that tries to call SDK MCP tools may:

- not see tool schemas / not be able to call tools
- emit a plain-text `function_calls`/`invoke` block (a hallucinated tool call)
- or otherwise fail to get tool results
## Suspected root cause (SDK-side)

In `claude_agent_sdk/_internal/query.py`:

- The SDK uses an internal memory stream with a small buffer: `anyio.create_memory_object_stream(max_buffer_size=100)`.
- `_read_messages()` forwards all non-control messages into this buffer via `await self._message_send.send(message)`.

If the application stops consuming messages (e.g., stops after `ResultMessage`), and the CLI continues emitting messages (common with partial streaming and/or subagent background output), then:

- `_message_send.send(...)` blocks once the buffer reaches capacity.
- `_read_messages()` stops draining stdout from the Claude Code CLI process.
- Control protocol messages that arrive later on stdout (including `control_request` subtype `mcp_message`, used for SDK MCP bridging) are not read/handled promptly.
- SDK MCP becomes unreliable, which manifests as a missing tool schema or missing tool results.
This is particularly surprising because `receive_response()` is presented as a convenience API; users may reasonably expect it to be safe in sessions that use SDK MCP servers.
## Proposed fixes / improvements

One or more of:

1. Never block `_read_messages()` on delivery to the user queue
   - Use `send_nowait()` / `move_on_after(0)` for non-control messages.
   - If the queue is full, drop messages (or drop only low-value messages like partial `StreamEvent`s).
   - The priority should be "keep draining CLI stdout + keep servicing the control protocol".
2. Make the internal message buffer size configurable
   - e.g., `ClaudeAgentOptions.max_message_queue` (separate from `max_buffer_size`, which currently guards JSON line buffering).
3. Add an SDK-managed background drain/pump
   - If SDK MCP servers or hooks are configured, keep draining messages after `receive_response()` returns so the control channel remains healthy.
   - Or provide a documented helper/pattern for this.
4. Documentation
   - Explicitly warn that if you use SDK MCP servers (or expect background/subagent output), you must continue consuming `receive_messages()`, or you may starve the control channel.
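The never-block delivery idea can be sketched in isolation. Here asyncio's `put_nowait()`/`QueueFull` stands in for anyio's `send_nowait()`/`WouldBlock`; the helper name and drop policy are hypothetical, shown only to illustrate "drop low-value messages rather than stall the reader":

```python
import asyncio

def forward_nonblocking(queue: asyncio.Queue, message: dict, dropped: list) -> None:
    """Deliver to the user-facing queue without ever blocking the stdout reader.
    put_nowait()/QueueFull mirrors anyio's send_nowait()/WouldBlock."""
    try:
        queue.put_nowait(message)
    except asyncio.QueueFull:
        if message.get("type") == "stream_event":
            # Partial stream events are low-value: drop the new one.
            dropped.append(message)
        else:
            # For substantive messages, evict the oldest buffered one instead.
            dropped.append(queue.get_nowait())
            queue.put_nowait(message)

queue: asyncio.Queue = asyncio.Queue(maxsize=1)
dropped: list = []
forward_nonblocking(queue, {"type": "assistant", "id": 1}, dropped)
forward_nonblocking(queue, {"type": "stream_event", "id": 2}, dropped)  # dropped
forward_nonblocking(queue, {"type": "result", "id": 3}, dropped)        # evicts id 1
print([m["id"] for m in dropped])  # [2, 1]
```

Whatever the exact policy, the invariant is that `_read_messages()` never awaits on user-queue capacity, so stdout keeps draining and `mcp_message` control requests stay serviceable.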
## Why this matters

SDK MCP servers are a key feature for "in-process tools". Background subagents (e.g., Task tool patterns) are also a core workflow. If `receive_response()` usage can cause hidden backpressure that breaks tool execution, it's very easy for users to end up with brittle systems and hard-to-debug "hallucinated tool results".