
SDK MCP tools can fail / “hallucinate” under background subagents due to message-queue backpressure #425

@sorryhyun

Description

Note that this was fully written by OpenAI Codex

Summary

When using claude-agent-sdk with SDK MCP servers (mcp_servers={...,"type":"sdk"}), tool calls can become unavailable or fail when subagents keep running in the background after the parent response completes. In the transcript this often appears as the model emitting a plain-text <function_calls><invoke ...></invoke></function_calls> block instead of a real tool_use / tool_result pair, i.e. it behaves as if the MCP tool were missing.

This seems correlated with the SDK’s internal message buffering/backpressure: if the application stops consuming messages after the parent ResultMessage (e.g. uses receive_response() and returns), later streaming output from background subagents can fill the SDK’s internal queue and block the transport reader, which then blocks the control protocol needed for SDK MCP bridging (mcp_message control requests).

Environment

  • claude-agent-sdk: 0.1.17
  • Claude Code CLI: 2.0.70 (from stream-json transcript)
  • Python: 3.12.3
  • mcp: 1.21.1
  • anyio: 4.11.0
  • include_partial_messages: True (in our usage; increases message volume)

What we see in practice

Working (foreground): real tool call:

  • assistant emits a tool_use block: mcp__action_manager__persist_character_design
  • then a tool_result is delivered back

Failing (background): tool call “hallucinated” as plain text:

  • assistant message is just a text block that contains <function_calls><invoke name="mcp__...">...
  • no tool_use / tool_result blocks appear, but the assistant text claims success
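The two cases differ only in the type of the assistant content blocks. Illustrative shapes, simplified (the stream-json transcript nests these inside an assistant "message" envelope, omitted here; the tool_use id is hypothetical):

```python
working_content = [  # foreground: a real tool call
    {
        "type": "tool_use",
        "id": "toolu_example",  # hypothetical id
        "name": "mcp__action_manager__persist_character_design",
        "input": {},  # tool arguments elided
    }
]

failing_content = [  # background: the call "hallucinated" as plain text
    {
        "type": "text",
        "text": '<function_calls><invoke name="mcp__action_manager__persist_character_design">',
    }
]

def block_types(content: list[dict]) -> list[str]:
    """Return the content-block types, which is what distinguishes the cases."""
    return [block["type"] for block in content]

print(block_types(working_content), block_types(failing_content))
```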

This matches the behavior when the model does not actually have the tool schema available or cannot complete the tool call.

Reproduction sketch (minimal)

I don’t have a single deterministic prompt-only repro yet, but the pattern is:

  1. Configure ClaudeSDKClient with an SDK MCP server:
    • mcp_servers={"action_manager": create_sdk_mcp_server(...)}
  2. Ensure the model uses a subagent/background mechanism that can produce output after the parent response ResultMessage (e.g., Task/subagent jobs that continue running while the parent returns).
  3. In application code, send client.query(...) and then consume messages only until the first ResultMessage (e.g., async for m in client.receive_response(): ... which terminates at ResultMessage).
  4. Don’t keep draining client.receive_messages() afterwards.
  5. If the CLI keeps producing additional events/messages (especially with include_partial_messages=True), the SDK’s internal queue can fill, causing backpressure and breaking the ability to service later control requests (including SDK MCP).
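The pattern above can be simulated without the SDK at all. The sketch below uses asyncio.Queue as a stand-in for the SDK's bounded anyio memory stream (buffer shrunk from 100 to 5; message names and timings are illustrative):

```python
import asyncio

BUFFER = 5  # stands in for the SDK's max_buffer_size=100

async def main() -> str:
    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=BUFFER)

    async def cli_producer() -> str:
        # stands in for _read_messages() forwarding CLI stdout into the queue;
        # background subagents keep emitting events after the ResultMessage
        for msg in ["AssistantMessage", "ResultMessage"] + ["StreamEvent"] * 50:
            try:
                await asyncio.wait_for(queue.put(msg), timeout=0.2)
            except asyncio.TimeoutError:
                return "reader blocked: control channel starves"
        return "reader finished"

    async def app_consumer() -> None:
        # mimics receive_response(): stop at the first ResultMessage
        while True:
            msg = await queue.get()
            if msg == "ResultMessage":
                return

    producer = asyncio.create_task(cli_producer())
    await app_consumer()  # the app returns here and never drains again
    return await producer

print(asyncio.run(main()))
```

With the consumer gone, the producer fills the five free slots and the sixth put blocks, which is exactly the stall described in step 5.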

Expected behavior

Even if the application uses receive_response() (and therefore stops consuming after the parent ResultMessage), SDK MCP tool availability and tool execution should remain reliable for any background/subagent work that is still ongoing within the same session.

At minimum, the SDK should not deadlock or starve the control protocol when the app temporarily isn't consuming transcript messages.

Actual behavior

After the parent response completes, background/subagent work that tries to call SDK MCP tools may:

  • not see tool schemas / not be able to call tools
  • emit a plain-text “function_calls/invoke” block (hallucinated tool call)
  • or otherwise fail to get tool results

Suspected root cause (SDK-side)

In claude_agent_sdk/_internal/query.py:

  • The SDK uses an internal memory stream with a small buffer:
    • anyio.create_memory_object_stream(max_buffer_size=100)
  • _read_messages() forwards all non-control messages into this buffer via:
    • await self._message_send.send(message)

If the application stops consuming messages (e.g., stops after ResultMessage), and the CLI continues emitting messages (common with partial streaming and/or subagent background output), then:

  1. _message_send.send(...) blocks once the buffer reaches capacity.
  2. _read_messages() stops draining stdout from the Claude Code CLI process.
  3. Control protocol messages that arrive later on stdout (including control_request subtype mcp_message used for SDK MCP bridging) are not read/handled promptly.
  4. SDK MCP becomes unreliable, which manifests as missing tool schema or missing tool results.

This is particularly surprising because receive_response() is presented as a convenience API; users may reasonably expect it to be safe in sessions that use SDK MCP servers.

Proposed fixes / improvements

One or more of:

  1. Never block _read_messages() on delivery to the user queue

    • Use send_nowait() / move_on_after(0) for non-control messages.
    • If the queue is full, drop messages (or drop only low-value messages like partial StreamEvent).
    • The priority should be “keep draining CLI stdout + keep servicing control protocol”.
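Fix 1 might look roughly like this, sketched with asyncio.Queue and put_nowait() (the SDK itself uses anyio, where the analogue is send_nowait() plus catching anyio.WouldBlock); forward_nonblocking and its eviction policy are hypothetical:

```python
import asyncio

def forward_nonblocking(queue: asyncio.Queue, message: dict) -> bool:
    """Forward one transcript message without ever blocking the reader task.

    Returns True if the message was enqueued. Drops only low-value partial
    stream events when the queue is full, so CLI stdout keeps draining and
    control requests stay serviced.
    """
    try:
        queue.put_nowait(message)
        return True
    except asyncio.QueueFull:
        if message.get("type") == "stream_event":
            return False  # drop partial events rather than stall the reader
        # make room for higher-value messages by evicting the oldest entry
        queue.get_nowait()
        queue.put_nowait(message)
        return True

q: asyncio.Queue = asyncio.Queue(maxsize=1)
assert forward_nonblocking(q, {"type": "stream_event"})      # enqueued
assert not forward_nonblocking(q, {"type": "stream_event"})  # full: dropped
assert forward_nonblocking(q, {"type": "result"})            # evicts the event
```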
  2. Make the internal message buffer size configurable

    • e.g., ClaudeAgentOptions.max_message_queue (separate from max_buffer_size which currently guards JSON line buffering).
  3. Add an SDK-managed background drain/pump

    • If SDK MCP servers or hooks are configured, keep draining messages after receive_response() returns so the control channel remains healthy.
    • Or provide a documented helper/pattern for this.
  4. Documentation

    • Explicitly warn that if you use SDK MCP servers (or expect background/subagent output), you must continue consuming receive_messages() or you may starve the control channel.
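The drain-pump pattern from fixes 3-4 can be sketched generically (asyncio stand-in; in real application code the drain loop would iterate client.receive_messages() in a background task; counts and timeouts are arbitrary):

```python
import asyncio

async def main() -> str:
    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=5)
    stop = asyncio.Event()

    async def drain_in_background() -> None:
        # keeps consuming transcript messages so the reader never blocks;
        # real code: `async for _ in client.receive_messages(): ...`
        while not stop.is_set():
            try:
                await asyncio.wait_for(queue.get(), timeout=0.05)
            except asyncio.TimeoutError:
                pass

    drainer = asyncio.create_task(drain_in_background())
    try:
        # background subagents keep emitting after the parent ResultMessage;
        # with the drainer running, no put ever stalls the reader
        for _ in range(50):
            await asyncio.wait_for(queue.put("StreamEvent"), timeout=0.5)
    finally:
        stop.set()
        await drainer
    return "reader never blocked"

print(asyncio.run(main()))
```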

Why this matters

SDK MCP servers are a key feature for “in-process tools”. Background subagents (e.g., Task tool patterns) are also a core workflow. If receive_response() usage can cause hidden backpressure that breaks tool execution, it’s very easy for users to end up with brittle systems and hard-to-debug “hallucinated tool results”.
