CLI exit errors not propagated to pending control requests - initialize times out instead of failing fast #387

@grumpygordon

Bug Description

When the Claude CLI exits with an error (e.g., invalid session ID passed to --resume), the SDK's message reader task catches the error but doesn't signal pending control requests. This causes initialize() to wait for the full 60-second timeout instead of failing immediately.

Steps to Reproduce

  1. Create a ClaudeSDKClient with an invalid/expired session ID via ClaudeAgentOptions(resume="invalid-session-id")
  2. Enter the async context manager (async with client:)
  3. The CLI prints an error to stderr: No conversation found with session ID: xxx
  4. The CLI exits with code 1
  5. Expected: initialize() fails immediately with the error
  6. Actual: initialize() waits 60 seconds before raising Exception: Control request timeout: initialize
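
The steps above can be sketched as a small timing script. This is a minimal sketch, assuming ClaudeSDKClient accepts the options object through an options= keyword and that both names are importable from the claude_agent_sdk package; the reproduce helper and its return value are hypothetical, added only to measure how long the failure takes.

```python
import time


async def reproduce(session_id: str = "invalid-session-id") -> float:
    """Return how many seconds entering the client context took before it ended."""
    # Imported lazily so this file can be loaded without the SDK installed.
    # Assumption: these names are exported by the claude_agent_sdk package.
    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

    client = ClaudeSDKClient(options=ClaudeAgentOptions(resume=session_id))
    start = time.monotonic()
    try:
        async with client:
            pass  # Not expected to be reached with an invalid session ID.
    except Exception as exc:
        elapsed = time.monotonic() - start
        print(f"failed after {elapsed:.1f}s: {exc!r}")
        return elapsed
    return time.monotonic() - start


# To run: import asyncio; asyncio.run(reproduce())
```

With the behavior described in this report, the printed time should be close to 60 seconds; with a fail-fast fix it should be close to the time the CLI takes to exit.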

Root Cause

In _internal/query.py, the _read_messages method runs in a separate task and catches CLI errors:

# query.py:201-208
except Exception as e:
    logger.error(f"Fatal error in message reader: {e}")
    await self._message_send.send({"type": "error", "error": str(e)})

The error is sent to the message stream, but _send_control_request is waiting on a control response event:

# query.py:355-356
with anyio.fail_after(timeout):
    await event.wait()  # Never signaled when CLI exits with error

These two mechanisms never communicate: the error raised in the message reader only goes to the message stream, so nothing sets the event that the control request waiter is blocked on.
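
The disconnect can be modeled without the SDK. The following is a toy sketch using asyncio rather than anyio, with a 0.2-second timeout standing in for the 60-second one; ReaderModel, its attributes, and demo are hypothetical names, not the SDK's actual code.

```python
import asyncio


class ReaderModel:
    """Toy stand-in for the SDK's query object; all names are hypothetical."""

    def __init__(self) -> None:
        self.messages: asyncio.Queue = asyncio.Queue()
        self.pending: dict[str, asyncio.Event] = {}

    async def read_messages(self) -> None:
        try:
            raise RuntimeError("CLI exited with code 1")
        except Exception as e:
            # Mirrors the reported behavior: the error only reaches the message stream.
            await self.messages.put({"type": "error", "error": str(e)})

    async def send_control_request(self, request_id: str, timeout: float) -> None:
        event = asyncio.Event()
        self.pending[request_id] = event
        # Nothing ever sets this event, so the wait runs until the timeout.
        await asyncio.wait_for(event.wait(), timeout)


async def demo() -> str:
    model = ReaderModel()
    reader = asyncio.create_task(model.read_messages())
    try:
        await model.send_control_request("init", timeout=0.2)
    except asyncio.TimeoutError:
        return "timeout"
    finally:
        await reader
    return "ok"
```

Calling asyncio.run(demo()) ends in the timeout branch even though the reader failed immediately, which is the same shape as the initialize() hang.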

Proposed Fix

Signal all pending control requests when an error occurs in _read_messages:

except Exception as e:
    logger.error(f"Fatal error in message reader: {e}")

    # Signal all pending control requests
    for request_id, event in list(self.pending_control_responses.items()):
        if request_id not in self.pending_control_results:
            self.pending_control_results[request_id] = e
            event.set()

    await self._message_send.send({"type": "error", "error": str(e)})

The existing code at lines 361-362 already handles this case:

if isinstance(result, Exception):
    raise result

So the fix only needs to store the exception and signal the events; the error propagation infrastructure is already in place.
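
To illustrate the effect of the proposed change, here is a self-contained sketch of the same signaling pattern in asyncio. FixedReaderModel, its attribute names, and demo_fixed are hypothetical; the real SDK uses anyio and its own data structures, so treat this as a model of the idea rather than the patch itself.

```python
import asyncio
import time


class FixedReaderModel:
    """Toy model of the proposed fix; all names are hypothetical."""

    def __init__(self) -> None:
        self.pending_events: dict[str, asyncio.Event] = {}
        self.pending_results: dict[str, object] = {}

    async def read_messages(self) -> None:
        try:
            await asyncio.sleep(0.05)  # The simulated CLI dies shortly after start.
            raise RuntimeError("No conversation found with session ID: xxx")
        except Exception as e:
            # Store the exception for every pending request, then wake its waiter.
            for request_id, event in list(self.pending_events.items()):
                if request_id not in self.pending_results:
                    self.pending_results[request_id] = e
                    event.set()

    async def send_control_request(self, request_id: str, timeout: float) -> object:
        event = asyncio.Event()
        self.pending_events[request_id] = event
        try:
            await asyncio.wait_for(event.wait(), timeout)
        finally:
            self.pending_events.pop(request_id, None)
        result = self.pending_results.pop(request_id)
        if isinstance(result, Exception):
            raise result  # Same check as the existing result handling.
        return result


async def demo_fixed() -> tuple[str, float]:
    model = FixedReaderModel()
    reader = asyncio.create_task(model.read_messages())
    start = time.monotonic()
    try:
        await model.send_control_request("init", timeout=5.0)
    except RuntimeError as exc:
        return str(exc), time.monotonic() - start
    finally:
        await reader
    return "no error", time.monotonic() - start
```

Running asyncio.run(demo_fixed()) surfaces the reader's original error well before the 5-second timeout. One gap this sketch does not cover: a request registered after the reader has already failed would still wait for the full timeout, so a real fix may also need to remember that the reader is dead.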

Environment

  • claude-agent-sdk version: 0.1.6
  • Python: 3.12
  • OS: Linux

Labels: bug (Something isn't working)