@ibetitsmike
Contributor

Problem

When a tool call (notably task) times out waiting for agent_report, the parent stream can hit an undici BodyTimeoutError surfaced as TypeError: terminated. This could bubble into an unhandled rejection or uncaught exception, taking down mux-server.

Error observed:

src/node/services/streamManager.ts:1146 Tool execution error for 'task' {
  error: Error: Timed out waiting for agent_report
      at Timeout.<anonymous> (/tmp/mux/npm/node_modules/mux/src/node/services/taskService.ts:854:18)
}

TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:11132:53)
    ...
  [cause]: BodyTimeoutError: Body Timeout Error { code: 'UND_ERR_BODY_TIMEOUT' }

Solution

A multi-layered defense to ensure the server stays up:

  1. Suppress side-promise unhandled rejections: streamText() returns fullStream (consumed in try/catch), plus side promises (usage, steps, providerMetadata, etc.). When the stream is aborted or terminated, these can reject with no handler attached, which crashes the process. Each one now gets a no-op .catch().

  2. Treat abort-induced errors as cancellation: If the stream was already aborted or stopping when an error occurs (such as TypeError: terminated), don't set the ERROR state or write an error partial; just log at debug level and proceed to cleanup.

  3. Safe tool-error formatting: The tool-error handler called JSON.stringify(error), which can throw (circular refs, BigInt). It is now wrapped in try/catch with a fallback.

  4. task tool timeouts → running status: When waitForAgentReport times out, return {status:"running"} instead of throwing, consistent with task_await behavior. This prevents timeouts from becoming tool execution errors.

  5. Improve undici error categorization: Detect TypeError: terminated and UND_ERR_BODY_TIMEOUT and map to "network" instead of "unknown".

  6. Safety net handlers: Add process.on("uncaughtException"/"unhandledRejection") handlers in the CLI server entrypoint that suppress benign network errors and keep the server running.
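
For reviewers, items 1, 3, and 5 can be pictured roughly as the sketch below. It is illustrative only: `suppressRejection`, `safeStringify`, `isUndiciTermination`, and `categorizeError` are hypothetical names, not the helpers in this diff, and the real category type likely has more members than the two shown.

```typescript
// Hypothetical helpers for illustration; names and shapes are assumptions,
// not the code in this PR.

type ErrorCategory = "network" | "unknown";

/** Item 1: attach a no-op handler so a rejected side promise is never "unhandled". */
function suppressRejection(promise: PromiseLike<unknown>): void {
  Promise.resolve(promise).catch(() => {
    // Intentionally empty: the main stream consumer is responsible for reporting.
  });
}

/** Item 3: stringify without throwing (circular references and BigInt make JSON.stringify throw). */
function safeStringify(value: unknown): string {
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    const json: string | undefined = JSON.stringify(value);
    return json ?? String(value);
  } catch {
    if (value instanceof Error) return `${value.name}: ${value.message}`;
    // Object.prototype.toString never throws, unlike String() on exotic objects.
    return typeof value === "object"
      ? Object.prototype.toString.call(value)
      : String(value);
  }
}

/** Item 5: detect `TypeError: terminated` or UND_ERR_BODY_TIMEOUT anywhere in a short cause chain. */
function isUndiciTermination(error: unknown): boolean {
  if (error instanceof TypeError && error.message === "terminated") return true;
  let current: unknown = error;
  // Bounded walk so a self-referencing `cause` cannot loop forever.
  for (let depth = 0; depth < 5 && typeof current === "object" && current !== null; depth++) {
    if ((current as { code?: unknown }).code === "UND_ERR_BODY_TIMEOUT") return true;
    current = (current as { cause?: unknown }).cause;
  }
  return false;
}

function categorizeError(error: unknown): ErrorCategory {
  return isUndiciTermination(error) ? "network" : "unknown";
}
```

Under the same assumptions, the safety-net handlers in item 6 could reuse a predicate like `isUndiciTermination` to decide whether an uncaught error is benign enough to log and ignore.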

Testing

  • make typecheck - passes
  • make lint - passes
  • make fmt-check - passes
  • bun test src/cli/server.test.ts - 13 pass
  • bun test src/node/services/streamManager.test.ts - 7 pass, 2 skip

Generated with mux • Model: mux-gateway:anthropic/claude-opus-4-5 • Thinking: high • Cost: $3.26

When a tool call (notably `task`) times out waiting for `agent_report`,
the parent stream can hit an undici `BodyTimeoutError` surfaced as
`TypeError: terminated`. This could bubble into an unhandled rejection
or uncaught exception, taking down mux-server.

Fixes:
1. Attach no-op .catch() handlers to streamText() side promises (usage,
   steps, providerMetadata, etc.) to prevent unhandled rejections when
   stream is aborted/terminated
2. Treat abort-induced errors as cancellation, not stream errors - if
   stream was aborted/stopping, skip error state + error partial write
3. Make tool-error formatting non-throwing with safe JSON.stringify
4. Make `task` tool timeouts return {status:"running"} instead of
   throwing (consistent with task_await behavior)
5. Improve error categorization for undici termination/timeouts to map
   to "network" instead of "unknown"
6. Add safety net process.on("uncaughtException"/"unhandledRejection")
   handlers in CLI server entrypoint that suppress benign network errors
   and keep the server running
@ibetitsmike ibetitsmike force-pushed the mike/fix-server-crash-on-task-timeout branch from 8792af3 to e1b6555 Compare January 10, 2026 12:51