🤖 fix: prevent mux-server crash on task timeout + undici termination #1566
+158
−23
Problem
When a tool call (notably `task`) times out waiting for `agent_report`, the parent stream can hit an undici `BodyTimeoutError`, surfaced as `TypeError: terminated`. This could bubble into an unhandled rejection or uncaught exception, taking down mux-server.

Error observed:
Solution
Multi-layered defense to ensure the server stays up:

- Suppress side-promise unhandled rejections: `streamText()` returns `fullStream` (consumed in try/catch), plus side promises (`usage`, `steps`, `providerMetadata`, etc.). When aborted/terminated, these can reject without handlers → crash. Now we attach a no-op `.catch()` to each.
- Treat abort-induced errors as cancellation: If the stream was aborted/stopping when an error occurs (like `TypeError: terminated`), don't set the `ERROR` state or write an error partial; just log at debug level and proceed to cleanup.
- Safe tool-error formatting: The `tool-error` handler did `JSON.stringify(error)`, which can throw (circular refs, BigInt). Wrapped in try/catch with a fallback.
- `task` tool timeouts → running status: When `waitForAgentReport` times out, return `{status:"running"}` instead of throwing, consistent with `task_await` behavior. This prevents timeouts from becoming tool execution errors.
- Improve undici error categorization: Detect `TypeError: terminated` and `UND_ERR_BODY_TIMEOUT` and map them to `"network"` instead of `"unknown"`.
- Safety net handlers: Add `process.on("uncaughtException"/"unhandledRejection")` in the CLI server that suppress benign network errors and keep the server running.

Testing

- `make typecheck` - passes
- `make lint` - passes
- `make fmt-check` - passes
- `bun test src/cli/server.test.ts` - 13 pass
- `bun test src/node/services/streamManager.test.ts` - 7 pass, 2 skip

Generated with `mux` • Model: `mux-gateway:anthropic/claude-opus-4-5` • Thinking: `high` • Cost: `$3.26`
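
For reviewers who want a concrete picture of three of the defenses described under Solution (no-op `.catch()` on side promises, safe tool-error formatting, and undici error categorization), here is a minimal sketch. It is not the code from this PR's diff: the helper names `suppressUnhandled`, `formatToolError`, and `categorizeError` are hypothetical, the handling of `Error` instances and the walk over `cause` are assumptions, and the real implementation may differ.

```typescript
// Illustrative sketch only; all helper names below are hypothetical.

/** Attach a no-op handler so a rejected side promise is never "unhandled". */
function suppressUnhandled(
  promises: ReadonlyArray<PromiseLike<unknown> | undefined>,
): void {
  for (const p of promises) {
    if (p === undefined) continue;
    // Promise.resolve returns a native promise unchanged, so the handler is
    // attached to the original promise and marks its rejection as handled.
    Promise.resolve(p).catch(() => {});
  }
}

/** Format an arbitrary tool error without letting JSON.stringify throw. */
function formatToolError(error: unknown): string {
  // Assumption: prefer the message for Error instances, since
  // JSON.stringify(new Error("x")) yields "{}".
  if (error instanceof Error) return error.message;
  if (typeof error === "string") return error;
  try {
    // JSON.stringify throws on circular references and BigInt values,
    // and returns undefined for values such as `undefined` itself.
    const json: string | undefined = JSON.stringify(error);
    return json ?? String(error);
  } catch {
    try {
      return String(error);
    } catch {
      return "[unserializable error]";
    }
  }
}

/** Map undici termination/body-timeout errors to "network". */
function categorizeError(error: unknown): "network" | "unknown" {
  let current: unknown = error;
  // Assumption: the underlying error may be attached as `cause`;
  // the depth limit guards against cyclic cause chains.
  for (let depth = 0; depth < 5; depth++) {
    if (typeof current !== "object" || current === null) return "unknown";
    const e = current as { message?: unknown; code?: unknown; cause?: unknown };
    if (e.code === "UND_ERR_BODY_TIMEOUT") return "network";
    if (current instanceof TypeError && e.message === "terminated") {
      return "network";
    }
    current = e.cause;
  }
  return "unknown";
}
```

In a stream handler, `suppressUnhandled` would be called once with the side promises right after the stream is created, while the main stream is still consumed inside its own try/catch so that real errors are not hidden.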