Skip to content

feat(cli): exit-code map + structured JSON error envelopes#1583

Merged
cliffhall merged 1 commit into
1579-wave-1from
v2/1564-cli-exit-codes
Jul 2, 2026
Merged

feat(cli): exit-code map + structured JSON error envelopes#1583
cliffhall merged 1 commit into
1579-wave-1from
v2/1564-cli-exit-codes

Conversation

@cliffhall

Copy link
Copy Markdown
Member

Closes #1564

Summary

Wave 1 of the PR #1510 decomposition (tracking issue #1579). Gives the CLI a stable, machine-readable failure surface so a programmatic driver (CI, scripts, agents) can branch on why the CLI failed without regex-scraping prose stderr.

  • Exit-code map (EXIT_CODES): 0 ok · 1 usage · 2 no-app · 3 auth-required · 4 unreachable · 5 tool-error.
  • ErrorEnvelope interface — one JSON line on stderr: {"error":{code,message,cause?,status?,url?}}.
  • CliExitCodeError — thrown by the CLI to request a specific exit code up front, carrying a partial envelope that classifyError() fills in.
  • Pure classifyError() — heuristics over HTTP status (.status / .response.status / numeric SDK .code), an Error.cause chain, and message/cause patterns (401/403/WWW-Authenticate/OAuth → auth; ENOTFOUND/ECONNREFUSED/fetch failed/… → unreachable). Unit-testable in isolation.
  • formatErrorOutput() / handleError() — single source of truth for {exitCode, stderr}; the binary sink and the in-process test runner both call it, so tests observe exactly what the real binary emits.
  • cli-runner test helper now mirrors the binary via formatErrorOutput (dropping the redundant console.error/console.warn overrides, since those route through process.stderr.write).
  • Docs: exit-code map + envelope shape documented in clients/cli/README.md.

Kept well-typed and dependency-free for reuse — this is a Wave 3 dependency and the foundation for the CLI lane (--app-info → exit 2, stored-auth → exit 3, --format json).

Testing

  • clients/cli npm run validate — 127 tests pass (format:check + lint + build + test).
  • clients/cli npm run test:coverage — gate passes; error-handler.ts 100% stmts/funcs/lines, 94.44% branches (all ≥ 90).

🤖 Generated with Claude Code

https://claude.ai/code/session_01S3fTN8H3R8YV4yUGvZjYnX

@cliffhall cliffhall added the v2 Issues and PRs for v2 label Jul 1, 2026
@cliffhall cliffhall linked an issue Jul 1, 2026 that may be closed by this pull request
@cliffhall

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cliffhall's task in 2m 37s —— View job


Review: CLI exit-code map + structured JSON error envelopes

  • Read changed files & diff against origin/v2/main
  • Review error-handler.ts (exit codes, classifyError, envelopes)
  • Review tests & cli-runner helper
  • Review README docs
  • Post review feedback

Nice, focused Wave-1 foundation. The classifyErrorformatErrorOutputhandleError layering is clean, the "single source of truth" that both the binary sink and cli-runner share is exactly right, and the test coverage of the classification heuristics is thorough. A few things worth addressing before/after merge:

🟠 Main concern: numeric .code is not always an HTTP status

// error-handler.ts:74-76
// SDK SSEError exposes `.code` as the HTTP status; numeric only ...
if (typeof e.code === "number") return e.code;

This is correct for SseError / StreamableHTTPError (whose .code is the HTTP status), but the MCP SDK's McpError also carries a numeric .code that is a JSON-RPC error code, not an HTTP status — e.g. MethodNotFound = -32601, InvalidParams = -32602, InternalError = -32603, or arbitrary server-defined integers. Since tool/method failures are exactly what this CLI surfaces, an McpError will flow through here and:

  • leak a nonsensical status: -32601 into the envelope (violating the documented contract that status is "HTTP status when the failure was an HTTP-level response"), and
  • in the (rarer) case where a server-defined JSON-RPC code lands on 401/403, get misclassified as AUTH_REQUIRED.

Suggested fix — constrain the numeric-code path to a plausible HTTP range, which preserves the SSE/HTTP intent while excluding JSON-RPC codes:

// SDK SSEError/StreamableHTTPError expose `.code` as the HTTP status. Guard to
// the HTTP range so a JSON-RPC McpError code (e.g. -32601) isn't read as one.
if (typeof e.code === "number" && e.code >= 100 && e.code <= 599) return e.code;

Fix this → — worth adding a test with a { code: -32601 } error asserting status is undefined and exit is USAGE.

🟡 Minor notes

  1. aborted in UNREACHABLE_PATTERN (line 81) is broad — an AbortSignal-initiated cancel/timeout ("The operation was aborted") would be reported as unreachable, conflating "I gave up" with "network down". Consider tightening to network-abort phrasing, or dropping aborted and relying on the more specific tokens. (The in-process runner's own timeout message doesn't contain "aborted", so this only bites real AbortSignal paths.)

  2. Auth message regex (line 125) matches OAuth / Unauthorized as bare substrings anywhere in message + cause. A generic error like "Failed to parse OAuth config" would be classified AUTH_REQUIRED. Probably acceptable given the intent, but flagging the over-match.

  3. causeOf recursion (lines 53-62) has no cycle guard. A self-referential error.cause chain would recurse infinitely. Extremely rare, but a WeakSet/depth cap would make it bulletproof.

  4. Double stderr output on usage errors (not a regression): commander (with .exitOverride) still writes its human-readable usage message to stderr before throwing, and handleError then appends the JSON envelope — so a bad-arg invocation emits two stderr lines. This pre-dates the PR (the old handleError echoed the message too), but now that stderr is a documented 2>&1 | tail -1 | jq .error surface, the trailing JSON line being the parseable one happens to work out. Might be worth a follow-up to route commander's own output through the envelope for full consistency.

✅ Looks good

  • EXIT_CODES.NO_APP (2) and TOOL_ERROR (5) are defined but not yet emitted — intentional per the PR (foundation for later waves); noting so it isn't mistaken for dead code.
  • Conditional-spread envelope construction correctly omits absent fields rather than emitting undefined.
  • Dropping the redundant console.error/console.warn overrides in cli-runner (since they route through process.stderr.write) is a good simplification, and having the helper mirror the binary via formatErrorOutput means tests observe exactly what ships.
  • README exit-code table + envelope example are clear and match the implementation.

Overall: solid, well-tested, and appropriately dependency-free. The numeric-.code guard is the one I'd want fixed since the CLI's whole job is surfacing tool/method (McpError) failures.
· v2/1564-cli-exit-codes

cliffhall added a commit that referenced this pull request Jul 1, 2026
Address @claude review of #1583:

- statusOf() now only treats a numeric `.code` as an HTTP status when it
  is in the 100-599 range, so an MCP SDK McpError JSON-RPC code (e.g.
  -32601 MethodNotFound) is no longer leaked into the envelope `status`
  or misclassified as AUTH_REQUIRED (it falls back to USAGE).
- causeOf() now carries a depth cap (MAX_CAUSE_DEPTH) so a cyclic/
  self-referential `error.cause` chain terminates instead of recursing
  infinitely.
- Add tests for both: a -32601 McpError-shaped error (status undefined,
  exit USAGE) and a self-referential cause chain (finite string).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01S3fTN8H3R8YV4yUGvZjYnX
@cliffhall

Copy link
Copy Markdown
Member Author

Thanks for the review — addressed in fa5514a.

🟠 Main concern — numeric .code is not always an HTTP status: fixed. statusOf now only accepts a numeric .code when it falls in the HTTP range (>= 100 && <= 599), so an MCP SDK McpError JSON-RPC code (e.g. -32601 MethodNotFound) no longer leaks into status nor gets misclassified as AUTH_REQUIRED — it falls through to USAGE. Added a test with a { code: -32601 } error asserting status is undefined and exit is USAGE.

🟡 Note 3 — causeOf recursion cycle guard: fixed. Added a MAX_CAUSE_DEPTH cap so a self-referential error.cause chain terminates with a finite string instead of recursing infinitely. Added a cyclic-cause test.

🟡 Note 1 — aborted in UNREACHABLE_PATTERN: keeping as-is. The CLI has no user-initiated cancel path that surfaces through the error envelope — Ctrl-C is delivered as SIGINT and terminates the process rather than throwing. The only aborted errors that reach classifyError come from transport connect/timeout paths, which are legitimately UNREACHABLE per the documented "timeout" class, so the token stays.

🟡 Note 2 — auth-message regex over-match: keeping as-is. Agreed it's a broad substring match, but as you noted it's acceptable given the intent: for an auth-focused CLI, biasing an ambiguous OAuth/Unauthorized message toward AUTH_REQUIRED is the safer default (the caller can then kick the auth flow).

🟡 Note 4 — double stderr on usage errors: out of scope here. This pre-dates the PR (commander writes its own usage text before throwing) and isn't a regression; the trailing JSON line remains the parseable one for 2>&1 | tail -1 | jq .error. Better handled as a separate follow-up that routes commander's own output through the envelope, rather than expanding this foundation PR.

@cliffhall

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cliffhall's task in 58s —— View job


Re-review: fixes in fa5514a3

  • Read error-handler.ts (guard fix + cause-depth cap)
  • Review tests & cli-runner helper
  • Review README
  • Post review

Both fixes from the prior review landed cleanly, and the deferred items have reasonable justifications. ✅ Looks good to merge.

✅ Verified

🟠 numeric .code guard — fixed correctly. statusOf now gates the numeric branch to the HTTP range:

// error-handler.ts:84-86
if (typeof e.code === "number" && e.code >= 100 && e.code <= 599) {
  return e.code;
}

A JSON-RPC McpError code like -32601 falls through → status stays undefined and it lands on USAGE instead of being misread as an HTTP status or misclassified as AUTH_REQUIRED. The comment explains the intent well, and there are tests for both directions: the -32601 McpError case (error-handler.test.ts:196) and the still-valid 403 numeric-code case (:189), plus the string-ENOTFOUND code exclusion (:206). Good coverage of the boundary.

🟡 Note 3 — cause-chain cycle guard — fixed correctly. MAX_CAUSE_DEPTH = 16 bounds causeOf (:57-67); a self-referential error.cause = error now terminates with a finite string rather than recursing infinitely, and the cyclic-cause test (:162) asserts exactly that. The depth >= cap → undefined short-circuit collapses to the innermost cause.message, so the string stays sensible.

🟡 Notes 1, 2, 4 — deferred, reasonable. The aborted token (no user-cancel path reaches the envelope; only transport timeout/connect does), the auth-substring bias (safer default for an auth-focused CLI), and the double-stderr commander output (pre-existing, trailing JSON line is still the parseable one) are all justified as-is or as a follow-up. Agreed on all three — none block this foundation PR.

Minor observations (non-blocking)

  • cli-runner timeout path leaks the classified error's exit code, not the timeout as a usage error — when the 10s timeout wins the Promise.race, the rejection is a plain Error("CLI command timed out…"), which formatErrorOutput classifies to USAGE (1). That's fine and intended; just noting the timeout message deliberately avoids the aborted/ETIMEDOUT tokens so it doesn't get miscategorized as UNREACHABLE — the comment at cli-runner.ts:108-113 and Note 1's reasoning line up. No change needed.
  • HTTP range 100–599 is the right envelope for real status codes; 1xx/5xx are unusual as thrown-error statuses but harmless to accept. Fine.

Overall: the one substantive concern (McpError .code) is now correct and tested, the cycle guard is a nice bit of defensive hardening, and the deferrals are well-reasoned. Nothing left blocking from my side. 🎉
· v2/1564-cli-exit-codes

@cliffhall

Copy link
Copy Markdown
Member Author

📋 Smoke test & AGENTS.md audit for this PR: #1564 (comment) (posted on the linked issue #1564).

@cliffhall cliffhall changed the base branch from v2/main to 1579-wave-1 July 2, 2026 16:42
@cliffhall cliffhall merged commit 0fe9c4e into 1579-wave-1 Jul 2, 2026
1 check passed
@cliffhall cliffhall deleted the v2/1564-cli-exit-codes branch July 2, 2026 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v2 Issues and PRs for v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli: exit-code map + structured JSON error envelopes

1 participant