fix: SendRequest hangs forever when server process dies#714
Conversation
WalkthroughThis PR adds comprehensive test coverage for stdio transport closure behavior and implements transport closure handling in the stdio implementation. Tests validate request unblocking on transport closure, immediate closure checks, context cancellation, concurrent Close operations, and cleanup sequencing. Transport changes include a public ErrTransportClosed error, one-time closure guards, and updates to SendRequest and SendNotification to check closed state before proceeding. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
0d33052 to
dd8fd44
Compare
SendRequest's select only watched ctx.Done() and responseChan, missing the done channel. When the server process dies (crash, pipe break, startup failure), the reader goroutine exits but SendRequest blocks forever waiting for a response that will never come. This causes MCP clients to become permanently unresponsive. Root cause: readResponses exited silently on EOF without signaling the done channel, so in-flight requests had no way to know the server died. Fix (5 changes in client/transport/stdio.go): 1. Add closeDone() using sync.Once to safely close the done channel from multiple goroutines without panicking on double-close. 2. readResponses calls closeDone() on unexpected exit (EOF/error), so in-flight requests unblock automatically on server death. 3. SendRequest's select includes <-c.done (both pre-check and response-wait) to return ErrTransportClosed immediately. 4. SendNotification gets matching done+ctx pre-check for consistency. 5. Close() uses a separate closeCleanupOnce to always perform resource cleanup (stdin, stderr, cmd.Wait) even when readResponses already called closeDone(), preventing FD leaks and zombie processes. Includes 8 regression tests covering server death, concurrent close, FD cleanup, and concurrent request stress scenarios.
dd8fd44 to
021cefb
Compare
Description
SendRequesthangs forever when the server process dies (crash, pipe break, startup failure). The reader goroutine exits on EOF but never signals thedonechannel, so in-flight requests block indefinitely waiting for a response that will never come. This causes MCP clients using the stdio transport to become permanently unresponsive.Root cause:
readResponsesexited silently on EOF/error without closing thedonechannel, andSendRequest'sselectonly watchedctx.Done()andresponseChan— it had no way to know the server was gone.Changes (
client/transport/stdio.go)closeDone()withsync.Once— safely closes thedonechannel from multiple goroutines without panicking on double-close.readResponsescallscloseDone()on exit — EOF or read error now immediately unblocks all in-flightSendRequestcalls.SendRequestselects on<-c.done— both in the pre-check and in the response-waitselect, returningErrTransportClosedimmediately when the server dies. DrainsresponseChanfirst to avoid dropping a valid response delivered just before thedonechannel closed.SendNotificationgets matchingdone/ctxpre-check — consistency withSendRequest.Close()usescloseCleanupOnce— always performs resource cleanup (stdin, stderr,cmd.Wait) even whenreadResponsesalready calledcloseDone(), preventing FD leaks and zombie processes. The old early-return guard on<-c.donewould skip cleanup entirely.ErrTransportClosedsentinel — enables callers to useerrors.Is(err, transport.ErrTransportClosed)for proper error handling.Tests (
client/stdio_test.go) — 8 new test functionsTestStdio_SendRequestReturnsWhenTransportClosesTestStdio_SendRequestReturnsImmediatelyWhenAlreadyClosedErrTransportClosedimmediatelyTestStdio_SendNotificationReturnsWhenTransportClosedErrTransportClosedafter Close()TestStdio_SendNotificationReturnsWhenContextCancelledctx.Err()when context cancelledTestStdio_ConcurrentCloseDoesNotPanicsync.Oncesafety)TestStdio_CloseCleanupRunsAfterReadResponsesCloseDoneTestStdio_ConcurrentRequestsAllReceiveResponsesTestStdio_ConcurrentRequestsUnblockOnServerDeathType of Change
Checklist
Additional Information
Reproducing the bug: Any
SendRequestcall through the stdio transport will hang indefinitely if the server process exits before responding. This is easy to trigger with short-lived or crashing servers, and there's no workaround other than using a context with a timeout (which still leaks the goroutine).Backward compatibility: The only new public symbol is
ErrTransportClosed. All behavioral changes are strictly bug fixes — the transport now correctly reports errors instead of silently hanging. Existing callers using context timeouts will see the fasterErrTransportClosederror instead of waiting for the deadline.Summary by CodeRabbit
New Features
ErrTransportClosederror to signal operations on closed transports.Bug Fixes
Tests