fix: SendRequest hangs forever when server process dies #714

Merged
ezynda3 merged 1 commit into mark3labs:main from detailobsessed:02-09-fix_sendrequest_hangs_forever_when_server_process_dies
Feb 15, 2026

ichoosetoaccept commented Feb 10, 2026

Description

SendRequest hangs forever when the server process dies (crash, pipe break, startup failure). The reader goroutine exits on EOF but never signals the done channel, so in-flight requests block indefinitely waiting for a response that will never come. This causes MCP clients using the stdio transport to become permanently unresponsive.

Root cause: readResponses exited silently on EOF/error without closing the done channel, and SendRequest's select only watched ctx.Done() and responseChan — it had no way to know the server was gone.

Changes (client/transport/stdio.go)

  1. closeDone() with sync.Once — safely closes the done channel from multiple goroutines without panicking on double-close.
  2. readResponses calls closeDone() on exit — EOF or read error now immediately unblocks all in-flight SendRequest calls.
  3. SendRequest selects on <-c.done — both in the pre-check and in the response-wait select, returning ErrTransportClosed immediately when the server dies. Drains responseChan first to avoid dropping a valid response delivered just before the done channel closed.
  4. SendNotification gets matching done/ctx pre-check — consistency with SendRequest.
  5. Close() uses closeCleanupOnce — always performs resource cleanup (stdin, stderr, cmd.Wait) even when readResponses already called closeDone(), preventing FD leaks and zombie processes. The old early-return guard on <-c.done would skip cleanup entirely.
  6. Export ErrTransportClosed sentinel — enables callers to use errors.Is(err, transport.ErrTransportClosed) for proper error handling.

Tests (client/stdio_test.go) — 8 new test functions

| Test | What it verifies |
| --- | --- |
| TestStdio_SendRequestReturnsWhenTransportCloses | SendRequest unblocks automatically on server death (EOF) |
| TestStdio_SendRequestReturnsImmediatelyWhenAlreadyClosed | Pre-check returns ErrTransportClosed immediately |
| TestStdio_SendNotificationReturnsWhenTransportClosed | SendNotification returns ErrTransportClosed after Close() |
| TestStdio_SendNotificationReturnsWhenContextCancelled | SendNotification returns ctx.Err() when context cancelled |
| TestStdio_ConcurrentCloseDoesNotPanic | 10 concurrent Close() calls don't panic (sync.Once safety) |
| TestStdio_CloseCleanupRunsAfterReadResponsesCloseDone | Close() still cleans up FDs after readResponses already called closeDone() |
| TestStdio_ConcurrentRequestsAllReceiveResponses | 50 parallel requests all get correct responses (response routing stress test) |
| TestStdio_ConcurrentRequestsUnblockOnServerDeath | 20 parallel in-flight requests all unblock on server death |

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Checklist

  • My code follows the code style of this project
  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly

Additional Information

Reproducing the bug: Any SendRequest call through the stdio transport will hang indefinitely if the server process exits before responding. This is easy to trigger with short-lived or crashing servers, and there's no workaround other than using a context with a timeout (which still leaks the goroutine).

Backward compatibility: The only new public symbol is ErrTransportClosed. All behavioral changes are strictly bug fixes — the transport now correctly reports errors instead of silently hanging. Existing callers using context timeouts will see the faster ErrTransportClosed error instead of waiting for the deadline.

Summary by CodeRabbit

  • New Features

    • Introduced ErrTransportClosed error to signal operations on closed transports.
  • Bug Fixes

    • Enhanced transport closure handling to safely manage concurrent Close calls and prevent deadlocks.
    • Improved server disconnection handling to properly unblock in-flight requests.
    • Added protection against resource leaks during cleanup.
  • Tests

    • Comprehensive test suite covering transport closure, context cancellation, concurrent scenarios, and server disconnection handling.

coderabbitai bot commented Feb 10, 2026

Walkthrough

This PR adds comprehensive test coverage for stdio transport closure behavior and implements transport closure handling in the stdio implementation. Tests validate request unblocking on transport closure, immediate closure checks, context cancellation, concurrent Close operations, and cleanup sequencing. Transport changes include a public ErrTransportClosed error, one-time closure guards, and updates to SendRequest and SendNotification to check closed state before proceeding.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Suite Additions: client/stdio_test.go | Added 10 new test functions covering transport closure behavior, context cancellation, concurrent scenarios, server death simulation, and constructor validation. Tests use io.Pipe and JSON-RPC framing for in-test server mocking. Updated imports to include io and strings. |
| Transport Closure & Cleanup: client/transport/stdio.go | Added public ErrTransportClosed error. Introduced closeOnce and closeCleanupOnce guards plus closeDone helper to ensure single-invocation cleanup and in-flight request unblocking. Updated Close to signal requests first, then aggregate cleanup errors. Modified readResponses to call closeDone on server death. Updated SendRequest and SendNotification to immediately return ErrTransportClosed for closed transports and drain potential concurrent responses. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • dugenkui03
  • ezynda3
  • pottekkat
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (5 files, all content conflicts): README.md, client/stdio_test.go, client/transport/sse.go, client/transport/sse_test.go, client/transport/stdio.go. These conflicts must be resolved before merging into main. | Resolve conflicts locally and push changes to this branch. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the primary bug fix: SendRequest hanging when the server process dies. It is clear, specific, and directly reflects the main change. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering root cause, detailed changes, test coverage with a clear table, type of change selection, checklist completion, and backward compatibility notes. It provides sufficient context for reviewers. |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
client/transport/stdio.go (1)

229-247: Consider errors.Join for aggregating cleanup errors.

Only the first cleanup error is captured; if stdin.Close() succeeds but both stderr.Close() and cmd.Wait() fail, the cmd.Wait() error is silently dropped. Using errors.Join would preserve all errors for debugging.

♻️ Proposed refactor to preserve all cleanup errors
 	var closeErr error
 	c.closeCleanupOnce.Do(func() {
+		var errs []error
 		if c.stdin != nil {
 			if err := c.stdin.Close(); err != nil {
-				closeErr = fmt.Errorf("failed to close stdin: %w", err)
+				errs = append(errs, fmt.Errorf("failed to close stdin: %w", err))
 			}
 		}
 		if c.stderr != nil {
-			if err := c.stderr.Close(); err != nil && closeErr == nil {
-				closeErr = fmt.Errorf("failed to close stderr: %w", err)
+			if err := c.stderr.Close(); err != nil {
+				errs = append(errs, fmt.Errorf("failed to close stderr: %w", err))
 			}
 		}
 		if c.cmd != nil {
-			if err := c.cmd.Wait(); err != nil && closeErr == nil {
-				closeErr = err
+			if err := c.cmd.Wait(); err != nil {
+				errs = append(errs, err)
 			}
 		}
+		closeErr = errors.Join(errs...)
 	})

SendRequest's select only watched ctx.Done() and responseChan, missing
the done channel. When the server process dies (crash, pipe break,
startup failure), the reader goroutine exits but SendRequest blocks
forever waiting for a response that will never come. This causes MCP
clients to become permanently unresponsive.

Root cause: readResponses exited silently on EOF without signaling the
done channel, so in-flight requests had no way to know the server died.

Fix (5 changes in client/transport/stdio.go):
1. Add closeDone() using sync.Once to safely close the done channel
   from multiple goroutines without panicking on double-close.
2. readResponses calls closeDone() on unexpected exit (EOF/error),
   so in-flight requests unblock automatically on server death.
3. SendRequest's select includes <-c.done (both pre-check and
   response-wait) to return ErrTransportClosed immediately.
4. SendNotification gets matching done+ctx pre-check for consistency.
5. Close() uses a separate closeCleanupOnce to always perform resource
   cleanup (stdin, stderr, cmd.Wait) even when readResponses already
   called closeDone(), preventing FD leaks and zombie processes.

Includes 8 regression tests covering server death, concurrent close,
FD cleanup, and concurrent request stress scenarios.
ichoosetoaccept force-pushed the 02-09-fix_sendrequest_hangs_forever_when_server_process_dies branch from dd8fd44 to 021cefb on February 14, 2026 12:32
ezynda3 merged commit 0510f0c into mark3labs:main on Feb 15, 2026
3 of 4 checks passed