Description
Problem
The StreamableClientTransport permanently breaks after any transient error, making the connection unusable for all subsequent requests. This affects both stateful and stateless server modes.
Transient errors include:
- Network timeouts (context deadline exceeded)
- HTTP errors (503 Service Unavailable, 500 Internal Server Error, 502 Bad Gateway)
- Network interruptions (connection refused, connection reset)
After a single transient error occurs, all subsequent requests fail with "client is closing", requiring a full reconnection. This affects production systems where temporary issues (server restarts, network glitches, load spikes) should be recoverable.
1. What did you do?
Created an MCP client using StreamableClientTransport and made three sequential tool calls, where the second call encounters a transient error (timeout, 503, network interruption, etc.):
ctx := context.Background()
client := mcp.NewClient(&mcp.Implementation{Name: "test-client", Version: "1.0.0"}, nil)
httpClient := &http.Client{Timeout: /* some timeout */}
transport := &mcp.StreamableClientTransport{
	Endpoint:   serverURL,
	HTTPClient: httpClient,
}
session, err := client.Connect(ctx, transport, nil)
if err != nil {
	log.Fatal(err)
}
// Call 1: fast tool, should succeed
result1, err1 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})
// Call 2: encounters a transient error (timeout, 503, etc.)
result2, err2 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})
// Call 3: fast tool, should succeed
result3, err3 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})
2. What did you see?
Call #1: SUCCESS
Call #2: FAILED - calling "tools/call": sending "tools/call": Post "http://...": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Call #3: FAILED - connection closed: calling "tools/call": client is closing: EOF
Call #4+: FAILED - connection closed: calling "tools/call": client is closing: EOF
After Call #2 encounters a transient error, all subsequent calls fail permanently with "client is closing: EOF" errors. The connection becomes unusable and must be recreated.
3. What did you expect to see?
Call #1: SUCCESS ✅
Call #2: FAILED - transient error (expected)
Call #3: SUCCESS ✅ (connection should survive the transient error)
Call #4+: SUCCESS ✅
The connection should remain healthy after transient errors. Only fatal errors (authentication failures, protocol errors, session termination) should require reconnection.
4. What version of the Go MCP SDK are you using?
Bug reproduced in:
- main branch (commit 272e0cd)
- v1.1.0 (latest release)
- v1.0.0 (initial release)
The bug exists in all released versions of the SDK.
5. What version of Go are you using (go version)?
go version go1.23.3 darwin/arm64
Root Cause
The streamableClientConn.Write() method returns transient errors without wrapping them in jsonrpc2.ErrRejected. As a result, the underlying jsonrpc2.Connection stores the error in its writeErr field (internal/jsonrpc2/conn.go:794-810). That field is never cleared, so every subsequent operation on the connection fails.
The jsonrpc2 package already has a mechanism to handle recoverable errors via ErrRejected (see conn.go:788-792), but the streamable transport doesn't use it.
After any transient error, the connection becomes unusable for all MCP operations: CallTool(), ListTools(), ListResources(), etc. The entire session must be recreated, even though the error was temporary and should have been recoverable.
Related Issues
- go-sdk#479 - Server-side cleanup callbacks for closed connections
- modelcontextprotocol/modelcontextprotocol#1442 - SEP proposal to make MCP stateless by default
Reproduction Test Code
Below is a minimal reproduction test:
package gosdk

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"testing"
	"time"

	"github.com/modelcontextprotocol/go-sdk/mcp"
)

// testDelays defines the sleep duration for each call
var testDelays = []time.Duration{
	500 * time.Millisecond, // Call 1: fast, should succeed
	3 * time.Second,        // Call 2: slow, will timeout
	500 * time.Millisecond, // Call 3: fast, should succeed but fails due to bug
}

// createDelayTool creates an MCP tool that sleeps for configurable durations
func createDelayTool(callCount *atomic.Int32) (*mcp.Server, *mcp.Tool) {
	server := mcp.NewServer(&mcp.Implementation{
		Name:    "test-server",
		Version: "1.0.0",
	}, nil)
	tool := &mcp.Tool{
		Name:        "delay_tool",
		Description: "Tool with configurable delays for testing",
	}
	handler := func(ctx context.Context, req *mcp.CallToolRequest, args any) (*mcp.CallToolResult, any, error) {
		callNum := int(callCount.Add(1))
		delay := testDelays[0] // default
		if callNum <= len(testDelays) {
			delay = testDelays[callNum-1]
		}
		time.Sleep(delay)
		return &mcp.CallToolResult{
			Content: []mcp.Content{
				&mcp.TextContent{
					Text: fmt.Sprintf("Call #%d completed", callNum),
				},
			},
		}, nil, nil
	}
	mcp.AddTool(server, tool, handler)
	return server, tool
}

// setupTestServer creates and starts an HTTP test server with MCP handler
func setupTestServer(t *testing.T, server *mcp.Server, stateless bool) *httptest.Server {
	var opts *mcp.StreamableHTTPOptions
	if stateless {
		opts = &mcp.StreamableHTTPOptions{Stateless: true}
	}
	handler := mcp.NewStreamableHTTPHandler(func(req *http.Request) *mcp.Server {
		return server
	}, opts)
	httpServer := httptest.NewServer(handler)
	t.Cleanup(httpServer.Close)
	return httpServer
}

// createMCPSession creates and connects an MCP client session
func createMCPSession(t *testing.T, serverURL string, clientTimeout time.Duration) *mcp.ClientSession {
	ctx := context.Background()
	httpClient := &http.Client{Timeout: clientTimeout}
	client := mcp.NewClient(&mcp.Implementation{
		Name:    "test-client",
		Version: "1.0.0",
	}, nil)
	transport := &mcp.StreamableClientTransport{
		Endpoint:   serverURL,
		HTTPClient: httpClient,
	}
	session, err := client.Connect(ctx, transport, nil)
	if err != nil {
		t.Fatalf("Failed to connect: %v", err)
	}
	t.Cleanup(func() { session.Close() })
	return session
}

// callResult represents the result of a CallTool invocation
type callResult struct {
	num    int
	err    error
	result *mcp.CallToolResult
}

// performCallSequence executes three sequential tool calls and returns results
func performCallSequence(session *mcp.ClientSession) []callResult {
	results := make([]callResult, 3)
	for i := range 3 {
		ctx := context.Background() // Fresh context for each call
		result, err := session.CallTool(ctx, &mcp.CallToolParams{
			Name:      "delay_tool",
			Arguments: map[string]any{},
		})
		results[i] = callResult{num: i + 1, err: err, result: result}
	}
	return results
}

// reportResults logs the test results
func reportResults(t *testing.T, results []callResult) {
	t.Helper()
	for _, r := range results {
		if r.err != nil {
			t.Logf("Call #%d: FAILED - %v", r.num, r.err)
		} else {
			t.Logf("Call #%d: SUCCESS", r.num)
		}
	}
	// Expected behavior: Call 3 should succeed even after Call 2 timeout
	if results[2].err != nil {
		t.Errorf("Call #3 failed after transient error in Call #2")
	}
}

// TestTimeoutBugReproduction tests connection behavior after timeout in stateful mode.
// Expected: Call #3 should succeed even after Call #2 timeout.
func TestTimeoutBugReproduction(t *testing.T) {
	var callCount atomic.Int32
	server, _ := createDelayTool(&callCount)
	httpServer := setupTestServer(t, server, false) // stateful mode
	session := createMCPSession(t, httpServer.URL, 2*time.Second)
	results := performCallSequence(session)
	reportResults(t, results)
}

// TestTimeoutStateless tests connection behavior after timeout in stateless mode.
// Expected: Call #3 should succeed even after Call #2 timeout.
func TestTimeoutStateless(t *testing.T) {
	var callCount atomic.Int32
	server, _ := createDelayTool(&callCount)
	httpServer := setupTestServer(t, server, true) // stateless mode
	session := createMCPSession(t, httpServer.URL, 2*time.Second)
	results := performCallSequence(session)
	reportResults(t, results)
}