StreamableClientTransport: Connection poisoned by transient errors #683

## Problem

The `StreamableClientTransport` permanently breaks after any transient error, making the connection unusable for all subsequent requests. **This affects both stateful and stateless server modes.**

Transient errors include:
- Network timeouts (context deadline exceeded)
- HTTP errors (503 Service Unavailable, 500 Internal Server Error, 502 Bad Gateway)
- Network interruptions (connection refused, connection reset)

After a single transient error occurs, **all subsequent requests fail** with "client is closing", requiring a full reconnection. This affects production systems where temporary issues (server restarts, network glitches, load spikes) should be recoverable.

---

## 1. What did you do?

Created an MCP client using `StreamableClientTransport` and made three sequential tool calls, where the second call encounters a transient error (timeout, 503, network interruption, etc.):

```go
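ctx := context.Background()

// clientTimeout must be shorter than the slow tool call (2s in the reproduction test below).
httpClient := &http.Client{Timeout: clientTimeout}
transport := &mcp.StreamableClientTransport{
    Endpoint:   serverURL,
    HTTPClient: httpClient,
}

client := mcp.NewClient(&mcp.Implementation{Name: "test-client", Version: "1.0.0"}, nil)
session, err := client.Connect(ctx, transport, nil)
if err != nil {
    log.Fatalf("connect: %v", err)
}
defer session.Close()

// Call 1: fast tool, succeeds
result1, err1 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})

// Call 2: hits a transient error (timeout, 503, etc.)
result2, err2 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})

// Call 3: fast tool, should succeed but fails with "client is closing"
result3, err3 := session.CallTool(ctx, &mcp.CallToolParams{Name: "delay_tool"})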
```

## 2. What did you see?

```
Call #1: SUCCESS
Call #2: FAILED - calling "tools/call": sending "tools/call": Post "http://...": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Call #3: FAILED - connection closed: calling "tools/call": client is closing: EOF
Call #4+: FAILED - connection closed: calling "tools/call": client is closing: EOF
```

After `Call #2` encounters a transient error, all subsequent calls fail permanently with "client is closing: EOF" errors. The connection becomes unusable and must be recreated.

## 3. What did you expect to see?

```
Call #1: SUCCESS ✅
Call #2: FAILED - transient error (expected)
Call #3: SUCCESS ✅ (connection should survive the transient error)
Call #4+: SUCCESS ✅
```

The connection should remain healthy after transient errors. Only fatal errors (authentication failures, protocol errors, session termination) should require reconnection.

## 4. What version of the Go MCP SDK are you using?

**Bug reproduced in:**
- **main branch** (commit 272e0cd)
- **v1.1.0** (latest release)
- **v1.0.0** (initial release)

The bug is present in every release from v1.0.0 through v1.1.0, as well as on `main`.

## 5. What version of Go are you using (`go version`)?

```
go version go1.23.3 darwin/arm64
```

---

## Root Cause

The `streamableClientConn.Write()` method returns transient errors without wrapping them in `jsonrpc2.ErrRejected`. As a result, the underlying `jsonrpc2.Connection` records the error in its `writeErr` field (in `internal/jsonrpc2/conn.go:794-810`). That field is never cleared, so every later operation on the connection fails.

The `jsonrpc2` package already has a mechanism to handle recoverable errors via `ErrRejected` (see `conn.go:788-792`), but the streamable transport doesn't use it.

After any transient error, the connection becomes unusable for **all MCP operations**: `CallTool()`, `ListTools()`, `ListResources()`, etc. The entire session must be recreated, even though the error was temporary and should have been recoverable.

## Related Issues

- **[go-sdk#479](https://github.com/modelcontextprotocol/go-sdk/issues/479)** - Server-side cleanup callbacks for closed connections
- **[modelcontextprotocol/modelcontextprotocol#1442](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1442)** - SEP proposal to make MCP stateless by default

## Reproduction Test Code

Below is a minimal reproduction test:

<details>
<summary>Click to expand test code</summary>

```go
package gosdk

import (
    "context"
    "fmt"
    "net/http"
    "net/http/httptest"
    "sync/atomic"
    "testing"
    "time"

    "github.com/modelcontextprotocol/go-sdk/mcp"
)

// testDelays defines the sleep duration for each call
var testDelays = []time.Duration{
    500 * time.Millisecond, // Call 1: fast, should succeed
    3 * time.Second,        // Call 2: slow, will timeout
    500 * time.Millisecond, // Call 3: fast, should succeed but fails due to bug
}

// createDelayTool creates an MCP tool that sleeps for configurable durations
func createDelayTool(callCount *atomic.Int32) (*mcp.Server, *mcp.Tool) {
    server := mcp.NewServer(&mcp.Implementation{
        Name:    "test-server",
        Version: "1.0.0",
    }, nil)

    tool := &mcp.Tool{
        Name:        "delay_tool",
        Description: "Tool with configurable delays for testing",
    }

    handler := func(ctx context.Context, req *mcp.CallToolRequest, args any) (*mcp.CallToolResult, any, error) {
        callNum := int(callCount.Add(1))
        delay := testDelays[0] // default
        if callNum <= len(testDelays) {
            delay = testDelays[callNum-1]
        }

        time.Sleep(delay)

        return &mcp.CallToolResult{
            Content: []mcp.Content{
                &mcp.TextContent{
                    Text: fmt.Sprintf("Call #%d completed", callNum),
                },
            },
        }, nil, nil
    }

    mcp.AddTool(server, tool, handler)
    return server, tool
}

// setupTestServer creates and starts an HTTP test server with MCP handler
func setupTestServer(t *testing.T, server *mcp.Server, stateless bool) *httptest.Server {
    var opts *mcp.StreamableHTTPOptions
    if stateless {
        opts = &mcp.StreamableHTTPOptions{Stateless: true}
    }

    handler := mcp.NewStreamableHTTPHandler(func(req *http.Request) *mcp.Server {
        return server
    }, opts)

    httpServer := httptest.NewServer(handler)
    t.Cleanup(httpServer.Close)
    return httpServer
}

// createMCPSession creates and connects an MCP client session
func createMCPSession(t *testing.T, serverURL string, clientTimeout time.Duration) *mcp.ClientSession {
    ctx := context.Background()
    httpClient := &http.Client{Timeout: clientTimeout}

    client := mcp.NewClient(&mcp.Implementation{
        Name:    "test-client",
        Version: "1.0.0",
    }, nil)

    transport := &mcp.StreamableClientTransport{
        Endpoint:   serverURL,
        HTTPClient: httpClient,
    }

    session, err := client.Connect(ctx, transport, nil)
    if err != nil {
        t.Fatalf("Failed to connect: %v", err)
    }
    t.Cleanup(func() { session.Close() })

    return session
}

// callResult represents the result of a CallTool invocation
type callResult struct {
    num    int
    err    error
    result *mcp.CallToolResult
}

// performCallSequence executes three sequential tool calls and returns results
func performCallSequence(session *mcp.ClientSession) []callResult {
    results := make([]callResult, 3)

    for i := range 3 {
        ctx := context.Background() // Fresh context for each call
        result, err := session.CallTool(ctx, &mcp.CallToolParams{
            Name:      "delay_tool",
            Arguments: map[string]any{},
        })
        results[i] = callResult{num: i + 1, err: err, result: result}
    }

    return results
}

// reportResults logs the test results
func reportResults(t *testing.T, results []callResult) {
    t.Helper()

    for _, r := range results {
        if r.err != nil {
            t.Logf("Call #%d: FAILED - %v", r.num, r.err)
        } else {
            t.Logf("Call #%d: SUCCESS", r.num)
        }
    }

    // Expected behavior: Call 3 should succeed even after Call 2 timeout
    if results[2].err != nil {
        t.Errorf("Call #3 failed after transient error in Call #2")
    }
}

// TestTimeoutBugReproduction tests connection behavior after timeout in stateful mode.
// Expected: Call #3 should succeed even after Call #2 timeout.
func TestTimeoutBugReproduction(t *testing.T) {
    var callCount atomic.Int32
    server, _ := createDelayTool(&callCount)
    httpServer := setupTestServer(t, server, false) // stateful mode
    session := createMCPSession(t, httpServer.URL, 2*time.Second)

    results := performCallSequence(session)
    reportResults(t, results)
}

// TestTimeoutStateless tests connection behavior after timeout in stateless mode.
// Expected: Call #3 should succeed even after Call #2 timeout.
func TestTimeoutStateless(t *testing.T) {
    var callCount atomic.Int32
    server, _ := createDelayTool(&callCount)
    httpServer := setupTestServer(t, server, true) // stateless mode
    session := createMCPSession(t, httpServer.URL, 2*time.Second)

    results := performCallSequence(session)
    reportResults(t, results)
}
```

</details>
