Skip to content

Conversation

@samthanawalla
Copy link
Contributor

@samthanawalla samthanawalla commented Jul 14, 2025

@samthanawalla samthanawalla requested a review from jba July 14, 2025 19:29
@samthanawalla samthanawalla marked this pull request as ready for review July 14, 2025 19:29
Copy link
Contributor

@jba jba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First preliminary review. With Rob out, I want to really understand this. That may take a day or two.

@samthanawalla samthanawalla force-pushed the resumability branch 2 times, most recently from 5a2d5aa to 9d77544 Compare July 15, 2025 16:43
This CL implements a retry mechanism to resume SSE streams to recover
from network failures.
return
case s.incoming <- evt.Data:

if !isResumable(resp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may also want to try resuming if the error from establishSSE is non-nil. For example, if the network is partitioned, that might manifest as a timeout error instead of an HTTP response. But we can leave that for a later PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the way it's written- we will retry another attempt if the error is non-nil. Are you saying that we have to do something special in that case?

}

// Perform handshake.
initReq := &jsonrpc.Request{ID: jsonrpc2.Int64ID(100), Method: "initialize", Params: mustMarshal(t, &InitializeParams{})}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have to happen by hand?
I guess what I mean is, can you write this test with a Client instead of at the transport level? If you need to reach into the transport for something, maybe you could add test hooks to the transport code. That may ultimately be cleaner that re-writing the init protocol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Updated to test with Client.

if err != nil {
t.Fatalf("Failed to read notification: %v", err)
}
if req, ok := msg.(*jsonrpc.Request); ok && req.Method == "notifications/progress" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it also be an error if you see a non-notification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to only hook into progress notifications.

@samthanawalla samthanawalla requested a review from jba July 22, 2025 18:36
t.Fatalf("Failed to listen on proxy address: %v", err)
}
restartedProxy := &http.Server{Handler: proxyHandler}
go restartedProxy.Serve(listener)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this goroutine stopped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restartedProxy.Close() will terminate it.

"Serve always returns a non-nil error and closes l. After [Server.Shutdown] or [Server.Close], the returned error is [ErrServerClosed]."

// that is killed and restarted to simulate a recoverable network outage.
func TestClientReplayAfterProxyBreak(t *testing.T) {
func TestClientReplay(t *testing.T) {
notifications := make(chan string, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document how the channel buffer size affects correctness. (e.g. must be at least as large as the number of notifications sent in the tool handler on L125?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like buffer size has no effect since we collect them all. Removed the size of 10.

@samthanawalla samthanawalla merged commit a911cd0 into modelcontextprotocol:main Jul 22, 2025
3 checks passed
@samthanawalla samthanawalla deleted the resumability branch September 11, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants