Race condition in HTTP transport session management causing "Session not found" errors #128

@polo13410

Bug Description

The HTTP transport (HttpStreamTransport) has a race condition in its session management that causes "Session not found" errors (HTTP 404, code -32001) when clients make subsequent requests after initialization.

Environment

  • mcp-framework version: 0.2.15
  • Transport type: http-stream
  • Client: @mastra/mcp v0.13.4 (using @modelcontextprotocol/sdk v1.20.0)

Problem

Sessions are being deleted from the _transports map while they're still in use, leading to failed tool calls and other operations.

Root Cause

The session cleanup logic is too aggressive in several places:

  1. In the transport.onerror handler (lines 96-101): Sessions are deleted immediately when any error occurs, including transient ones
  2. In send() method (line 178): Failed sessions are deleted after broadcast attempts
  3. No synchronization: Multiple concurrent requests can cause one error handler to delete a session while another request is still processing
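
To make the third point concrete, here is a minimal, self-contained sketch of the interleaving. It is not mcp-framework code: the `transports` map, `onError`, and `handleRequest` are hypothetical stand-ins that only mimic the behaviour described above.

```typescript
// Hypothetical stand-in for the transport's session map; not mcp-framework code.
type Session = { id: string };

const transports: Record<string, Session> = {};

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Mimics the aggressive cleanup: any error removes the session immediately.
function onError(sessionId: string): void {
  delete transports[sessionId];
}

// Mimics a request that looks the session up only after some async work.
async function handleRequest(sessionId: string, workMs: number): Promise<string> {
  await sleep(workMs);
  return transports[sessionId] ? "ok" : "Session not found";
}

async function demo(): Promise<string[]> {
  transports["s1"] = { id: "s1" };
  const inFlight = handleRequest("s1", 20); // still processing
  await sleep(5);
  onError("s1"); // an error raised elsewhere deletes the shared session
  const later = handleRequest("s1", 0); // a subsequent tool call
  return Promise.all([inFlight, later]);
}

const raceResult = demo();
raceResult.then((results) => console.log(results));
```

Both the in-flight request and the later one lose the session, even though the client never closed it.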

Reproduction

  1. Start MCP server with HTTP transport on port 3001
  2. Connect from @mastra/mcp client via HTTP
  3. Initial connection succeeds, session created (e.g., 69532732-5675-416d-80c6-407971079464)
  4. Logs show: [ERROR] Error sending message to session de9f3321-87b5-48f6-a795-6a6a045a87f2: Error: No connection established for request ID: 0
  5. Subsequent tool calls fail with: {"jsonrpc":"2.0","error":{"code":-32001,"message":"Session not found"},"id":null}

Error Logs

[INFO] Creating new session for initialization request
[INFO] Session initialized: 69532732-5675-416d-80c6-407971079464
[ERROR] Error sending message to session de9f3321-87b5-48f6-a795-6a6a045a87f2: Error: No connection established for request ID: 0
[WARN] Failed to send message to 1 sessions.

Later when client tries to use tools:

{
  "message": "Error POSTing to endpoint (HTTP 404): {\"jsonrpc\":\"2.0\",\"error\":{\"code\":-32001,\"message\":\"Session not found\"},\"id\":null}"
}

Suggested Fix

The core issue is premature session deletion. Sessions should only be removed when definitively closed, not on transient errors.

Patch

diff --git a/node_modules/mcp-framework/dist/transports/http/server.js b/node_modules/mcp-framework/dist/transports/http/server.js
index 1234567..abcdefg 100644
--- a/node_modules/mcp-framework/dist/transports/http/server.js
+++ b/node_modules/mcp-framework/dist/transports/http/server.js
@@ -85,8 +85,10 @@ export class HttpStreamTransport extends AbstractTransport {
                     onsessioninitialized: (sessionId) => {
                         logger.info(`Session initialized: ${sessionId}`);
                         this._transports[sessionId] = transport;
                     },
+                    onsessionclosed: (sessionId) => {
+                        logger.info(`Session closed: ${sessionId}`);
+                        delete this._transports[sessionId];
+                    },
                     enableJsonResponse: this._enableJsonResponse,
                 });
                 transport.onclose = () => {
@@ -94,7 +96,9 @@ export class HttpStreamTransport extends AbstractTransport {
                         logger.info(`Transport closed for session: ${transport.sessionId}`);
                         delete this._transports[transport.sessionId];
                     }
                 };
                 transport.onerror = (error) => {
-                    logger.error(`Transport error for session: ${error}`);
+                    logger.error(`Transport error for session ${transport.sessionId}: ${error}`);
+                    /*
                     if (transport.sessionId) {
                         delete this._transports[transport.sessionId];
                     }
+                    */
                 };
                 transport.onmessage = async (message) => {
                     if (this._onmessage) {
@@ -175,7 +179,8 @@ export class HttpStreamTransport extends AbstractTransport {
                 failedSessions.push(sessionId);
             }
         }
         if (failedSessions.length > 0) {
-            failedSessions.forEach((sessionId) => delete this._transports[sessionId]);
+            // failedSessions.forEach((sessionId) => delete this._transports[sessionId]);
             logger.warn(`Failed to send message to ${failedSessions.length} sessions.`);
         }
     }

Key Changes

  1. Don't delete sessions on error: Comment out the session deletion in the transport.onerror handler
  2. Don't delete sessions on failed sends: Comment out the immediate deletion in the send() method
  3. Better error logging: Include the session ID in error messages for debugging
  4. Add onsessionclosed handler: Properly handle session closure events (if supported by the SDK)
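
The intent of these changes can be sketched in isolation as follows. This is a simplified illustration, not the framework's implementation: `SessionRegistry` and `TransportLike` are hypothetical names, and the real transport object has a larger surface.

```typescript
// Hypothetical registry; names are illustrative, not mcp-framework's API.
interface TransportLike {
  sessionId?: string;
  onclose?: () => void;
  onerror?: (error: Error) => void;
}

class SessionRegistry {
  private readonly transports = new Map<string, TransportLike>();

  register(sessionId: string, transport: TransportLike): void {
    transport.sessionId = sessionId;
    this.transports.set(sessionId, transport);

    // Errors are logged with the session ID but never evict the session.
    transport.onerror = (error) => {
      console.error(`Transport error for session ${sessionId}: ${error.message}`);
    };

    // Only a definitive close removes the session.
    transport.onclose = () => {
      this.transports.delete(sessionId);
    };
  }

  has(sessionId: string): boolean {
    return this.transports.has(sessionId);
  }
}

const registry = new SessionRegistry();
const transport: TransportLike = {};
registry.register("s1", transport);

transport.onerror?.(new Error("transient failure"));
const aliveAfterError = registry.has("s1");

transport.onclose?.();
const aliveAfterClose = registry.has("s1");

console.log({ aliveAfterError, aliveAfterClose });
```

One trade-off of this approach is that sessions whose clients disappear without closing would stay in the map, so a real fix may also want an idle timeout.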

Impact

This bug affects all HTTP transport users, particularly those using client libraries like @mastra/mcp that maintain persistent connections. It makes the HTTP transport unreliable for production use.

Workaround

Until fixed, users can:

  1. Use stdio transport instead of HTTP for local connections
  2. Apply the above patch using patch-package
  3. Implement retry logic in the client to handle "Session not found" errors
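
For the third workaround, a rough sketch of client-side retry might look like this. The `McpClientLike` interface and `callToolWithRetry` helper are assumptions made for illustration; @mastra/mcp and the MCP SDK expose their own client APIs, so the reconnect step would need to be adapted to whichever client is in use.

```typescript
// Hypothetical client surface; real MCP clients expose different APIs.
interface McpClientLike {
  connect(): Promise<void>; // assumed to re-run initialization and obtain a new session
  callTool(name: string, args: unknown): Promise<unknown>;
}

// Matches the error text reported in this issue (code -32001, "Session not found").
function isSessionNotFound(error: unknown): boolean {
  const text = error instanceof Error ? error.message : String(error);
  return text.includes("Session not found") || text.includes("-32001");
}

async function callToolWithRetry(
  client: McpClientLike,
  name: string,
  args: unknown,
  maxRetries = 1,
): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await client.callTool(name, args);
    } catch (error) {
      // Give up on unrelated errors or once the retry budget is spent.
      if (!isSessionNotFound(error) || attempt >= maxRetries) throw error;
      await client.connect(); // re-initialize to get a fresh session, then retry
    }
  }
}

// Usage with a fake client whose first call fails the way the issue describes.
let calls = 0;
let reconnects = 0;
const fakeClient: McpClientLike = {
  connect: async () => {
    reconnects += 1;
  },
  callTool: async () => {
    calls += 1;
    if (calls === 1) {
      throw new Error(
        'Error POSTing to endpoint (HTTP 404): {"jsonrpc":"2.0","error":{"code":-32001,"message":"Session not found"},"id":null}',
      );
    }
    return "tool result";
  },
};

const retryResult = callToolWithRetry(fakeClient, "example_tool", {});
retryResult.then((value) => console.log(value, { calls, reconnects }));
```

This only masks the symptom: any server-side state tied to the lost session is gone after the reconnect.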

Additional Context

The documentation marks HTTP transport as "experimental," which aligns with this session management issue. Fixing this would make HTTP transport more production-ready.

Would appreciate your thoughts on the best approach. Happy to submit a PR if helpful!
