Skip to content

fix: prevent session + task leak on GET/DELETE without session-id#3059

Open
raphaelOhana wants to merge 5 commits into
modelcontextprotocol:v1.xfrom
raphaelOhana:fix/streamable-http-session-leak
Open

fix: prevent session + task leak on GET/DELETE without session-id#3059
raphaelOhana wants to merge 5 commits into
modelcontextprotocol:v1.xfrom
raphaelOhana:fix/streamable-http-session-leak

Conversation

@raphaelOhana

Copy link
Copy Markdown

Summary

In stateful mode, StreamableHTTPSessionManager currently allocates a
StreamableHTTPServerTransport and spawns a background task for every
request without an MCP-Session-Id header — including GET and DELETE,
which the MCP spec does not allow to initialize a session. The transport
rejects the request downstream (400/406), but the allocated session and
task are never cleaned up. Only session_idle_timeout (opt-in, off by
default) reaps them later.

This PR moves the "must have a session-id" check up into the manager for
GET and DELETE, so those requests are rejected with 400 Bad Request: Missing session ID before any session is allocated.

Repro (before the patch)

Fire 100 GET /mcp requests with an SDK-only in-process server (no
FastMCP, no external transport), then inspect _server_instances:

[SNAPSHOT baseline                           ] sessions=   0  tasks=   0
[SNAPSHOT after 100x GET /mcp/ (no header)   ] sessions= 100  tasks= 100
[SNAPSHOT after 100x GET /mcp/ (Accept hdr)  ] sessions= 200  tasks= 200
[SNAPSHOT after 100x DELETE /mcp/            ] sessions= 300  tasks= 300

Every one of those 300 sessions is still in _server_instances, and
every one has a corresponding suspended anyio task in the task group. In
production this shows up as Created new transport with session ID: …
in the server logs with no matching Session terminated / closed / ended / deleted events.

We caught this via a Docker healthcheck polling GET /mcp every 30s on
a taylorwilsdon/google_workspace_mcp container (which uses this SDK
via FastMCP) running for 10 days: 28 800 sessions accumulated, RAM
climbed from ~100 MiB to ~1.3 GiB (~46 KiB / leaked session), 0
restarts, container reported healthy the whole time.

Repro after the patch

Same script, same requests:

[SNAPSHOT after 100x GET /mcp/ (no header)   ] sessions=   0  tasks=   0
[SNAPSHOT after 100x GET /mcp/ (Accept hdr)  ] sessions=   0  tasks=   0
[SNAPSHOT after 100x DELETE /mcp/            ] sessions=   0  tasks=   0

Requests are rejected at the manager layer with a well-formed JSON-RPC
error:

{
  "jsonrpc": "2.0",
  "id": "server-error",
  "error": {
    "code": -32600,
    "message": "Bad Request: Missing session ID"
  }
}

Why the manager and not the transport

The transport-level rejection happens after the manager has already
allocated a session and spawned the background task. That task waits on
serve_loop / app.run(read_stream, write_stream, …) which never
returns because read_stream never receives anything, so the finally
cleanup at the bottom of the task never fires. Moving the check up-stack
means we never enter the code path that leaks.

What the change touches

  • src/mcp/server/streamable_http_manager.py:
    • pre-instantiates a TransportSecurityMiddleware on the manager so
      the DNS-rebinding / bad-Host check runs before session allocation
      (previously that check lived only in the transport, so a bad-Host
      request would first allocate a session, then get rejected — same
      leak),
    • rejects GET and DELETE without a session-id with
      400 "Missing session ID" (same message the transport uses for the
      equivalent POST case at streamable_http.py:844, kept identical to
      avoid diverging error surfaces),
    • leaves PUT/PATCH/OPTIONS alone — they continue to reach the
      transport's _handle_unsupported_request and get the existing
      405 Method Not Allowed response.
  • tests/server/test_streamable_http_manager.py:
    • adds a parametrized regression test
      test_non_post_without_session_id_does_not_allocate_session[GET, DELETE]
      that asserts _server_instances and the task group both stay empty
      after a bad request, and that the response is the JSON-RPC 400
      shape.

Full suite passes locally on v1.x:

  • tests/server/test_streamable_http_manager.py — all pass (existing +
    2 new).
  • tests/server/test_streamable_http_security.py — passes
    (test_streamable_http_security_get_request still returns 421 for
    bad Host, verifying the reorder didn't break DNS-rebinding).
  • tests/shared/test_streamable_http.py — passes
    (test_method_not_allowed still returns 405 for PUT, verifying the
    fix only intercepts GET/DELETE).
  • tests/issues/test_1363_race_condition_streamable_http.py — passes.

Total: 82 passed, 0 failed on v1.x.

Behavioural notes for downstream users

  • POST /mcp without a session-id: unchanged — still initializes a
    new session.
  • GET /mcp without a session-id: was 406 Not Acceptable (or 400 if
    the client sent the right Accept header); now consistently
    400 "Bad Request: Missing session ID".
  • DELETE /mcp without a session-id: was 400 "Missing session ID"
    from the transport (with a leaked session); now
    400 "Missing session ID" from the manager (no leak).
  • PUT/PATCH/OPTIONS without a session-id: unchanged — still
    405 Method Not Allowed.

The response body shape is the same JSON-RPC error the manager already
uses for unknown-session cases, so clients that already parse that
shape need no changes.

…ocating

Before this patch, any request to the stateful streamable_http_manager
without an MCP-Session-Id header entered the "new session" branch and:

  1. allocated a StreamableHTTPServerTransport
  2. registered it in `_server_instances`
  3. spawned a background `run_server` task waiting on `serve_loop` /
     `app.run()` for messages that never come

The subsequent transport-level rejection (400 "Missing session ID" for
GET/DELETE, 406 "Not Acceptable" for GET with missing Accept header)
returned the correct HTTP response but did not tear the allocated
session or its task down. The `finally` cleanup in the background task
only fires when the loop completes, and without the (opt-in,
off-by-default) `session_idle_timeout` the task blocks forever.

Under a real deployment we hit this via a Docker healthcheck polling
`GET /mcp` every 30s on a FastMCP-based server: ~2 sessions/min leaked,
~5.3 MiB/day RAM growth, 28 800 accumulated sessions with 0 teardown
events over 10 days.

The fix reorders `_handle_stateful_request` so that:

  * security validation runs first (preserves the 421 DNS-rebinding
    behaviour tested in test_streamable_http_security_get_request),
  * GET and DELETE with no session-id return 400 "Missing session ID"
    at the manager layer without touching `_server_instances` and
    without spawning any task,
  * POST without session-id continues to initialize a new session
    exactly as before,
  * PUT/PATCH/OPTIONS with no session-id continue to reach the
    transport and get the existing 405 "Method Not Allowed".

A regression test asserts both counters (`_server_instances`,
`_task_group._tasks`) stay at zero after 300 bad requests.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/mcp/server/streamable_http_manager.py">

<violation number="1" location="src/mcp/server/streamable_http_manager.py:297">
P2: Requests without `MCP-Session-Id` can still allocate a stateful session for non-POST methods other than GET/DELETE (for example PUT/PATCH/OPTIONS/HEAD). The current guard only rejects GET/DELETE early, so other unsupported methods still enter the new-session path, spawn `run_server`, and only then return 405 from transport handling. This keeps a leak path open under repeated invalid-method traffic. It would be safer to short-circuit all non-POST missing-session requests at the manager level (returning either 400 or 405 as intended) before creating transport state.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread src/mcp/server/streamable_http_manager.py Outdated
- Add ``test_bad_host_header_rejected_before_session_allocation`` that
  drives the DNS-rebinding branch of the manager and asserts the
  request is rejected with 421 without allocating a session. Restores
  the strict 100% coverage on ``streamable_http_manager.py``.
- Suppress the pyright ``reportUnknownMemberType`` on
  ``manager._task_group._tasks`` — that attribute is private anyio
  internals with no exported type, but we deliberately introspect it to
  prove the leaked task from the original bug is no longer spawned.
…ion-id

cubic-dev-ai flagged that the previous fix only short-circuited GET and
DELETE — the other non-POST methods still entered the "new session"
branch, allocated a transport, and spawned run_server before the
transport-layer 405 was returned. Same leak, different method.

This commit extends the check to every non-POST method. GET/DELETE
remain 400 "Missing session ID" (protocol-valid when a session exists,
so the caller may have simply forgotten the header). Everything else
becomes 405 "Method Not Allowed" at the manager layer — matching the
old transport-layer 405 exactly, minus the leaked session and task.

Also applies the repository's ruff format to the test module, which
the pre-commit hook flagged on CI, and switches the regression test's
assertions to use the parametrized ``expected_status`` /
``expected_message_substring`` values so the new PUT/PATCH/OPTIONS/HEAD
rows can piggy-back on the same test body.
@raphaelOhana

Copy link
Copy Markdown
Author

Thanks for the review, cubic — that's a valid catch. Fixed in cf874959.

The updated guard now short-circuits every non-POST request that arrives without a session-id, not just GET/DELETE. Response codes:

  • GET/DELETE400 "Missing session ID" (protocol-valid methods when a session exists, so the caller may have just forgotten the header)
  • Everything else (PUT/PATCH/OPTIONS/HEAD/...) → 405 "Method Not Allowed" at the manager layer, matching the transport's existing 405 exactly, minus the leaked session and task.

The regression test parametrisation was extended to cover all four extra methods (PUT, PATCH, OPTIONS, HEAD) and each row asserts both the expected status code and the expected error-message substring. test_method_not_allowed in tests/shared/test_streamable_http.py still passes because the response body still contains the literal "Method Not Allowed".

Same commit also runs ruff format on the test module (pre-commit hook flagged one line).

Local: 87 passed, 100% coverage on streamable_http_manager.py, pyright clean on both files.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/mcp/server/streamable_http_manager.py">

<violation number="1" location="src/mcp/server/streamable_http_manager.py:316">
P2: Unsupported methods now return 405 without the `Allow` header because the manager builds a custom response instead of reusing the transport’s method-not-allowed path. That can break clients/middleware that rely on standards-compliant 405 metadata to determine allowed methods. Consider preserving the transport’s 405 shape (including `Allow`) when rejecting non-POST requests without a session id.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread src/mcp/server/streamable_http_manager.py Outdated
…+ body)

cubic-dev-ai flagged that the manager's manager-layer 405 for
PUT/PATCH/OPTIONS/HEAD without a session-id was missing the RFC 7231
``Allow`` header and diverged from the transport's response body. That
breaks clients / middleware that rely on the standard 405 metadata to
learn which methods are allowed on the resource.

The manager now mirrors ``StreamableHTTPServerTransport._handle_unsupported_request``
exactly:

  * Body: JSON-RPC error with message ``"Method Not Allowed"`` (previously
    ``"Method Not Allowed (PUT)"`` — a divergence).
  * Headers: ``Content-Type: application/json`` + ``Allow: GET, POST, DELETE``.

The 400 branch for GET/DELETE is unchanged.

The parametrized regression test now also asserts the ``Allow`` header
value for every 405 row (PUT/PATCH/OPTIONS/HEAD).
@raphaelOhana

Copy link
Copy Markdown
Author

Good catch, cubic — fixed in `be9c1188`.

The manager's 405 for PUT/PATCH/OPTIONS/HEAD without a session-id now mirrors `StreamableHTTPServerTransport._handle_unsupported_request` exactly:

  • Body: JSON-RPC error with message `"Method Not Allowed"` (dropped the `(PUT)` suffix that diverged from the transport).
  • Headers: `Content-Type: application/json` + `Allow: GET, POST, DELETE` (RFC 7231-compliant).

The 400 branch for GET/DELETE is unchanged.

The parametrized regression test now also asserts the `Allow` header for each 405 row. Local: 87 passed, 100% coverage on `streamable_http_manager.py`, ruff + pyright clean.

The strict ``fail-under=100`` coverage gate on this repository counts
branch coverage, and the previous ``if expected_status == 405:`` guard
inside the parametrized test added an untaken branch on the 400 rows
(which coverage reports as ``477->exit`` = 99.99% total).

Fold the header check into the parametrize rows via a new
``expected_allow_header`` column and assert unconditionally: the 405
rows expect ``"GET, POST, DELETE"``, the 400 rows expect the header to
be absent (``None``). Same coverage of the manager code, no dead branch
inside the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant