fix: prevent session + task leak on GET/DELETE without session-id#3059
raphaelOhana wants to merge 5 commits into
…ocating
Before this patch, any request to the stateful streamable_http_manager
without an MCP-Session-Id header entered the "new session" branch and:
1. allocated a StreamableHTTPServerTransport
2. registered it in `_server_instances`
3. spawned a background `run_server` task waiting on `serve_loop` /
`app.run()` for messages that never come
The subsequent transport-level rejection (400 "Missing session ID" for
GET/DELETE, 406 "Not Acceptable" for GET with missing Accept header)
returned the correct HTTP response but did not tear the allocated
session or its task down. The `finally` cleanup in the background task
only fires when the loop completes, and without the (opt-in,
off-by-default) `session_idle_timeout` the task blocks forever.
Under a real deployment we hit this via a Docker healthcheck polling
`GET /mcp` every 30s on a FastMCP-based server: ~2 sessions/min leaked,
~5.3 MiB/hour RAM growth, 28 800 accumulated sessions with 0 teardown
events over 10 days.
The fix reorders `_handle_stateful_request` so that:
* security validation runs first (preserves the 421 DNS-rebinding
behaviour tested in test_streamable_http_security_get_request),
* GET and DELETE with no session-id return 400 "Missing session ID"
at the manager layer without touching `_server_instances` and
without spawning any task,
* POST without session-id continues to initialize a new session
exactly as before,
* PUT/PATCH/OPTIONS with no session-id continue to reach the
transport and get the existing 405 "Method Not Allowed".
A regression test asserts both counters (`_server_instances`,
`_task_group._tasks`) stay at zero after 300 bad requests.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/mcp/server/streamable_http_manager.py">
<violation number="1" location="src/mcp/server/streamable_http_manager.py:297">
P2: Requests without `MCP-Session-Id` can still allocate a stateful session for non-POST methods other than GET/DELETE (for example PUT/PATCH/OPTIONS/HEAD). The current guard only rejects GET/DELETE early, so other unsupported methods still enter the new-session path, spawn `run_server`, and only then return 405 from transport handling. This keeps a leak path open under repeated invalid-method traffic. It would be safer to short-circuit all non-POST missing-session requests at the manager level (returning either 400 or 405 as intended) before creating transport state.</violation>
</file>
- Add ``test_bad_host_header_rejected_before_session_allocation`` that drives the DNS-rebinding branch of the manager and asserts the request is rejected with 421 without allocating a session. Restores the strict 100% coverage on ``streamable_http_manager.py``.
- Suppress the pyright ``reportUnknownMemberType`` on ``manager._task_group._tasks``: that attribute is private anyio internals with no exported type, but we deliberately introspect it to prove the leaked task from the original bug is no longer spawned.
…ion-id
cubic-dev-ai flagged that the previous fix only short-circuited GET and DELETE: the other non-POST methods still entered the "new session" branch, allocated a transport, and spawned run_server before the transport-layer 405 was returned. Same leak, different method.
This commit extends the check to every non-POST method. GET/DELETE remain 400 "Missing session ID" (protocol-valid when a session exists, so the caller may have simply forgotten the header). Everything else becomes 405 "Method Not Allowed" at the manager layer, matching the old transport-layer 405 exactly, minus the leaked session and task.
Also applies the repository's ruff format to the test module, which the pre-commit hook flagged on CI, and switches the regression test's assertions to use the parametrized ``expected_status`` / ``expected_message_substring`` values so the new PUT/PATCH/OPTIONS/HEAD rows can piggy-back on the same test body.
Thanks for the review, cubic, that's a valid catch. Fixed in …. The updated guard now short-circuits every non-POST request that arrives without a session-id, not just GET/DELETE. Response codes: …
The regression test parametrisation was extended to cover all four extra methods (…). Same commit also runs …. Local: 87 passed, 100% coverage on ….
1 issue found across 2 files (changes from recent commits).
<file name="src/mcp/server/streamable_http_manager.py">
<violation number="1" location="src/mcp/server/streamable_http_manager.py:316">
P2: Unsupported methods now return 405 without the `Allow` header because the manager builds a custom response instead of reusing the transport’s method-not-allowed path. That can break clients/middleware that rely on standards-compliant 405 metadata to determine allowed methods. Consider preserving the transport’s 405 shape (including `Allow`) when rejecting non-POST requests without a session id.</violation>
</file>
…+ body)
cubic-dev-ai flagged that the manager-layer 405 for
PUT/PATCH/OPTIONS/HEAD without a session-id was missing the RFC 7231
``Allow`` header and diverged from the transport's response body. That
breaks clients / middleware that rely on the standard 405 metadata to
learn which methods are allowed on the resource.
The manager now mirrors ``StreamableHTTPServerTransport._handle_unsupported_request``
exactly:
* Body: JSON-RPC error with message ``"Method Not Allowed"`` (previously
``"Method Not Allowed (PUT)"`` — a divergence).
* Headers: ``Content-Type: application/json`` + ``Allow: GET, POST, DELETE``.
The 400 branch for GET/DELETE is unchanged.
The parametrized regression test now also asserts the ``Allow`` header
value for every 405 row (PUT/PATCH/OPTIONS/HEAD).
Good catch, cubic — fixed in `be9c1188`. The manager's 405 for PUT/PATCH/OPTIONS/HEAD without a session-id now mirrors `StreamableHTTPServerTransport._handle_unsupported_request` exactly:
The 400 branch for GET/DELETE is unchanged. The parametrized regression test now also asserts the `Allow` header for each 405 row. Local: 87 passed, 100% coverage on `streamable_http_manager.py`, ruff + pyright clean.
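For readers following the thread, the ordering that the commits above converge on can be sketched as a pure function. This is a minimal illustration, not the SDK's code: `Rejection`, `early_rejection`, the `allowed_hosts` parameter, and the 421 message text are hypothetical names and values chosen here, and reusing the `-32600` code for every row is an assumption based on the 400 example in the PR description.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Rejection:
    """An early response produced before any session state is created."""

    status: int
    message: str
    headers: dict[str, str] = field(default_factory=dict)

    def body(self) -> str:
        # JSON-RPC error shape; the -32600 code is assumed for all rows.
        return json.dumps(
            {
                "jsonrpc": "2.0",
                "id": "server-error",
                "error": {"code": -32600, "message": self.message},
            }
        )


def early_rejection(method, session_id, host, allowed_hosts):
    """Return a Rejection, or None when the request may proceed."""
    # 1. Security validation runs first (421 for a bad Host header).
    #    The message text here is a placeholder, not the SDK's wording.
    if host not in allowed_hosts:
        return Rejection(421, "Invalid Host header")
    # 2. Requests that carry a session-id, and POSTs that may initialize
    #    a new session, continue down the normal path.
    if session_id is not None or method == "POST":
        return None
    json_headers = {"Content-Type": "application/json"}
    # 3. GET/DELETE are valid methods that are merely missing the header.
    if method in ("GET", "DELETE"):
        return Rejection(400, "Bad Request: Missing session ID", json_headers)
    # 4. Everything else is unsupported; mirror the transport's 405 shape.
    return Rejection(
        405, "Method Not Allowed", {**json_headers, "Allow": "GET, POST, DELETE"}
    )
```

Because the function returns before any transport or task would be created, none of the rejected paths can leave state behind, which is the property the regression test checks.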
The strict ``fail-under=100`` coverage gate on this repository counts branch coverage, and the previous ``if expected_status == 405:`` guard inside the parametrized test added an untaken branch on the 400 rows (which coverage reports as ``477->exit`` = 99.99% total). Fold the header check into the parametrize rows via a new ``expected_allow_header`` column and assert unconditionally: the 405 rows expect ``"GET, POST, DELETE"``, the 400 rows expect the header to be absent (``None``). Same coverage of the manager code, no dead branch inside the test.
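The "extra column instead of an `if`" idea from this commit can be sketched without pytest as a plain table loop. `fake_manager_response` is a hypothetical stand-in for sending a session-less request to the manager; the real test drives the manager itself, and the lowercase header keys are an assumption of this sketch.

```python
# (method, expected_status, expected_allow_header); None means "header absent".
ROWS = [
    ("GET", 400, None),
    ("DELETE", 400, None),
    ("PUT", 405, "GET, POST, DELETE"),
    ("PATCH", 405, "GET, POST, DELETE"),
    ("OPTIONS", 405, "GET, POST, DELETE"),
    ("HEAD", 405, "GET, POST, DELETE"),
]


def fake_manager_response(method: str) -> tuple[int, dict[str, str]]:
    """Hypothetical stand-in for the manager's reply to a session-less request."""
    if method in ("GET", "DELETE"):
        return 400, {"content-type": "application/json"}
    return 405, {"content-type": "application/json", "allow": "GET, POST, DELETE"}


def check_rows() -> int:
    """Run every row through the same assertions and return the row count."""
    checked = 0
    for method, expected_status, expected_allow_header in ROWS:
        status, headers = fake_manager_response(method)
        assert status == expected_status
        # dict.get returns None for a missing key, so one unconditional
        # assertion covers both "present with this value" and "absent".
        assert headers.get("allow") == expected_allow_header
        checked += 1
    return checked
```

Since every row executes the same two assertions, the test body has no branch that only some rows take, which is what keeps a branch-coverage gate satisfied.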
Summary
In stateful mode, `StreamableHTTPSessionManager` currently allocates a `StreamableHTTPServerTransport` and spawns a background task for every request without an `MCP-Session-Id` header, including `GET` and `DELETE`, which the MCP spec does not allow to initialize a session. The transport rejects the request downstream (400/406), but the allocated session and task are never cleaned up. Only `session_idle_timeout` (opt-in, off by default) reaps them later.
This PR moves the "must have a session-id" check up into the manager for `GET` and `DELETE`, so those requests are rejected with `400 Bad Request: Missing session ID` before any session is allocated.
Repro (before the patch)
Fire 100 `GET /mcp` requests with an SDK-only in-process server (no FastMCP, no external transport), then inspect `_server_instances`:
Every one of those 300 sessions is still in `_server_instances`, and every one has a corresponding suspended anyio task in the task group. In production this shows up as `Created new transport with session ID: …` in the server logs with no matching `Session terminated / closed / ended / deleted` events.
We caught this via a Docker healthcheck polling `GET /mcp` every 30s on a `taylorwilsdon/google_workspace_mcp` container (which uses this SDK via FastMCP) running for 10 days: 28 800 sessions accumulated, RAM climbed from ~100 MiB to ~1.3 GiB (~46 KiB / leaked session), 0 restarts, container reported healthy the whole time.
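The production figures can be cross-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper names are made up here, and the ~46 KiB per session is the approximate figure quoted above, so the results are rough.

```python
def leaked_sessions(poll_interval_s: int, hours: int) -> int:
    """One leaked session per healthcheck poll over the given duration."""
    return hours * 3600 // poll_interval_s


def growth_mib(sessions: int, kib_per_session: float) -> float:
    """Approximate memory held by leaked sessions, in MiB."""
    return sessions * kib_per_session / 1024


# 10 days of polling every 30 s.
total_sessions = leaked_sessions(poll_interval_s=30, hours=10 * 24)  # 28800
total_growth = growth_mib(total_sessions, 46)  # 1293.75 MiB, about 1.26 GiB

# The same leak expressed per hour.
hourly_growth = growth_mib(leaked_sessions(30, 1), 46)  # about 5.4 MiB
```

Adding the ~100 MiB baseline to roughly 1.26 GiB of leaked sessions lands close to the ~1.3 GiB observed, so the per-session estimate and the session count are consistent with each other.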
Repro after the patch
Same script, same requests:
Requests are rejected at the manager layer with a well-formed JSON-RPC error:
{ "jsonrpc": "2.0", "id": "server-error", "error": { "code": -32600, "message": "Bad Request: Missing session ID" } }
Why the manager and not the transport
The transport-level rejection happens after the manager has already allocated a session and spawned the background task. That task waits on `serve_loop` / `app.run(read_stream, write_stream, …)`, which never returns because `read_stream` never receives anything, so the `finally` cleanup at the bottom of the task never fires. Moving the check up-stack means we never enter the code path that leaks.
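The mechanism can be reproduced in miniature with the standard library. The sketch below swaps anyio for plain `asyncio` and uses an `asyncio.Queue` in place of the real read stream; `demo`, `run_server`, and `server_instances` are illustrative names here, not the SDK's objects.

```python
import asyncio


async def demo(n_requests: int) -> tuple[int, int, int]:
    """Return (registered sessions, pending tasks, sessions after cancel)."""
    server_instances: dict[str, asyncio.Queue] = {}
    tasks: list[asyncio.Task] = []

    async def run_server(session_id: str, read_stream: asyncio.Queue) -> None:
        try:
            # Stands in for app.run(read_stream, ...): blocks until a
            # message arrives on the stream.
            await read_stream.get()
        finally:
            # Cleanup only runs once the await above returns or is cancelled.
            server_instances.pop(session_id, None)

    for i in range(n_requests):
        session_id = f"session-{i}"
        read_stream: asyncio.Queue = asyncio.Queue()
        server_instances[session_id] = read_stream
        tasks.append(asyncio.create_task(run_server(session_id, read_stream)))
        # The request is "rejected" at this point, but nothing is ever put
        # on read_stream, so the task stays suspended.

    await asyncio.sleep(0)  # let every task start and suspend on get()
    registered = len(server_instances)
    pending = sum(not task.done() for task in tasks)

    # Cancel explicitly so the demo itself does not leak; this is the
    # teardown that never happened in the original bug.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return registered, pending, len(server_instances)
```

Running `asyncio.run(demo(5))` shows five registered sessions and five pending tasks until something cancels them, after which the `finally` blocks empty the registry. Rejecting before `create_task` avoids needing that cancellation at all.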
What the change touches
`src/mcp/server/streamable_http_manager.py`:
* adds a `TransportSecurityMiddleware` on the manager so the DNS-rebinding / bad-Host check runs before session allocation (previously that check lived only in the transport, so a bad-Host request would first allocate a session, then get rejected: same leak),
* rejects `GET` and `DELETE` without a session-id with `400 "Missing session ID"` (same message the transport uses for the equivalent POST case at `streamable_http.py:844`, kept identical to avoid diverging error surfaces),
* leaves `PUT`/`PATCH`/`OPTIONS` alone: they continue to reach the transport's `_handle_unsupported_request` and get the existing `405 Method Not Allowed` response.
`tests/server/test_streamable_http_manager.py`:
* adds `test_non_post_without_session_id_does_not_allocate_session[GET, DELETE]`, which asserts that `_server_instances` and the task group both stay empty after a bad request, and that the response is the JSON-RPC 400 shape.
Full suite passes locally on v1.x:
* `tests/server/test_streamable_http_manager.py`: all pass (existing + 2 new).
* `tests/server/test_streamable_http_security.py`: passes (`test_streamable_http_security_get_request` still returns 421 for bad Host, verifying the reorder didn't break DNS-rebinding).
* `tests/shared/test_streamable_http.py`: passes (`test_method_not_allowed` still returns 405 for PUT, verifying the fix only intercepts GET/DELETE).
* `tests/issues/test_1363_race_condition_streamable_http.py`: passes.
Total: 82 passed, 0 failed on v1.x.
Behavioural notes for downstream users
* `POST /mcp` without a session-id: unchanged, still initializes a new session.
* `GET /mcp` without a session-id: was `406 Not Acceptable` (or 400 if the client sent the right Accept header); now consistently `400 "Bad Request: Missing session ID"`.
* `DELETE /mcp` without a session-id: was `400 "Missing session ID"` from the transport (with a leaked session); now `400 "Missing session ID"` from the manager (no leak).
* `PUT`/`PATCH`/`OPTIONS` without a session-id: unchanged, still `405 Method Not Allowed`.
The response body shape is the same JSON-RPC error the manager already uses for unknown-session cases, so clients that already parse that shape need no changes.
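As an illustration of what "parsing that shape" might look like on the client side, here is a small standard-library sketch. `parse_jsonrpc_error` is a hypothetical helper written for this example, not part of the SDK, and it only assumes the `error.code` / `error.message` fields shown in the example body above.

```python
import json
from typing import Optional


def parse_jsonrpc_error(body: str) -> Optional[tuple[int, str]]:
    """Return (code, message) if body is a JSON-RPC error object, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    message = error.get("message")
    if isinstance(code, int) and isinstance(message, str):
        return code, message
    return None
```

A client written this way handles the manager-layer 400 exactly as it handled the earlier transport-layer one, since only the layer that produces the response has changed.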