Commit b46806f

Correct spec-accuracy gaps across the interaction requirements manifest
A full audit of the manifest against the 2025-11-25 and 2026-07-28 specification texts and the typescript-sdk requirements suite. The fixes:

- Repair spec provenance: wrong source sections and anchors, missing added_in/removed_in stamps, behavior sentences over- or under-claiming their cited mandates, and notes misreading the spec (most notably elicitation:form:response-validation, whose requestedSchema-validation SHOULDs persist at 2026-07-28 and were wrongly retired into a structural-only successor).
- Replace three false deferrals with real tests: per-request log-level rejection, the pinned client ignoring a server-issued Mcp-Session-Id, and client-API cancellation sending notifications/cancelled.
- Validate Divergence.issue like KnownFailure.issue and drop the issue values that were not real tracking links.
- Track previously untracked surface: the modern entry's HTTP gates, the MRTR supported-method bounds, the x-mcp integer safe-range, the custom-method round-trip families, and other cross-SDK ledger gaps, each with a verified deferral reason or a covering test.
- Harden the matrix machinery: a test whose stacked requirements intersect to zero cells is now a collection error, and coverage enforcement checks every admitted (transport, spec-version) cell, not just requirement-has-a-test.
- Align ids, notes, and granularity with the typescript-sdk vocabulary where both suites pin the same contract, and strengthen tests that proved less than their behavior sentences claimed.
1 parent 3d92a1c commit b46806f

27 files changed

Lines changed: 2645 additions & 308 deletions

src/mcp/server/streamable_http.py

Lines changed: 2 additions & 2 deletions
@@ -686,7 +686,7 @@ async def _handle_get_request(self, request: Request, send: Send) -> None:
             "Content-Type": CONTENT_TYPE_SSE,
         }

-        if self.mcp_session_id:  # pragma: no branch
+        if self.mcp_session_id:
             headers[MCP_SESSION_ID_HEADER] = self.mcp_session_id

         # Check if we already have an active GET stream
@@ -750,7 +750,7 @@ async def standalone_sse_writer():
     async def _handle_delete_request(self, request: Request, send: Send) -> None:
         """Handle DELETE requests for explicit session termination."""
         # Validate session ID
-        if not self.mcp_session_id:  # pragma: no cover
+        if not self.mcp_session_id:
             # If no session ID set, return Method Not Allowed
             response = self._create_error_response(
                 "Method Not Allowed: Session termination not supported",

tests/interaction/README.md

Lines changed: 12 additions & 3 deletions
@@ -118,6 +118,13 @@ be exercised by at least one test, every deferred requirement by none, and an un
 import time. A behaviour without a manifest entry cannot be silently half-tested, and a manifest
 entry without a test cannot be silently aspirational.

+Coverage is enforced per matrix cell as well: every (transport, spec version) cell a
+requirement's own grid admits must appear in the cells of at least one test covering it, so a
+version- or transport-bounded mark stacked onto a shared test cannot silently strip cells — an
+era, a transport — from the other requirements that test covers. A covering test that does not
+use the `connect` fixture counts for every admitted cell: it runs unparametrized, so no stacked
+mark can strip anything from it.
+
 ### The divergence lifecycle

 1. A test reveals that the SDK does not do what the spec says. The test pins what the SDK
@@ -143,9 +150,11 @@ exercises. `SPEC_BASE_URL` (and `SPEC_2026_BASE_URL`) are pinned literals — no
 `SPEC_VERSIONS` — so growing the active axis never repoints existing `source` links. The
 `connect` fixture fans out over `CONNECTABLE_TRANSPORTS × SPEC_VERSIONS`, but the grid is
 filtered per test:
-`pytest_generate_tests` reads the test's stacked `@requirement` marks and calls `compute_cells()`,
-which intersects the admissible cells across every cited requirement — a cell survives only if
-**all** of the test's requirements admit it.
+`pytest_generate_tests` reads the test's stacked `@requirement` marks and calls `cells_for_test()`
+(a thin wrapper over `compute_cells()`), which intersects the admissible cells across every cited
+requirement — a cell survives only if **all** of the test's requirements admit it. A stack whose
+intersection is empty fails collection: a `connect` test that can never run on any cell is a
+manifest contradiction, not a skip.

 `streamable-http-stateless` is the fourth connectable transport: the 2025-era unofficial stateless
 mode where each request opens a fresh transport, no session id is issued, and there is no standalone
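
The per-cell coverage rule added to the README above can be sketched roughly as follows. This is an illustration only, not the suite's enforcement code: `Cell`, `uncovered_cells`, and the sample grid are names invented here, and a covering test that does not use `connect` is modelled as `None`.

```python
from __future__ import annotations

from collections.abc import Iterable

Cell = tuple[str, str]  # (transport, spec_version); alias invented for this sketch


def uncovered_cells(admitted: set[Cell], covering_tests: Iterable[set[Cell] | None]) -> set[Cell]:
    """Return the admitted cells that no covering test runs on.

    Each entry is the set of cells one covering test is parametrized over,
    or None for a test that does not use `connect` (it runs unparametrized,
    so it is counted for every admitted cell).
    """
    remaining = set(admitted)
    for cells in covering_tests:
        if cells is None:
            return set()
        remaining -= cells
    return remaining


admitted = {("in-memory", "2025-11-25"), ("in-memory", "2026-07-28")}

# A shared test whose stacked marks stripped the 2026-07-28 cell.
stripped = uncovered_cells(admitted, [{("in-memory", "2025-11-25")}])

# Adding a covering test that does not use `connect` closes the gap.
restored = uncovered_cells(admitted, [{("in-memory", "2025-11-25")}, None])
```

Under these assumptions a non-empty result is what enforcement would report: the requirement has a test, but not on every cell its own grid admits.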

tests/interaction/_requirements.py

Lines changed: 839 additions & 191 deletions
Large diffs are not rendered by default.

tests/interaction/auth/test_as_handlers.py

Lines changed: 6 additions & 5 deletions
@@ -308,11 +308,12 @@ async def test_register_echoes_native_for_a_client_that_registered_application_t
     """A client registering `application_type: "web"` is told `"native"` in the registration echo.

     Pins the known gap recorded on the requirement (divergence): the registration handler's
-    field-by-field passthrough omits `application_type`, so the model default fills the echo
-    where RFC 7591 §3.2.1 requires the registered value -- and the SDK OAuth client adopts the
-    echo into persisted storage, so the corruption is client-visible end to end. When the
-    one-line passthrough fix lands this test fails: re-pin the echo to `"web"`, delete the
-    Divergence, and add the echo assertion to
+    field-by-field passthrough omits `application_type`, so the model default replaces the
+    submitted value in the stored record and the echo alike -- wire-legal under RFC 7591 (a
+    server may replace requested metadata values), but an accident of the field list rather
+    than a policy, and the SDK OAuth client adopts the echo into persisted storage, so the
+    corruption is client-visible end to end. When the one-line passthrough fix lands this test
+    fails: re-pin the echo to `"web"`, delete the Divergence, and add the echo assertion to
     `test_dcr_sends_a_consumer_set_application_type_verbatim` (test_flow.py) per the
     requirement's note.
     """

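The gap this docstring describes, a field-by-field passthrough that forgets one field so the model default wins, can be illustrated with a small stand-in. Everything below is hypothetical: `ClientRecord` and both `register_*` functions are invented for the sketch and are not the SDK's handler or model.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical stored registration; `application_type` defaults to "native"."""

    client_name: str
    application_type: str = "native"


def register_with_gap(request: dict[str, str]) -> ClientRecord:
    # The field list names client_name only, so a submitted application_type is
    # dropped and the dataclass default fills both the record and the echo.
    return ClientRecord(client_name=request["client_name"])


def register_with_passthrough(request: dict[str, str]) -> ClientRecord:
    # The one-line fix in this sketch: pass the submitted value through.
    return ClientRecord(
        client_name=request["client_name"],
        application_type=request.get("application_type", "native"),
    )


submitted = {"client_name": "demo", "application_type": "web"}
echo_with_gap = register_with_gap(submitted)
echo_fixed = register_with_passthrough(submitted)
```

In this toy model the first echo reports `"native"` for a client that registered `"web"`, which is the behaviour the test pins until the passthrough fix lands.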
tests/interaction/auth/test_lifecycle.py

Lines changed: 24 additions & 16 deletions
@@ -175,14 +175,14 @@ async def test_a_refresh_response_without_a_refresh_token_preserves_the_stored_o
     """A refresh response that omits `refresh_token` leaves the stored one in place.

     RFC 6749 §6 lets the authorization server omit `refresh_token` from a refresh response, in
-    which case the client keeps the one it holds; the 2026 Refresh Tokens section (SEP-2207)
-    restates this as "MUST NOT assume refresh tokens will be issued". The provider models the
-    non-rotating AS: its refresh response carries only a new access token (`exclude_none`
-    serialization keeps the key genuinely absent from the wire) and the presented token stays
-    valid server-side. The preserved token alone could pass vacuously if the refresh response
-    were dropped entirely, so the adopted `expires_in` (the first token's was -3600) proves it
-    was not, and the single authorize/register pair proves the omission was treated as normal
-    rather than triggering a re-authorization.
+    which case the client keeps the one it holds -- the discipline the 2026 Refresh Tokens
+    section's "MUST NOT assume refresh tokens will be issued" (SEP-2207) states for issuance
+    generally. The provider models the non-rotating AS: its refresh response carries only a new
+    access token (`exclude_none` serialization keeps the key genuinely absent from the wire) and
+    the presented token stays valid server-side. The preserved token alone could pass vacuously
+    if the refresh response were dropped entirely, so the adopted `expires_in` (the first
+    token's was -3600) proves it was not, and the single authorize/register pair proves the
+    omission was treated as normal rather than triggering a re-authorization.
     """
     recorded, on_request = record_requests()
     provider = InMemoryAuthorizationServerProvider(issue_expired_first=True, rotate_refresh_tokens=False)
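
The rule this test pins, keeping the stored refresh token when the refresh response omits one, can be sketched as a small merge function. This is a minimal sketch and not the SDK's token handling: `StoredTokens` and `apply_refresh_response` are hypothetical names, and the response is modelled as a plain dict.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StoredTokens:
    """Hypothetical client-side token record for this sketch."""

    access_token: str
    expires_in: int | None = None
    refresh_token: str | None = None


def apply_refresh_response(stored: StoredTokens, response: dict[str, Any]) -> StoredTokens:
    """Adopt a refresh response, keeping the held refresh token if none is issued."""
    return StoredTokens(
        access_token=response["access_token"],
        expires_in=response.get("expires_in"),
        # Key absent from the wire: keep the token the client already holds.
        refresh_token=response.get("refresh_token", stored.refresh_token),
    )


stored = StoredTokens(access_token="old-access", expires_in=-3600, refresh_token="refresh-1")

# Non-rotating authorization server: only a new access token comes back.
kept = apply_refresh_response(stored, {"access_token": "new-access", "expires_in": 3600})

# Rotating authorization server: the new refresh token replaces the old one.
rotated = apply_refresh_response(
    stored, {"access_token": "new-access", "expires_in": 3600, "refresh_token": "refresh-2"}
)
```

As in the test, the adopted `expires_in` is what shows the response was actually applied; the preserved refresh token alone would not distinguish that from a dropped response.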
@@ -260,8 +260,10 @@ async def test_a_403_step_up_re_authorizes_with_the_union_of_prior_and_challenge
     """The step-up re-authorize requests the union of the previously requested and challenged scopes.

     The first authorization requests `mcp`; the 403 challenges a disjoint `write` (not naming
-    `mcp`). Per SEP-2350 the client must re-authorize with `mcp write`, not drop `mcp`. The client
-    is pre-registered with both scopes so the server's authorize handler accepts the wider request.
+    `mcp`). The client re-authorizes with `mcp write`, not dropping `mcp` -- the SEP-2350 union,
+    spec-mandated at 2026-07-28; on this legacy flow it is the SDK's own choice anticipating that
+    mandate. The client is pre-registered with both scopes so the server's authorize handler
+    accepts the wider request.
     """
     provider = InMemoryAuthorizationServerProvider()
     storage = InMemoryTokenStorage(client_info=seeded_client(provider, scope="mcp write"))
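
The scope union in this docstring amounts to an order-preserving merge of two space-delimited scope strings. The helper below is a hypothetical sketch of that arithmetic, not the client's step-up code; `union_scopes` is a name invented here.

```python
def union_scopes(previous: str, challenged: str) -> str:
    """Merge two space-delimited scope strings, keeping first-seen order."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    merged = dict.fromkeys(previous.split() + challenged.split())
    return " ".join(merged)


step_up_scope = union_scopes("mcp", "write")
overlapping = union_scopes("mcp write", "write admin")
```

With the scenario's inputs, a prior `mcp` and a challenged `write`, the merge yields `mcp write`, the scope the test expects on the re-authorization.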
@@ -325,11 +327,14 @@ async def test_tokens_from_the_previous_authorization_server_are_never_replayed_

     Choreography twin of the as-binding discard test above, pinning the token half of the same
     SEP-2352 branch: storage carries both an old-issuer client registration and that server's
-    tokens. The stale access token is presented once to the resource server (reload treats it
-    as live), the 401 triggers the binding check, and the discard drops tokens together with
-    the credentials -- so the stale refresh token reaches no endpoint of the new authorization
-    server and the only token exchange is the fresh authorization-code grant. The requirement's
-    note carries the refresh-ordering hazard this test is the regression net for.
+    tokens, with the access token seeded already expired. Reload loses the expiry clock, so the
+    stale access token is presented once to the resource server, the 401 triggers the binding
+    check, and the discard drops tokens together with the credentials -- the stale refresh token
+    reaches no endpoint of the new authorization server and the only token exchange is the fresh
+    authorization-code grant. The expired seed arms the net for a fix that re-anchors the expiry
+    clock at reload: the pre-discovery refresh branch then engages in this exact scenario, and
+    the replay sweep fails unless the discard still runs ahead of any refresh attempt. The
+    requirement's note carries the refresh-ordering hazard in full.
     """
     recorded, on_request = record_requests()
     provider = InMemoryAuthorizationServerProvider()
@@ -349,7 +354,10 @@ async def test_tokens_from_the_previous_authorization_server_are_never_replayed_
     storage.tokens = OAuthToken(
         access_token="stale-access-token",
         token_type="Bearer",
-        expires_in=3600,
+        # Seeded already expired: today reload loses the expiry clock and treats the token as
+        # live; if a fix re-anchors it, this seed drives the pre-discovery refresh branch --
+        # the ordering hazard the replay sweep below must catch.
+        expires_in=-3600,
         scope="mcp",
         refresh_token="stale-refresh-token",
     )

tests/interaction/conftest.py

Lines changed: 6 additions & 5 deletions
@@ -2,8 +2,9 @@

 The ``connect`` fixture is parametrized per-test from the ``@requirement`` marks the test
 carries: ``pytest_generate_tests`` looks up each cited requirement in the manifest and computes
-the (transport, spec_version) cells via :func:`compute_cells`, applying arm exclusions, version
-bounds, and known-failure xfails declaratively.
+the (transport, spec_version) cells via :func:`cells_for_test`, applying arm exclusions, version
+bounds, and known-failure xfails declaratively. A test whose stacked requirements intersect to
+zero cells fails collection instead of silently skipping.
 """

 from functools import partial
@@ -17,7 +18,7 @@
     connect_over_streamable_http,
     connect_over_streamable_http_stateless,
 )
-from tests.interaction._requirements import REQUIREMENTS, compute_cells
+from tests.interaction._requirements import cells_for_test

 _FACTORIES: dict[str, Connect] = {
     "in-memory": connect_in_memory,
@@ -31,8 +32,8 @@ def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
     """Parametrize ``connect`` from the test's stacked ``@requirement`` marks."""
     if "connect" not in metafunc.fixturenames:
         return
-    requirements = [REQUIREMENTS[mark.args[0]] for mark in metafunc.definition.iter_markers("requirement")]
-    metafunc.parametrize("connect", compute_cells(requirements), indirect=True)
+    requirement_ids = [mark.args[0] for mark in metafunc.definition.iter_markers("requirement")]
+    metafunc.parametrize("connect", cells_for_test(metafunc.definition.nodeid, requirement_ids), indirect=True)


 @pytest.fixture
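
The intersect-then-fail-on-empty behaviour that `cells_for_test` is described as adding can be sketched without pytest. This is an assumption-laden illustration: the real function's signature beyond `(nodeid, requirement_ids)` and its error type are not shown in this diff, so `intersect_cells`, `EmptyGridError`, and the per-requirement cell sets are all invented here.

```python
from __future__ import annotations

Cell = tuple[str, str]  # (transport, spec_version); alias invented for this sketch


class EmptyGridError(Exception):
    """Hypothetical stand-in for the collection error raised by the suite."""


def intersect_cells(nodeid: str, admitted_per_requirement: list[set[Cell]]) -> list[Cell]:
    """Keep only the cells that every stacked requirement admits."""
    if not admitted_per_requirement:
        # Assumption of this sketch: a test carries at least one requirement.
        raise ValueError(f"{nodeid}: expected at least one requirement")
    surviving = set.intersection(*admitted_per_requirement)
    if not surviving:
        # An empty grid is a contradiction between the marks, not a skip.
        raise EmptyGridError(f"{nodeid}: stacked requirements admit no common cell")
    return sorted(surviving)


both_eras = {("in-memory", "2025-11-25"), ("in-memory", "2026-07-28")}
legacy_only = {("in-memory", "2025-11-25")}
modern_only = {("in-memory", "2026-07-28")}

# A version-bounded mark stacked on a shared test narrows the grid to one cell.
cells = intersect_cells("test_a", [both_eras, legacy_only])
```

Stacking `legacy_only` with `modern_only` in this model raises `EmptyGridError`, mirroring the choice to fail collection loudly instead of producing a test that never runs.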

tests/interaction/lowlevel/test_cancellation.py

Lines changed: 126 additions & 5 deletions
@@ -1,9 +1,11 @@
 """Cancellation interactions against the low-level Server, driven through the public Client API.

-There is no client-side cancellation API: cancelling means sending a CancelledNotification
-carrying the request id, which only the server-side handler can observe (`ctx.request_id`), so
-these tests capture the id from inside the blocked handler before cancelling. The handler blocks
-on an Event rather than a sleep, and every wait is bounded by `anyio.fail_after`.
+Client-side cancellation is cancelling the caller's scope around an in-flight call; the
+dispatcher then sends the courtesy notifications/cancelled. The receiving-side tests instead
+drive the wire act directly -- sending a CancelledNotification carrying the request id, which
+only the server-side handler can observe (`ctx.request_id`) -- so they capture the id from
+inside the blocked handler before cancelling. Handlers block on an Event rather than a sleep,
+and every wait is bounded by `anyio.fail_after`.
 """

 import anyio
@@ -27,16 +29,64 @@

 from mcp import MCPError
 from mcp.client import ClientRequestContext, ClientSession
+from mcp.client._memory import InMemoryTransport
+from mcp.client.client import Client
 from mcp.server import Server, ServerRequestContext
 from mcp.shared.memory import MessageStream, create_client_server_memory_streams
 from mcp.shared.message import SessionMessage
 from tests.interaction._connect import Connect
-from tests.interaction._helpers import IncomingMessage
+from tests.interaction._helpers import IncomingMessage, RecordingTransport
 from tests.interaction._requirements import requirement

 pytestmark = pytest.mark.anyio


+@requirement("protocol:cancel:abort-signal")
+async def test_cancelling_the_callers_scope_sends_cancelled_and_abandons_the_call() -> None:
+    """Cancelling the scope around an in-flight call sends notifications/cancelled and the call never returns.
+
+    Spec-mandated (cancellation flow): the sender of a cancelled request issues
+    notifications/cancelled referencing its id. Legacy-era act: at 2026-07-28 the wire act splits
+    by transport (see the manifest entry's note). The wire is observed at the recording-transport
+    seam; the reason string is the SDK's own deliberate output.
+    """
+    handler_started = anyio.Event()
+
+    async def call_tool(ctx: ServerRequestContext, params: types.CallToolRequestParams) -> CallToolResult:
+        assert params.name == "block"
+        handler_started.set()
+        await anyio.Event().wait()  # blocks until the courtesy cancellation interrupts it
+        raise NotImplementedError  # unreachable: the wait above never completes normally
+
+    server = Server("blocker", on_call_tool=call_tool)
+    recording = RecordingTransport(InMemoryTransport(server))
+
+    async with Client(recording, mode="legacy") as client:
+        with anyio.fail_after(5):
+            async with anyio.create_task_group() as task_group:  # pragma: no branch
+
+                async def call() -> None:
+                    await client.call_tool("block", {})
+                    raise NotImplementedError  # unreachable: the surrounding scope is cancelled mid-flight
+
+                task_group.start_soon(call)
+                await handler_started.wait()
+                task_group.cancel_scope.cancel()
+
+    (call_request,) = [
+        item.message
+        for item in recording.sent
+        if isinstance(item.message, JSONRPCRequest) and item.message.method == "tools/call"
+    ]
+    (cancellation,) = [
+        item.message
+        for item in recording.sent
+        if isinstance(item.message, JSONRPCNotification) and item.message.method == "notifications/cancelled"
+    ]
+    assert cancellation.params == snapshot({"requestId": 2, "reason": "caller cancelled"})
+    assert cancellation.params is not None and cancellation.params["requestId"] == call_request.id
+
+
 @requirement("protocol:cancel:in-flight")
 @requirement("protocol:cancel:handler-abort-propagates")
 async def test_cancellation_stops_in_flight_handler(connect: Connect) -> None:
@@ -87,6 +137,77 @@ async def call_and_capture_error() -> None:
     assert errors == snapshot([ErrorData(code=0, message="Request cancelled")])


+@requirement("protocol:cancel:in-flight")
+async def test_client_answers_a_cancelled_server_initiated_request_with_the_code_zero_error(connect: Connect) -> None:
+    """Cancelling a server-initiated request interrupts the client's callback, and the client
+    answers with the code-0 error -- the client half of the divergence on this requirement (the
+    spec says the receiver should not respond at all). The server cancels its own sampling
+    request while still awaiting it, so the client's answer is observed as the awaited call's
+    failure; the whole exchange sits under one fail_after, so a silent client fails the test
+    instead of hanging it.
+    """
+    callback_started = anyio.Event()
+    callback_cancelled = anyio.Event()
+    client_request_ids: list[types.RequestId] = []
+    errors: list[ErrorData] = []
+
+    async def sampling_callback(
+        context: ClientRequestContext, params: types.CreateMessageRequestParams
+    ) -> types.CreateMessageResult:
+        client_request_ids.append(context.request_id)
+        callback_started.set()
+        try:
+            await anyio.Event().wait()  # blocks until the cancellation interrupts it
+        except anyio.get_cancelled_exc_class():
+            callback_cancelled.set()
+            raise
+        raise NotImplementedError  # unreachable
+
+    async def list_tools(
+        ctx: ServerRequestContext, params: types.PaginatedRequestParams | None
+    ) -> types.ListToolsResult:
+        return types.ListToolsResult(tools=[types.Tool(name="canceller", input_schema={"type": "object"})])
+
+    async def call_tool(ctx: ServerRequestContext, params: types.CallToolRequestParams) -> CallToolResult:
+        assert params.name == "canceller"
+        request = types.CreateMessageRequest(
+            params=types.CreateMessageRequestParams(
+                messages=[types.SamplingMessage(role="user", content=TextContent(text="Say hello."))],
+                max_tokens=8,
+            )
+        )
+        with anyio.fail_after(5):
+            async with anyio.create_task_group() as task_group:
+
+                async def sample_and_capture_error() -> None:
+                    with pytest.raises(MCPError) as exc_info:
+                        await ctx.session.send_request(request, types.CreateMessageResult)
+                    errors.append(exc_info.value.error)
+
+                task_group.start_soon(sample_and_capture_error)
+                await callback_started.wait()
+                await ctx.session.send_notification(
+                    types.CancelledNotification(
+                        params=types.CancelledNotificationParams(
+                            request_id=client_request_ids[0], reason="user aborted"
+                        )
+                    ),
+                    related_request_id=ctx.request_id,
+                )
+            # The join above completes only when the client's answer arrives; the enclosing
+            # fail_after turns a silent client into a TimeoutError -- a failed test, not a hang.
+            await callback_cancelled.wait()
+        return CallToolResult(content=[TextContent(text="cancelled")])
+
+    server = Server("canceller", on_list_tools=list_tools, on_call_tool=call_tool)
+
+    async with connect(server, sampling_callback=sampling_callback) as client:
+        result = await client.call_tool("canceller", {})
+
+    assert result == snapshot(CallToolResult(content=[TextContent(text="cancelled")]))
+    assert errors == snapshot([ErrorData(code=0, message="Request cancelled")])
+
+
 @requirement("protocol:cancel:no-further-notifications")
 async def test_no_notifications_for_a_request_arrive_after_its_cancellation(connect: Connect) -> None:
     """After a request is cancelled, no further notifications for it reach the wire (spec-mandated).

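The client-side act the new abort-signal test pins (cancel the caller's scope, and a courtesy `notifications/cancelled` naming the request id goes out) can be modelled with the standard library alone. This sketch uses `asyncio` tasks where the suite uses anyio scopes, and `send_request` and the `sent` list are invented stand-ins for the dispatcher and the recording transport.

```python
import asyncio
from typing import Any

sent: list[dict[str, Any]] = []  # stand-in for the wire observed by a recording transport


async def send_request(request_id: int, method: str) -> None:
    """Send a request, then wait for a response that never arrives in this sketch."""
    sent.append({"id": request_id, "method": method})
    try:
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        # The caller's scope was cancelled: emit the courtesy notification, then re-raise.
        sent.append(
            {
                "method": "notifications/cancelled",
                "params": {"requestId": request_id, "reason": "caller cancelled"},
            }
        )
        raise


async def main() -> None:
    task = asyncio.create_task(send_request(2, "tools/call"))
    await asyncio.sleep(0)  # let the request reach the wire before cancelling
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # the call is abandoned and never returns a result


asyncio.run(main())
```

The point of the model is the ordering and the id link: the request is recorded first, and the notification that follows carries the same id, which is what the test's two assertions check on the real wire.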