This repository was archived by the owner on Nov 23, 2025. It is now read-only.
56 changes: 56 additions & 0 deletions wg/validation [DRAFT]/README.md
[
**_DISCLAIMER:_** **This is not (yet) an actual Working Group, and has not gone through all of the preparatory steps. It was recommended to me by a core contributor that I open this up for discussion by opening a draft. The intention of this document is to clarify what a Validation Working Group would be like, what purpose it would serve, what goals it would have, and so on. All of this is subject to change if and when a group is formed. Obviously I have used the [Hosting WG](../hosting/README.md) README as a template.**
]

This working group is hosted by @TBD

---

# **MCP Validation Working Group (WG) – Vision & Roadmap**

This document is intended as a preliminary proposal and does not represent an official statement or policy of any organization.

**Prepared by:** [Jesse Rappaport] (@hesreallyhim)
**Date:** June 13, 2025

The current [Roadmap](https://modelcontextprotocol.io/development/roadmap#validation) published on the [MCP website](https://modelcontextprotocol.io/), albeit non-committal, states:

```
Validation

To foster a robust developer ecosystem, we plan to invest in:

* Reference Client Implementations: demonstrating protocol features with high-quality AI applications
* Compliance Test Suites: automated verification that clients, servers, and SDKs properly implement the specification

These tools will help developers confidently implement MCP while ensuring consistent behavior across the ecosystem.
```

MCP, insofar as it is a protocol, defines a set of "rules" or constraints on how two parties, an MCP client and an MCP server, must, should, or may interact. The "prime directive" of these rules is to facilitate _interoperation_. By declaring a common set of conventions that parties must follow, the Protocol enables each party to act independently of the other and to build applications, software, and other advanced technology without coordinating ahead of time on every implementation detail, because compliance with the protocol means that certain things can be taken for granted. Ultimately, although it is a set of _constraints_, this creates a space that _empowers_ technologists to innovate and build more rapidly, and enables more freedom for the evolving A.I. ecosystem. One could argue that regardless of whether MCP in its current shape is a "good" protocol (which is not to imply that it isn't), setting a foundation where people can express their interests and collaboratively define what this emergent technology should look like is intrinsically beneficial.

### **Background**

Implementations of MCP - servers, clients, server/client generators, and SDKs - have proliferated extremely rapidly since the Protocol first launched. However, there is a dearth of resources available for developers to validate whether their implementation in fact complies with the Protocol. Although some open-source and proprietary tools/services exist, these tools themselves have not, to my knowledge, been vetted by the MCP standards body. Many of the existing reference implementations have been archived. [FastMCP](https://github.com/jlowin/fastmcp), a library that is incorporated into the official Python SDK (in its v1), appears to have since "spun off" from the MCP SDK, is now in v2, and effectively brands itself as "the actively maintained version" of the Python SDK. Without some authoritative "stamp of approval" from the MCP group, the Protocol will inevitably lose its status as a standard, and the ecosystem will evolve in a more fractured way, impeding interoperation.

To that end, I believe that the goal stated in the (provisional) roadmap - developing a set of resources for validating compliance with MCP - is of critical importance. The Protocol describes a range of powerful capabilities that could drive the growth and adoption of A.I. applications in a way that is both innovative and safe (secure). Yet, to many developers, MCP is a fancy way of implementing "LLM tools via an API" - which is a pretty nice thing to have, but is only one part of what this Protocol encapsulates.

Reference implementations that showcase the full scope of capabilities that MCP describes would go a long way toward expanding people's understanding of it. However, I believe that developing the basic tools for validating compliance is the most important step toward keeping this Protocol alive and making it accessible to the community at large. We must always be clear that the goal is to promote interoperation, not to be an authority for the sake of name recognition. Interoperation means a common language and common conventions - and for a technical Protocol, these conventions should be clear enough that, in general, they can be validated with automatic tooling.

### **Scope of Work**

The following is a list of resources that I think can be built, tested, and deployed relatively quickly, and would provide a lot of value. The categories would divide into client validators, server validators, SDK validators, and (potentially) MCP-generator validators. There are various ways to tackle these problems:

- Static analysis libraries
- Fully validated official SDKs
- Solutions that incorporate "LLM-as-judge" type testing
- A publicly accessible server that client implementers can communicate with to test out their implementations, and perhaps a similar resource for client implementers

For a concrete proposal, given the current state of the spec, I think a great approach would be a scenario-testing framework. Messaging flows/scenarios would be defined (e.g., an "Initialization Scenario", a "Tools Listing Scenario"), and implementers could "plug in" their client or server to each scenario: effectively, an endpoint against which the implementer engages in a communication flow, with the defined Scenarios reporting whether the messages they send, and the way they respond, conform to the Protocol.
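As an illustrative sketch only: one such scenario check might look like the following. The function name and the expected `initialize` result fields reflect my own reading of the spec; none of this is official MCP tooling.

```python
# Hypothetical sketch of a single "Initialization Scenario" check.
# The function name and field expectations are assumptions, not official tooling.

def check_initialize_response(request: dict, response: dict) -> list:
    """Return a list of protocol violations for an initialize exchange."""
    problems = []
    if response.get("jsonrpc") != "2.0":
        problems.append("response must declare jsonrpc 2.0")
    if response.get("id") != request.get("id"):
        problems.append("response id must echo the request id")
    if ("result" in response) == ("error" in response):
        problems.append("exactly one of result/error must be set")
    if "error" not in response:
        result = response.get("result", {})
        if "protocolVersion" not in result:
            problems.append("initialize result must include protocolVersion")
        if "capabilities" not in result:
            problems.append("initialize result must include capabilities")
    return problems

request = {"jsonrpc": "2.0", "id": 1, "method": "initialize",
           "params": {"protocolVersion": "2025-06-18", "capabilities": {}}}
conforming = {"jsonrpc": "2.0", "id": 1,
              "result": {"protocolVersion": "2025-06-18", "capabilities": {}}}
print(check_initialize_response(request, conforming))  # → []
```

A suite of such checks, one per Scenario, is all the framework would need to start with; each check stays a pure function over the messages exchanged, so the same suite could run locally or behind a hosted endpoint.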

Postman, for instance, is already ahead of the game in this respect. They have dedicated [resources](https://learning.postman.com/docs/postman-ai-agent-builder/mcp-requests/interact/) for implementing MCP, generating MCPs, testing APIs, and a system they call "Flows" for defining Scenarios for API interactions. Although I believe this is a paid service, there are open-source frameworks available that offer comparable functionality. If we adopted one of these frameworks (I haven't done any comparison shopping), we could incrementally build a suite of scenario tests that developers could stand up locally on their own machines, or in a cloud instance, to test their implementations. More abstract solutions could be built on top of that for MCP-generators or for SDKs. The test suite could be built out incrementally, allowing us to roll out a library that could come to be viewed as the "definitive" evaluation framework for MCP compliance. It could even validate whether a server/client that announces support for a certain capability really does support it. Then, when I visit a registry, instead of seeing letter grades whose basis I don't really understand, I would see a common set of metrics that are actually meaningful and documented.

Without these tools, MCP will become whatever Postman, OpenAI, Anthropic, FastMCP, etc., decide it is. If that turns out to be what's best for the growth of this technology in a democratic and safe way, then perhaps that's fine. But I still think the MCP standards body could provide a better space for developing this technology - and to that end, I think building these tools is urgent.

I'll end here, as I think I've made the case for why validation tools are worth actually investing resources into, and I've laid out a few concrete proposals, as well as one specific place that I think would be a great starting point. If anyone is persuaded by the first part, but has different or better ideas regarding implementation, I hope some of you will join me in this effort. The details of this document are entirely preliminary.

Since this is only a draft of a draft of a Working Group statement, I'll not go into more detail. For those who are interested, or at least for the sake of posterity, I'll also try to put together a supplementary document on what I see as some fundamental flaws in the existing Protocol (with regard to its status as a specification, not its technical merits), in particular with respect to the question of _source of truth_. I have tried to raise this in a number of places, and so far I haven't received much engagement.
39 changes: 39 additions & 0 deletions wg/validation [DRAFT]/addendum.md
# Concrete Issues (and some proposed solutions)

The following is a list of some of the linguistic issues that I have identified in the Specification, divided according to "T-shirt size".

## LARGE:

- The Specification lacks a Single Source of Truth
The formal Specification comprises the Specification [documents](https://modelcontextprotocol.io/specification/2025-06-18) and the TypeScript/JSON Schema. The latter defines the interfaces and data structures formally recognized by the Protocol messaging system, while the former describes the Protocol more broadly. The problem is that the Schema is full of comments (JSDoc-style comments, inline comments, etc.) written in formal English (using the formal language of MUST/SHOULD/MAY), which overlap with the content of the Spec docs. Since both sources are classified as authoritative, the Protocol is indeterminate if they ever happen to contradict each other.
- The purpose of the Schema is unclear
If the Schema were _merely_ a formal, strictly typed enumeration of all classes and data structures recognized by the Specification, it might serve as a particularly rigorous and computationally accessible sub-component of the Specification as a whole. But the rationale for using English, formal English, _and_ TypeScript as the languages for expressing the Specification is never made clear. Why does the Schema contain any English at all? (I assume the comments and JSDoc strings are also authoritative.) Are the comments merely supplementary, or do they contain information found nowhere else? If the latter, why are they not written in the English documentation? Requirements are often repeated, with slightly different wording, in both the Schema and the Spec docs - this is confusing for readers; difficult for authors, who must ensure consistency across two document sources; and it opens the door to inconsistency, which is "fatal" for a formal technical Spec.

## MEDIUM:

- The Specification is not entirely compliant with JSON-RPC 2.0 - which, in itself, is of no importance, except that the Spec states that JSON-RPC is part of the base protocol.
One minor deviation is explicitly noted (regarding the use of `null` as an `id`). I have identified two other places where I believe the Specification strays from JSON-RPC - in one case, deliberately (I think), but without proper documentation to the reader (no support for batching); in the other case, probably an oversight (`SubscribeRequest` and `UnsubscribeRequest` are Requests which lack a corresponding `(Un)SubscribeResult`).
- Some requirements of the Specification are tacit, implied, or must be inferred, perhaps via a holistic reading of the documents as a whole.
For instance: the "Capabilities" object is structured as follows - for each potential non-experimental capability, a client/server that supports it indicates this by including that property in the Capabilities object, possibly as a key with an empty dictionary as a value. Sub-capabilities are then declared as `sub_capability: true` inside that specific capability's value. Each of these corresponds to messages, or methods, that the server or client supports. But the object is not an exhaustive enumeration. For instance, the `tools/list` method is not declared - presumably it is to be inferred from the presence of the `tools` property - but is this actually made clear anywhere? (Is it even the case? From my reading, this cannot be inferred without interpretation.) Why not simply eliminate the need for interpretation - for the reader to browse through the whole Specification in order to reason about the Capabilities object - when a complete enumeration would be a simple remedy?
- The Specification documents frequently use language that leaves room for interpretation, when explicit, legalistic language would be more appropriate. For instance, the Spec talks often about what capabilities clients and servers "support" - but the notion of "support" is never formally defined or clarified. What does it mean to say that a server "supports Resources"? From a technical point of view, is it equivalent to saying that the server will respond to `resources/list` Requests by listing their Resources (all, some, a few)? If so, why not simply state this explicitly? Another issue is the use of descriptive language where prescriptive or imperative language is appropriate. For instance (example chosen at random): "To discover available resources, clients send a `resources/list` request." How is this to be read? Clients MAY send such a request? Clients MUST send such a request if they wish to discover available resources? Compare these expressions and decide which is clearer from a formal point of view:

1. To discover available resources, clients send a `resources/list` request. This operation supports pagination.
2. To discover available resources, clients may send a `resources/list` request. Clients SHOULD only send this request if the server has announced this capability. Servers who support resources MUST respond to these requests with a full list of resources available to that client. This response MUST support pagination.

- The use of the term "result" is confusing. "Result" is sometimes used to mean "a successful response", but `result` is also a _property_ of a successful response. Note the two ways it is used in this statement, which speaks of "successful results" (which probably means "successful responses") and then of a property that must be set:
> Responses are further sub-categorized as either successful results or errors. Either a result or an error MUST be set. A response MUST NOT set both.
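The JSON-RPC distinctions these items turn on - an `id` marks a Request and obligates a Response, a Notification omits the `id`, and a Response sets exactly one of `result`/`error` - can be sketched as a small classifier. This is purely illustrative; the function name and messages are my own examples, not part of any real library.

```python
# Illustrative classifier for the JSON-RPC 2.0 message kinds at issue;
# the function name and example messages are assumptions, not a real API.

def classify(message: dict) -> str:
    if message.get("jsonrpc") != "2.0":
        return "invalid"
    if "method" in message:
        # A Request carries an id and obligates a Response;
        # a Notification omits the id and expects no reply.
        return "request" if "id" in message else "notification"
    if ("result" in message) != ("error" in message):
        return "response"
    return "invalid"

# A subscribe message is a Request, so it obligates a Response --
# hence the oddity of SubscribeRequest having no corresponding Result type.
print(classify({"jsonrpc": "2.0", "id": 5, "method": "resources/subscribe",
                "params": {"uri": "file:///example.txt"}}))  # → request
print(classify({"jsonrpc": "2.0",
                "method": "notifications/initialized"}))     # → notification
```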
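To make the tacit inference about the Capabilities object (discussed above) concrete: the capability shapes below follow my reading of the spec, and `presumably_supports` is a hypothetical helper, not a real API - which is exactly the interpretation the Specification currently forces each reader to reinvent.

```python
# Illustrative sketch of the tacit inference described for the Capabilities
# object; shapes follow my reading of the spec, and `presumably_supports`
# is a hypothetical helper, not a real API.

server_capabilities = {
    "tools": {"listChanged": True},  # sub-capability declared explicitly
    "resources": {},                 # bare capability: an empty object
}

def presumably_supports(capabilities: dict, method: str) -> bool:
    """Infer support for a method from its top-level capability key --
    the inference the Specification leaves implicit."""
    top_level = method.split("/", 1)[0]
    return top_level in capabilities

print(presumably_supports(server_capabilities, "tools/list"))    # → True
print(presumably_supports(server_capabilities, "prompts/list"))  # → False
```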

## SMALL:

- The Spec mentions the term "sub-resource" but the term is never defined.

# Recommendations

- Strictly define and enforce Separation of Concerns - if part of the Specification must be expressed in TypeScript, it must be clear _why_. Comments should be entirely removed from the Schema. If TypeScript is not a sufficiently rich language to express the Schema, it should not be used as an authoritative source. Move all the definitions into the Spec documents, where English commentary is appropriate - use programming language conventions to express certain aspects when needed, but don't require readers to cross-reference two sets of documents, one in English, and one in a programming language with extensive comments. Why make the reader conversant in the syntax of TypeScript interfaces, when the same things can be expressed in English, especially if the TypeScript Schema is not sufficiently clear without additional comments?

- Remove potentially ambiguous language from the Specification, and use formal language in its place. Leverage formal, functional definitions instead when possible. (Example: JSON-RPC declares that the lack of an `id` is _constitutive_ of a Notification - it states a clear formal criterion before offering some comments about the larger purpose of Notifications.) Eliminate the use of terms like "supports" unless they are formally defined. Replace descriptive language (which is appropriate for a user guide) with prescriptive language. Do not rely on any tacit knowledge or "commonsense assumptions", especially when dealing with data structures like the Capabilities object. Ideally, do not require the reader to make subtle inferences based on statements and definitions that are distributed across many documents/pages. Proactively look for opportunities for (mis-)interpretation and eliminate them with explicit language. Prefer redundancy in some cases, for simple prescriptions, or simply refer the reader back to the general requirements, if they are stated formally.

- Make a clearer delineation between the Specification and the User Guide - establish the Specification as a formal, technical document, and use appropriate language, even if it involves dry or "legalistic" verbiage. Then the Spec is the Spec and the guides are more user-friendly presentations of the formal content of the Spec. If the Spec is complex and nuanced, do not rely heavily on broad "principles" such as those stated in the Overview - these could probably be interpreted in a way that resolves all ambiguity that I have identified in this document - but the reader should not have to engage in complex hermeneutics in order to understand the Spec.

- Establish up front what constraints JSON-RPC imposes on the Specification. (Requests MUST include an `id`, etc. - quite brief, really). This would eliminate the need to re-state these things in the Specification when describing various message types and expected behavior. If the Spec says that it uses JSON-RPC 2.0 as the base messaging protocol, then just state up front (as a convenience) what this means for every Request, Response, Notification, etc. - or simply refer the reader to the relevant definition and omit it from the Specification. (This is the case with respect to MUST/SHOULD/MAY, which are technical terms with nuanced definitions that are used but not _redeclared_ in the Spec.) There is no need to re-declare constraints that are already established in the JSON-RPC spec (although doing so once may help some readers), and doing so may lead to mis-interpretation or mis-alignment.