Title: Output Schema Support with Semantic Metadata Enhancement
Authors: Pratyush Nag ([email protected])
1. Abstract
This SEP proposes adding structured output schemas to the MCP Tool definition, which will be accessible via `client.list_tools()` and similar endpoints. Crucially, the proposal includes a Semantic Metadata Enhancement pipeline that automatically enriches the standard JSON Schema with semantic types (e.g., `email`, `url`, `currency`), enabling client applications to provide intelligent rendering and improved developer experience. This resolves the current limitation where tool outputs lack structured, machine-readable metadata.
2. Motivation
The current Model Context Protocol (MCP) specification cannot fully support interoperable, intelligent, and user-friendly client applications because of a fundamental limitation: the tool contract is incomplete, lacking a structured, machine-readable specification for the tool's output data.
While MCP successfully standardizes input via the `input_schema` and abstracts tool invocation for LLMs, it treats the tool's return value as an opaque payload, often just a string or a minimally structured JSON object passed back to the client. This architectural decision creates critical gaps that limit the utility and adoption of MCP across complex environments:
Inadequacy of the Current Protocol
Impaired LLM Reasoning and Multi-Tool Orchestration:
The current specification fails to provide LLMs with a complete data contract. While an LLM knows what to send to a tool (via `input_schema`), it lacks advance knowledge of the precise, structured data it will receive. This forces the LLM to rely on runtime observation or imprecise natural language descriptions to integrate the result, leading to:
- Increased Hallucination Risk: LLMs must guess the output structure when chaining calls, increasing the likelihood of making incorrect assumptions about data fields or types.
- Inefficient Multi-Agent Workflows: Managing complex, nested results from multiple tools is difficult without a unified, structured schema, hindering the scalability of sophisticated agentic systems.
Poor Cross-Platform Interoperability and UI/UX:
The current setup makes successful, dynamic rendering of MCP results across diverse client platforms (e.g., web frontends, mobile apps, desktop IDEs) a manual, complex burden. Since the output schema and, critically, the semantic meaning of the data fields are missing:
- Developer Workarounds: Developers must implement complex, brittle workarounds (e.g., hard-coded logic, test calls, or manual code inspection) to infer the structure and meaning of the data (e.g., is this string a URL, an email, or a plain file path?).
- Impossibility of Smart UI/UX: Client applications cannot dynamically apply intelligent rendering or formatting. A raw string representing a currency cannot be automatically displayed with the correct currency symbol and decimal formatting, nor can a URL string be rendered as an interactive, clickable link. This prevents close integration of MCP with frontends and results in a poor user experience.
The proposed feature—adding a formal Output Schema with Semantic Metadata Enhancement—directly addresses these inadequacies by completing the tool contract, thus promoting true interoperability, enabling smart UI design, and enhancing the reliability of complex LLM reasoning.
3. Specification
This section details the required changes to the Model Context Protocol to support structured, semantically enhanced output contracts. The design is fully backward-compatible and requires no migration effort from existing MCP clients.
The specification introduces a new optional field, `output_schema`, to the `Tool` definition within the Model Context Protocol, and defines a standard for semantic metadata enhancement within that schema.
(Associated Implementation PR for Review: modelcontextprotocol/python-sdk#757)
3.1. Protocol Extension: The `Tool` Object
The `Tool` object returned by the server (via methods like `client.list_tools()` and `client.get_tool()`) SHALL be extended to include a new, optional top-level property: `output_schema`.
Property | Type | Description | Required | Backward Compatibility |
---|---|---|---|---|
`output_schema` | JSON Schema Object | A JSON Schema object defining the structure and data types of the successful response payload returned by the tool. This schema MUST be generated based on the tool's defined return type and enriched with semantic metadata. | No | Fully non-breaking. |
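To make the extension concrete, here is a minimal sketch of a tool definition carrying the new property, written as a plain Python dictionary. The tool name, descriptions, and field names are invented for illustration, and `get_output_schema` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Optional

# Hypothetical tool definition as a client might receive it when listing tools.
# The tool name and all field names are assumptions made for this example.
tool: dict[str, Any] = {
    "name": "get_account_summary",
    "description": "Return the balance and contact email for an account.",
    "input_schema": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
    # New optional property proposed by this SEP.
    "output_schema": {
        "type": "object",
        "properties": {
            "total_balance": {"type": "number", "semantic_type": "currency"},
            "contact_email": {"type": "string", "semantic_type": "email"},
        },
    },
}

# A tool from a server that has not adopted the proposal.
legacy_tool: dict[str, Any] = {"name": "echo", "input_schema": {"type": "object"}}


def get_output_schema(tool_def: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the output schema, or None for tools that do not declare one."""
    return tool_def.get("output_schema")


print(get_output_schema(tool)["properties"]["total_balance"])
print(get_output_schema(legacy_tool))
```

Because the property is optional, a client written this way has to handle the `None` case for tools that do not declare an output schema.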
3.2. Output Schema Structure and Semantics
The `output_schema` SHALL adhere to the JSON Schema standard (Draft 7 or later). Properties within this schema MAY be enriched with the optional semantic metadata fields defined below, which are embedded directly in the property object.
A. Semantic Metadata Fields
The following optional fields are introduced within the properties object of the JSON Schema:
Field Name | Type | Description |
---|---|---|
`semantic_type` | String (Enum) | Defines the high-level semantic meaning of the field (e.g., `url`, `currency`). |
`datetime_type` | String (Enum) | (Conditional) Used only when `semantic_type` is `datetime`. Specifies the sub-format (`date_only`, `time_only`, `datetime`). |
`media_format` | String (Enum) | (Conditional) Used only when `semantic_type` is `audio`, `video`, or `image`. Specifies the file format (`audio_file`, `video_file`, `image_file`). |
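The conditional rules in the table can be checked mechanically. The sketch below is one possible reading of them; `check_semantic_metadata` and the error strings are hypothetical, and the proposal does not say whether a `media_format` value must match its media type, so that is not enforced here.

```python
from typing import Any

MEDIA_TYPES = {"audio", "video", "image"}
DATETIME_SUBTYPES = {"date_only", "time_only", "datetime"}
MEDIA_FORMATS = {"audio_file", "video_file", "image_file"}


def check_semantic_metadata(prop: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one property's semantic metadata."""
    problems: list[str] = []
    semantic_type = prop.get("semantic_type")

    # datetime_type is only meaningful alongside semantic_type "datetime".
    if "datetime_type" in prop:
        if semantic_type != "datetime":
            problems.append("datetime_type requires semantic_type 'datetime'")
        elif prop["datetime_type"] not in DATETIME_SUBTYPES:
            problems.append(f"unknown datetime_type: {prop['datetime_type']}")

    # media_format is only meaningful alongside audio, video, or image.
    if "media_format" in prop:
        if semantic_type not in MEDIA_TYPES:
            problems.append("media_format requires a media semantic_type")
        elif prop["media_format"] not in MEDIA_FORMATS:
            problems.append(f"unknown media_format: {prop['media_format']}")

    return problems


print(check_semantic_metadata(
    {"type": "string", "semantic_type": "url", "datetime_type": "date_only"}
))
```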
B. Supported `semantic_type` Enumeration
The server MUST support automatic detection and inclusion of the following `semantic_type` values:
Category | `semantic_type` Value | JSON Schema Type | Purpose |
---|---|---|---|
Communication | `email` | `string` | Smart UI: Show mail icon, enable click-to-email. |
Communication | `url` | `string` | Smart UI: Render as clickable hyperlink. |
DateTime | `datetime` | `string` | Smart UI: Show date picker, ensure proper formatting. |
Media | `audio`, `video`, `image` | `string` | Smart UI: Render appropriate media player or thumbnail. |
System | `file_path`, `identifier`, `status`, `color` | Varies | Smart UI: Custom rendering (e.g., colored indicators for status). |
Numeric | `currency`, `percentage` | `number` or `integer` | Smart UI: Apply locale-specific currency or percentage formatting. |
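The pairing of semantic types with base JSON Schema types in the table can be expressed as a small lookup. This is a sketch under one assumption: the "Varies" entries are treated as accepting any base type, which the proposal does not spell out. The function name `is_compatible` is hypothetical.

```python
# Groupings taken from the enumeration table. Treating the "Varies" entries
# as unconstrained is an assumption made for this sketch.
STRING_ONLY = {"email", "url", "datetime", "audio", "video", "image"}
NUMERIC_ONLY = {"currency", "percentage"}
UNCONSTRAINED = {"file_path", "identifier", "status", "color"}


def is_compatible(semantic_type: str, json_type: str) -> bool:
    """Check whether a semantic_type may annotate a property of json_type."""
    if semantic_type in STRING_ONLY:
        return json_type == "string"
    if semantic_type in NUMERIC_ONLY:
        return json_type in {"number", "integer"}
    if semantic_type in UNCONSTRAINED:
        return True
    return False  # not a semantic_type listed in this proposal


print(is_compatible("currency", "number"))
print(is_compatible("currency", "string"))
```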
C. Example Transformation
A client application that receives the following schema can interpret the field's meaning beyond just being a `number`:
Enhanced JSON Schema property example for a tool output field named `total_balance`:
{
  "type": "number",
  "title": "Account Balance",
  "semantic_type": "currency"
}
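A client could act on this metadata roughly as follows. This is a sketch, not a prescribed rendering algorithm: `render_value` is a hypothetical function, and because the proposal does not carry a currency code, the symbol is passed in as an assumed parameter.

```python
from html import escape
from typing import Any


def render_value(value: Any, prop: dict[str, Any], currency_symbol: str = "$") -> str:
    """Render one output value as display text or HTML based on semantic_type."""
    semantic_type = prop.get("semantic_type")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)

    if semantic_type == "currency" and is_number:
        # Thousands separators and two decimals; the symbol is an assumption.
        return f"{currency_symbol}{value:,.2f}"
    if semantic_type == "url" and isinstance(value, str):
        safe = escape(value, quote=True)
        return f'<a href="{safe}">{safe}</a>'
    if semantic_type == "email" and isinstance(value, str):
        safe = escape(value, quote=True)
        return f'<a href="mailto:{safe}">{safe}</a>'
    # Unknown or missing semantic_type: fall back to plain text.
    return str(value)


balance_schema = {"type": "number", "title": "Account Balance", "semantic_type": "currency"}
print(render_value(1234.5, balance_schema))
```

Properties without a recognized `semantic_type` fall through to plain text, so a client built this way keeps working against schemas that carry no semantic metadata.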
4. Rationale
The rationale for the design decisions related to the Output Schema and Semantic Metadata has been updated to reflect the critical need for a standardized, explicit data contract over unreliable auto-detection heuristics.
4.1. Design Decisions and Justifications (Updated)
Decision | Justification |
---|---|
Embedding Metadata in JSON Schema Properties | The semantic metadata (`semantic_type`, `datetime_type`, etc.) is embedded directly within the existing JSON Schema property object. Justification: This approach maintains full JSON Schema compliance, treating the semantic information as optional, descriptive keywords. It avoids creating a parallel, non-standard metadata structure, making it easy for existing schema validators to ignore the new fields while providing rich context for new clients. |
Automatic Schema Generation (Zero-Config) | The structural schema (`type`, `properties`) is generated automatically during tool registration using reflection (e.g., Pydantic). Justification: Ensures the highest quality and consistency of the structural data contract with minimal developer effort for defining structure. |
Explicit Semantic Typing by Implementor (New) | Tool implementors are responsible for explicitly defining the `semantic_type` in their return type definition (e.g., via Pydantic field decorators or equivalent language constructs). Limited pattern-matching heuristics are provided only as a fallback. Justification: Reliance on field-name heuristics for complex semantic types leads to an unreliable, unending set of heuristic rules that cannot fully capture developer intent (e.g., is "price" a `currency` or a `count`?). Requiring the tool author to specify the semantic type guarantees accuracy, consistency, and a standardized mechanism for client application rendering across the ecosystem. |
Automatic Removal of `required` Field in Output | The `required` array is automatically stripped from the top-level output schema. Justification: The output of an executed tool is always a complete object defined by its return type. The `required` field, designed for validation of input data, is semantically incorrect and redundant for output data, which is guaranteed to be present if the tool succeeds. |
Type-Specific Validation | Numeric semantic types (`currency`, `percentage`) are only applied if the base JSON Schema type is `number` or `integer`. Justification: Prevents nonsensical or incorrect labeling (e.g., labeling a string "100" as a currency type without a clear numeric context), maintaining data integrity and client-side validation logic. |
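The decisions in this table can be combined into one small generator. The sketch below uses standard-library dataclasses with field metadata as a stand-in for the "equivalent language constructs" mentioned above; it is not the python-sdk implementation, and `AccountSummary`, `Mislabelled`, and `build_output_schema` are hypothetical names.

```python
from dataclasses import dataclass, field, fields
from typing import Any

# Deliberately small mapping from Python annotations to JSON Schema types.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}
NUMERIC_SEMANTICS = {"currency", "percentage"}


@dataclass
class AccountSummary:  # hypothetical tool return type
    # The tool author states the semantic type explicitly in field metadata.
    total_balance: float = field(metadata={"semantic_type": "currency"})
    profile_url: str = field(metadata={"semantic_type": "url"})
    login_count: int = 0  # no semantic metadata: structure only


@dataclass
class Mislabelled:  # hypothetical: numeric semantic type on a string field
    price: str = field(metadata={"semantic_type": "currency"})


def build_output_schema(return_type: type) -> dict[str, Any]:
    """Generate a structural schema, then add explicit semantic types."""
    properties: dict[str, Any] = {}
    for f in fields(return_type):
        prop: dict[str, Any] = {"type": JSON_TYPES[f.type]}
        semantic = f.metadata.get("semantic_type")
        numeric_ok = prop["type"] in ("number", "integer")
        # Numeric semantic types are applied only to numeric base types.
        if semantic is not None and (semantic not in NUMERIC_SEMANTICS or numeric_ok):
            prop["semantic_type"] = semantic
        properties[f.name] = prop

    schema: dict[str, Any] = {
        "type": "object",
        "properties": properties,
        # A reflection-based generator would typically emit this list.
        "required": list(properties),
    }
    # The top-level `required` array is stripped from output schemas.
    schema.pop("required", None)
    return schema


print(build_output_schema(AccountSummary))
```

In this sketch a numeric semantic type declared on a non-numeric field is dropped silently. An implementation could instead raise an error at registration time; the proposal does not specify which.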
4.2. Alternatives Considered
Alternative | Description | Reason for Rejection |
---|---|---|
Separate Top-Level Metadata Object | Defining a new top-level object (`"output_metadata": {...}`) alongside the `output_schema`. | Violates Cohesion. It fragments the schema data, forcing client logic to cross-reference two separate structures for one field. It increases parsing overhead and moves away from the principle of self-contained schema descriptions. |
Relying on JSON Schema `format` Keyword | Using the existing JSON Schema `format` keyword (e.g., `format: 'email'`). | Insufficient Scope. The standard `format` vocabulary covers only a fixed set of values (such as `date-time`, `email`, and `uri`) and has no values for MCP-specific needs like `currency`, `audio`, or `video`, which are essential for rich UI rendering. |
Heavy Reliance on Heuristics (Original Design) | Maximizing case-insensitive pattern matching on field names to auto-detect semantics. | Unacceptable Reliability Risk. This was determined to be fundamentally flawed as heuristics are fragile and lead to an unmanageable, non-standardized definition of semantic types. The only sustainable path is to formalize the semantic contract through explicit implementor input. |
4.3. Objections and Concerns Addressed
Concern | Resolution / Rationale |
---|---|
Increased Discovery Latency | Negligible Impact. The schema generation and enhancement process runs only once during the tool registration/server startup phase. The fully enhanced schema is then cached, resulting in minimal overhead during client `list_tools()` or `get_tool()` calls. |
Developer Overhead for Semantic Typing | Necessary Standard. While requiring explicit decoration adds a minimal step for the tool implementor, it is the only way to ensure accuracy and consistency. This small cost is justified by the massive, immediate benefit it provides to all downstream clients (UIs and LLMs), enabling dynamic rendering and reliable multi-tool orchestration. |
JSON Schema Over-Complication | Non-Breaking and Optional. The new semantic fields are optional additions to the standard property definition. Existing clients or LLM prompts using older protocol versions can safely ignore these fields while still operating correctly. |
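The latency argument rests on generating the schema once at registration and serving the stored result afterwards. A minimal sketch of that pattern follows; `ToolRegistry` and its methods are hypothetical and do not mirror any SDK's server class.

```python
from typing import Any, Callable

generation_log: list[str] = []  # records each time a schema is actually built


class ToolRegistry:
    """Hypothetical server-side registry that caches schemas at registration."""

    def __init__(self) -> None:
        self._tools: dict[str, dict[str, Any]] = {}

    def register(self, name: str, build_schema: Callable[[], dict[str, Any]]) -> None:
        # Schema generation and enhancement run exactly once, here.
        self._tools[name] = {"name": name, "output_schema": build_schema()}

    def list_tools(self) -> list[dict[str, Any]]:
        # Discovery only reads the stored definitions.
        return list(self._tools.values())


def build_summary_schema() -> dict[str, Any]:
    generation_log.append("get_account_summary")
    return {
        "type": "object",
        "properties": {"total_balance": {"type": "number", "semantic_type": "currency"}},
    }


registry = ToolRegistry()
registry.register("get_account_summary", build_summary_schema)
for _ in range(3):
    registry.list_tools()
print(len(generation_log))
```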
5. Backward Compatibility
This SEP introduces new functionality in a manner that is fully backward-compatible with all existing MCP implementations, requiring no mandatory migration for existing clients or servers.
5.1. Non-Breaking Status
This Standards Track SEP is entirely non-breaking:
- Zero Forced Migration: All new fields (`output_schema`, `semantic_type`, etc.) are optional additions to the existing Tool definition object. Older MCP clients will simply ignore the new fields, and their existing workflows (tool discovery, synchronous calling) will remain unchanged and fully functional.
- JSON Schema Integrity: The semantic metadata is embedded as optional custom keywords within the JSON Schema. Older schema parsers will safely ignore these unknown properties, maintaining strict compliance and avoiding validation failures.
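The non-breaking claim can be illustrated with a client that predates the proposal and reads only the keys it knows. This is a sketch of the general pattern; `LegacyTool` and `parse_legacy` are hypothetical and stand in for any older client's parsing code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LegacyTool:
    """What a hypothetical pre-SEP client keeps from a tool definition."""
    name: str
    input_schema: dict[str, Any]


def parse_legacy(tool_def: dict[str, Any]) -> LegacyTool:
    # Only known keys are read; output_schema, if present, is never touched.
    return LegacyTool(name=tool_def["name"], input_schema=tool_def["input_schema"])


old_style = {"name": "echo", "input_schema": {"type": "object"}}
new_style = {
    **old_style,
    "output_schema": {
        "type": "object",
        "properties": {"reply": {"type": "string", "semantic_type": "identifier"}},
    },
}

print(parse_legacy(old_style) == parse_legacy(new_style))
```

A client that instead rejects unknown keys would not behave this way, so the guarantee depends on clients being tolerant of additional properties.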
5.2. Incompatibility Management and Adoption Costs
While the feature is non-breaking, full utilization requires coordinated updates, and the design introduces a new category of developer overhead in exchange for greater accuracy.
A. Feature Adoption Limitation
MCP servers and clients using the old protocol version will not be able to use the new features. To realize the benefits (Structured Output, Smart UI, Reliable LLM Reasoning), both the MCP server (to generate the enhanced schema) and the MCP client (to read and act upon the `output_schema` and semantic fields) must be updated to support this SEP's protocol version. This is an inherent limitation of adding new features to a protocol.
B. Semantic Typing Developer Overhead
The decision to rely on explicit semantic typing by the tool author (rather than unreliable heuristics) introduces a minor, optional developer overhead:
- Required Action: Tool implementors who wish for their tools to expose the semantic metadata (e.g., to enable "Smart UI Rendering" on the client) must now explicitly define the `semantic_type` in their tool's return type definition (e.g., via Pydantic decorators or similar language constructs).
- Justification: This overhead is deemed acceptable and necessary because it guarantees accuracy and consistency. The small cost of explicit declaration ensures the massive benefit of reliable, platform-agnostic rendering for all downstream clients. Tools that do not require this feature will still benefit from automatic structural schema generation (`output_schema` will contain structure, but no semantic metadata) with zero overhead.