This document defines the schema, composition rules, and precedence hierarchy for the `.docs-mcp.json` file. These manifests are distributed throughout the markdown corpus to dictate how files should be chunked and what metadata should be attached to them.
The manifest is a versioned JSON object that defines baseline rules for its directory tree, with an `overrides` array for exceptions.
```json
{
  "version": "1",
  "metadata": {
    "language": "typescript",
    "scope": "sdk-specific"
  },
  "strategy": {
    "chunk_by": "h2",
    "max_chunk_size": 8000,
    "min_chunk_size": 200
  },
  "mcpServerInstructions": "This server provides TypeScript SDK documentation for Acme Corp. Use the search tool to find API references, guides, and code examples.",
  "overrides": [
    {
      "pattern": "models/**/*.md",
      "strategy": {
        "chunk_by": "file"
      }
    },
    {
      "pattern": "guides/advanced/*.md",
      "metadata": {
        "scope": "global-guide"
      }
    }
  ]
}
```
- `version`: (Required) Must currently be exactly `"1"`. Ensures future parser compatibility.
- `metadata`: (Optional) The baseline metadata record applied to any markdown file in this directory tree.
- `strategy`: (Optional) The baseline `ChunkingStrategy` applied to any markdown file in this directory tree.
  - `chunk_by`: (Required within `strategy`) The heading level at which to split: `"h1"`, `"h2"`, `"h3"`, or `"file"` (no split).
  - `max_chunk_size`: (Optional) Character limit. If a single DOM node exceeds this limit, the indexer applies a fallback split to prevent oversized chunks.
  - `min_chunk_size`: (Optional) Character floor. Tiny trailing chunks below this threshold are merged into the preceding chunk.
- `mcpServerInstructions`: (Optional) Custom MCP server instructions sent to clients during initialization. They help coding agents understand what this server provides and how to use it effectively.
- `overrides`: (Optional) An array of objects mapping glob `pattern`s to specific `strategy` and `metadata` overrides.
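The field list above can be modeled in TypeScript roughly as follows. This is a hedged sketch, not the CLI's actual types: the names `DocsMcpManifest`, `ManifestOverride`, `validateManifest`, and `parseManifest` are hypothetical, and metadata values are assumed to be strings only because the example manifest uses strings.

```typescript
// Hypothetical TypeScript model of .docs-mcp.json; names are illustrative.
type ChunkBy = "h1" | "h2" | "h3" | "file";

interface ChunkingStrategy {
  chunk_by: ChunkBy; // required whenever a strategy object is present
  max_chunk_size?: number; // character limit per chunk
  min_chunk_size?: number; // character floor for trailing chunks
}

// Assumption: metadata values are strings, as in the example manifest.
type Metadata = Record<string, string>;

interface ManifestOverride {
  pattern: string; // glob, relative to the manifest's directory
  strategy?: ChunkingStrategy;
  metadata?: Metadata;
}

interface DocsMcpManifest {
  version: "1";
  metadata?: Metadata;
  strategy?: ChunkingStrategy;
  mcpServerInstructions?: string;
  overrides?: ManifestOverride[];
}

const CHUNK_BY_VALUES: readonly string[] = ["h1", "h2", "h3", "file"];

// Returns human-readable problems; an empty array means the object passed.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["manifest must be a JSON object"];
  }
  const m = raw as Record<string, unknown>;
  const errors: string[] = [];
  if (m.version !== "1") errors.push('version must be exactly "1"');
  if (m.mcpServerInstructions !== undefined && typeof m.mcpServerInstructions !== "string") {
    errors.push("mcpServerInstructions must be a string");
  }

  const checkStrategy = (s: unknown, where: string): void => {
    if (s === undefined) return; // strategy itself is optional
    if (typeof s !== "object" || s === null) {
      errors.push(`${where}: strategy must be an object`);
      return;
    }
    const st = s as Record<string, unknown>;
    const chunkBy = st.chunk_by;
    if (typeof chunkBy !== "string" || !CHUNK_BY_VALUES.includes(chunkBy)) {
      errors.push(`${where}: chunk_by must be one of ${CHUNK_BY_VALUES.join(", ")}`);
    }
    for (const key of ["max_chunk_size", "min_chunk_size"]) {
      if (st[key] !== undefined && typeof st[key] !== "number") {
        errors.push(`${where}: ${key} must be a number`);
      }
    }
  };

  checkStrategy(m.strategy, "root");
  const overrides = m.overrides;
  if (overrides !== undefined) {
    if (!Array.isArray(overrides)) {
      errors.push("overrides must be an array");
    } else {
      overrides.forEach((o: unknown, i: number) => {
        const ov = (typeof o === "object" && o !== null ? o : {}) as Record<string, unknown>;
        if (typeof ov.pattern !== "string") errors.push(`overrides[${i}]: pattern must be a string`);
        checkStrategy(ov.strategy, `overrides[${i}]`);
      });
    }
  }
  return errors;
}

// Only hand out a typed manifest once validation passes.
function parseManifest(raw: unknown): DocsMcpManifest {
  const errors = validateManifest(raw);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return raw as DocsMcpManifest;
}
```

Collecting every problem instead of failing on the first one is a design choice that suits a `validate`-style command, where authors benefit from seeing all issues in one pass.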
To ensure predictable and fast builds, the chunking and metadata resolution follows strict precedence and matching rules.
When the indexer evaluates a specific markdown file, it resolves the chunking strategy and metadata from highest priority to lowest:
- YAML Frontmatter (Highest): If a file contains explicit `mcp_chunking_hint` or metadata keys in its YAML frontmatter, these are merged with precedence over any manifest configuration for that specific file. Frontmatter keys win; manifest keys not present in frontmatter are preserved.
- Manifest `overrides` Match: If the file matches a glob `pattern` in the `overrides` array of the nearest manifest, that override entry applies.
  - Merging: The override `metadata` is merged with the root `metadata` (override keys win). The override `strategy` replaces the root `strategy`.
- Manifest Baseline: The root `strategy` and `metadata` fields in the nearest manifest.
- Global System Defaults (Lowest): For example, `chunk_by: "h2"`.
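The precedence chain can be sketched as a single resolution function. The sketch assumes the caller has already located the nearest manifest and selected the winning override, and that the frontmatter (including `mcp_chunking_hint`) has already been parsed into a partial strategy plus a metadata record; that parsed shape, `resolveFileConfig`, and `SYSTEM_DEFAULTS` are hypothetical. Only `chunk_by: "h2"` is a documented default.

```typescript
// Hypothetical sketch of per-file resolution:
// system defaults < manifest baseline < matching override < YAML frontmatter.
interface ChunkingStrategy {
  chunk_by: "h1" | "h2" | "h3" | "file";
  max_chunk_size?: number;
  min_chunk_size?: number;
}
type Metadata = Record<string, string>;

interface Layer {
  strategy?: ChunkingStrategy;
  metadata?: Metadata;
}

// Assumption: frontmatter (e.g. mcp_chunking_hint) is already parsed into this shape.
interface Frontmatter {
  strategy?: Partial<ChunkingStrategy>;
  metadata?: Metadata;
}

interface Resolved {
  strategy: ChunkingStrategy;
  metadata: Metadata;
}

// Only chunk_by: "h2" is documented as a default; the empty metadata is an assumption.
const SYSTEM_DEFAULTS: Resolved = { strategy: { chunk_by: "h2" }, metadata: {} };

function resolveFileConfig(
  baseline: Layer, // root strategy/metadata of the nearest manifest
  matchedOverride: Layer | undefined, // last matching overrides[] entry, if any
  frontmatter: Frontmatter | undefined,
): Resolved {
  // Baseline: a root strategy replaces the system default; metadata is layered on top.
  let strategy: ChunkingStrategy = { ...(baseline.strategy ?? SYSTEM_DEFAULTS.strategy) };
  let metadata: Metadata = { ...SYSTEM_DEFAULTS.metadata, ...baseline.metadata };

  if (matchedOverride) {
    // Override metadata is merged key by key (override keys win) ...
    metadata = { ...metadata, ...matchedOverride.metadata };
    // ... but an override strategy replaces the root strategy wholesale.
    if (matchedOverride.strategy) strategy = { ...matchedOverride.strategy };
  }

  if (frontmatter) {
    // Frontmatter keys win; keys it does not mention are preserved.
    metadata = { ...metadata, ...frontmatter.metadata };
    strategy = { ...strategy, ...frontmatter.strategy };
  }
  return { strategy, metadata };
}
```

Under this reading, the asymmetry matters in practice: with the example manifest, the `models/**/*.md` override yields `{ chunk_by: "file" }` with no `max_chunk_size`, because the override strategy replaces the baseline instead of merging into it, while `metadata` from the baseline still flows through.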
Manifests do not merge across directories.
If `/docs/.docs-mcp.json` and `/docs/sdks/typescript/.docs-mcp.json` both exist, a file located at `/docs/sdks/typescript/auth.md` is governed exclusively by the TypeScript folder's manifest. The parent `/docs` manifest is completely ignored for that subtree. This prevents complex, hard-to-debug inheritance chains.
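A minimal sketch of the nearest-manifest lookup follows. It assumes POSIX-style absolute paths and a precomputed set of directories that contain a `.docs-mcp.json`; the helper name `findNearestManifestDir` is hypothetical. The upward walk stops at the first hit, which is what keeps ancestor manifests from being consulted or merged.

```typescript
// Hypothetical helper: manifestDirs holds absolute POSIX directories (no trailing
// slash, except "/" itself) that contain a .docs-mcp.json.
function findNearestManifestDir(filePath: string, manifestDirs: Set<string>): string | undefined {
  const parts = filePath.split("/");
  parts.pop(); // drop the file name; start from the file's own directory
  while (parts.length > 0) {
    const dir = parts.join("/") || "/"; // [""] joins to "", which is the root
    // First manifest found wins; ancestors above it are never looked at.
    if (manifestDirs.has(dir)) return dir;
    parts.pop();
  }
  return undefined;
}
```

With manifests in `/docs` and `/docs/sdks/typescript`, a file under `/docs/sdks/python/` resolves to `/docs`, while anything under `/docs/sdks/typescript/` resolves only to the TypeScript manifest.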
Within a single `.docs-mcp.json`, the `overrides` array is evaluated from top to bottom. If a file path matches multiple glob patterns, the last matching entry wins. This allows authors to define broad catch-all rules at the top and specific exceptions at the bottom.
Override pattern matching is evaluated against the file path relative to the directory containing that manifest.
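Override selection can be sketched in two steps: make the file path relative to the manifest's directory, then test every pattern in order and keep the last match. The glob translation below is a hypothetical, minimal dialect (`*` within one segment, `**` across segments, `?` for one character) written only for illustration; the real CLI may use a full glob library whose edge cases (braces, dotfiles, character classes) differ. All function names here are hypothetical.

```typescript
// Hypothetical helpers for override selection; the glob dialect is a minimal subset.

// Translate a glob into an anchored RegExp:
//   "**/" -> zero or more whole directories, "**" -> anything (including "/"),
//   "*"   -> any run of characters within one segment, "?" -> one non-"/" character.
function globToRegExp(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern.charAt(i);
    if (c === "*" && pattern.charAt(i + 1) === "*") {
      if (pattern.charAt(i + 2) === "/") {
        re += "(?:.*/)?";
        i += 2;
      } else {
        re += ".*";
        i += 1;
      }
    } else if (c === "*") {
      re += "[^/]*";
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Patterns are matched against the path relative to the manifest's own directory.
function toManifestRelative(filePath: string, manifestDir: string): string {
  const prefix = manifestDir.endsWith("/") ? manifestDir : `${manifestDir}/`;
  if (!filePath.startsWith(prefix)) {
    throw new Error(`${filePath} is not inside ${manifestDir}`);
  }
  return filePath.slice(prefix.length);
}

// Evaluate overrides top to bottom; a later match replaces an earlier one.
function findMatchingOverride<T extends { pattern: string }>(
  overrides: readonly T[],
  relativePath: string,
): T | undefined {
  let winner: T | undefined;
  for (const entry of overrides) {
    if (globToRegExp(entry.pattern).test(relativePath)) winner = entry;
  }
  return winner;
}
```

Under this sketch's dialect, the example manifest's `models/**/*.md` matches both `models/user.md` and `models/nested/user.md` (because `**/` may match zero directories), while `guides/advanced/*.md` does not match `guides/advanced/deep/page.md`, since `*` never crosses a `/`.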
- Written by: Authors manually, or automatically bootstrapped/repaired by `@speakeasy-api/docs-mcp-cli` (the `docs-mcp fix` command).
- Read by: `@speakeasy-api/docs-mcp-cli` (specifically the `corpus-walker`) during the `validate` and `build` commands.
- Never read by: `@speakeasy-api/docs-mcp-server`. The runtime server only knows about the resulting `Chunk`s and the indexed `metadata.json`.