<div align="center">
  <a href="https://www.speakeasy.com/" target="_blank">
    <img
      width="1500"
      height="500"
      alt="Speakeasy"
      src="https://github.com/user-attachments/assets/0e56055b-02a3-4476-9130-4be299e5a39c"
    />
  </a>
</div>

# Speakeasy Docs MCP

A lightweight, domain-agnostic hybrid search engine for markdown corpora, exposed via the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). While it can index and serve **any** markdown corpus, it is deeply optimized for serving SDK documentation to AI coding agents. **Beta.**

## Features

- **Hybrid search** — full-text, phrase proximity, and vector similarity blended via Reciprocal Rank Fusion
- **Distributed manifests** — per-directory `.docs-mcp.json` files configure chunking strategy, metadata, and taxonomy independently per subtree
- **Faceted taxonomy** — metadata keys become enum-injected JSON Schema filters on the search tool
- **Vector collapse** — deduplicates near-identical cross-language results at search time
- **Incremental builds** — embedding cache fingerprints each chunk; only changed content is re-embedded
- **Graceful degradation**
  - *Chunking* — chunk sizes adapt to the configured embedding provider's context window; falls back to conservative defaults when no provider is set
  - *Query* — if the embedding API errors at runtime (downtime, expired credits, network issues), the server falls back to FTS-only search with a one-time warning

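The Reciprocal Rank Fusion behind hybrid search can be illustrated with a short sketch. This is not the server's actual implementation, just the standard RRF formula: each document's fused score is the sum of `1 / (k + rank)` over every result list it appears in.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ordering.

    k=60 is the conventional smoothing constant; a higher k flattens
    the influence of top-ranked positions.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

fts_hits = ["chunk-a", "chunk-b", "chunk-c"]     # full-text ranking
vector_hits = ["chunk-b", "chunk-d", "chunk-a"]  # vector ranking
print(rrf([fts_hits, vector_hits]))
# → ['chunk-b', 'chunk-a', 'chunk-d', 'chunk-c']
```

A document that appears near the top of several lists (like `chunk-b` above) outranks one that tops a single list, which is what makes RRF a robust way to blend FTS and vector rankings without score normalization.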
## How It Works

Docs MCP provides a local, in-memory search engine (powered by [LanceDB](https://lancedb.github.io/lancedb/)) that runs inside a Node.js MCP server. Three core optimizations make it effective for structured documentation:
## Quick Start

```dockerfile
# --- build stage ---
FROM node:22-slim AS build
RUN npm install -g @speakeasy-api/docs-mcp-cli
ARG DOCS_DIR=docs
COPY ${DOCS_DIR} /corpus
RUN --mount=type=secret,id=OPENAI_API_KEY \
    OPENAI_API_KEY=$(cat /run/secrets/OPENAI_API_KEY) \
    docs-mcp build --docs-dir /corpus --out /index --embedding-provider openai

# --- runtime stage ---
FROM node:22-slim
RUN npm install -g @speakeasy-api/docs-mcp-server
COPY --from=build /index /index
EXPOSE 20310
CMD ["docs-mcp-server", "--index-dir", "/index", "--transport", "http", "--port", "20310"]
```

```bash
docker build --secret id=OPENAI_API_KEY,env=OPENAI_API_KEY \
  --build-arg DOCS_DIR=./docs -t docs-mcp .
docker run -p 20310:20310 -e OPENAI_API_KEY docs-mcp
```

The build-time secret lets `docs-mcp build` embed the corpus; the runtime `-e OPENAI_API_KEY` lets the server embed incoming search queries.

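An MCP client with HTTP transport support can then connect to the running container. A typical client configuration might look like the sketch below; the config shape and the `/mcp` endpoint path are assumptions, not documented specifics, so check your client's documentation and the server's startup output for the exact URL.

```json
{
  "mcpServers": {
    "speakeasy-docs": {
      "url": "http://localhost:20310/mcp"
    }
  }
}
```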
## Usage & Deployment

**1. Authoring (Local Dev)**
```bash
npx @speakeasy-api/docs-mcp-server --index-dir ./dist/.lancedb
```

**4. Playground (Optional)**
Explore the index interactively in a browser:
```bash
npx @speakeasy-api/docs-mcp-playground
```
Open `http://localhost:3001`. Requires a running HTTP server (step 3 with `--transport http`).

## Evaluation

Docs MCP includes a standalone evaluation harness for measuring search quality with transparent, repeatable benchmarks. See the [Evaluation Framework](docs/eval.md) for how to build an eval suite, run benchmarks across embedding providers, and interpret results.