Commit 19c9ce8

chore: iterate on readme, bringing in fuller examples and a features list

1 parent 22a0fee

1 file changed: +49 −3 lines changed

README.md

Lines changed: 49 additions & 3 deletions
@@ -1,7 +1,29 @@
+<div align="center">
+  <a href="https://www.speakeasy.com/" target="_blank">
+    <img
+      width="1500"
+      height="500"
+      alt="Speakeasy"
+      src="https://github.com/user-attachments/assets/0e56055b-02a3-4476-9130-4be299e5a39c"
+    />
+  </a>
+</div>
+
 # Speakeasy Docs MCP
 
 A lightweight, domain-agnostic hybrid search engine for markdown corpora, exposed via the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). While it can index and serve **any** markdown corpus, it is deeply optimized for serving SDK documentation to AI coding agents. **Beta.**
 
+## Features
+
+- **Hybrid search** — full-text, phrase proximity, and vector similarity blended via Reciprocal Rank Fusion
+- **Distributed manifests** — per-directory `.docs-mcp.json` files configure chunking strategy, metadata, and taxonomy independently per subtree
+- **Faceted taxonomy** — metadata keys become enum-injected JSON Schema filters on the search tool
+- **Vector collapse** — deduplicates near-identical cross-language results at search time
+- **Incremental builds** — embedding cache fingerprints each chunk; only changed content is re-embedded
+- **Graceful degradation**
+  - *Chunking* — chunk sizes adapt to the configured embedding provider's context window; falls back to conservative defaults when no provider is set
+  - *Query* — if the embedding API errors at runtime (downtime, expired credits, network issues), the server falls back to FTS-only search with a one-time warning
+
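The distributed-manifest and faceted-taxonomy features imply a small per-directory config file. As a purely illustrative sketch (these field names are assumptions, not the documented schema), a `.docs-mcp.json` for a TypeScript SDK subtree might look like:

```json
{
  "chunking": { "strategy": "by-heading", "maxTokens": 512 },
  "metadata": { "language": "typescript", "product": "payments" }
}
```

Under this reading, a metadata key like `language` would surface as an enum-valued filter on the generated search tool.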
 ## How It Works
 
 Docs MCP provides a local, in-memory search engine (powered by [LanceDB](https://lancedb.github.io/lancedb/)) that runs inside a Node.js MCP server. Three core optimizations make it effective for structured documentation:
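The Reciprocal Rank Fusion used to blend the full-text and vector rankings is a standard technique and can be sketched in a few lines. This is a minimal TypeScript illustration, not the server's actual implementation; `k = 60` is the conventional damping constant from the RRF literature.

```typescript
// Reciprocal Rank Fusion: merge several ranked lists into one.
// A document's fused score is the sum over lists of 1 / (k + rank),
// so appearing in *every* list beats topping just one of them.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const contribution = 1 / (k + index + 1); // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// A doc ranked second by both FTS and vector search outranks
// docs that top only one list.
const fused = reciprocalRankFusion([
  ["fts-top", "both", "phrase-hit"], // full-text ranking
  ["vec-top", "both", "phrase-hit"], // vector-similarity ranking
]);
console.log(fused[0]); // "both"
```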
@@ -209,14 +231,31 @@ The tools exposed to the agent are dynamically generated based on your `corpus_d
 ## Quick Start
 
 ```dockerfile
+# --- build stage ---
+FROM node:22-slim AS build
+RUN npm install -g @speakeasy-api/docs-mcp-cli
+ARG DOCS_DIR=docs
+COPY ${DOCS_DIR} /corpus
+RUN --mount=type=secret,id=OPENAI_API_KEY \
+    OPENAI_API_KEY=$(cat /run/secrets/OPENAI_API_KEY) \
+    docs-mcp build --docs-dir /corpus --out /index --embedding-provider openai
+
+# --- runtime stage ---
 FROM node:22-slim
-RUN npm install -g @speakeasy-api/docs-mcp-cli @speakeasy-api/docs-mcp-server
-COPY docs /corpus
-RUN docs-mcp build --docs-dir /corpus --out /index --embedding-provider hash
+RUN npm install -g @speakeasy-api/docs-mcp-server
+COPY --from=build /index /index
 EXPOSE 20310
 CMD ["docs-mcp-server", "--index-dir", "/index", "--transport", "http", "--port", "20310"]
 ```
 
+```bash
+docker build --secret id=OPENAI_API_KEY,env=OPENAI_API_KEY \
+  --build-arg DOCS_DIR=./docs -t docs-mcp .
+docker run -p 20310:20310 -e OPENAI_API_KEY docs-mcp
+```
+
+The build-time secret supplies the API key used to embed the corpus; the runtime `-e OPENAI_API_KEY` lets the server embed search queries.
+
 ## Usage & Deployment
 
 **1. Authoring (Local Dev)**
@@ -245,6 +284,13 @@ The `.lancedb` directory is packaged with the MCP server. FTS search is fully lo
 npx @speakeasy-api/docs-mcp-server --index-dir ./dist/.lancedb
 ```
 
+**4. Playground (Optional)**
+Explore the index interactively in a browser:
+```bash
+npx @speakeasy-api/docs-mcp-playground
+```
+Open `http://localhost:3001`. Requires a running HTTP server (step 3 with `--transport http`).
+
 ## Evaluation
 
 Docs MCP includes a standalone evaluation harness for measuring search quality with transparent, repeatable benchmarks. See the [Evaluation Framework](docs/eval.md) for how to build an eval suite, run benchmarks across embedding providers, and interpret results.
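Benchmarks of retrieval quality typically report metrics such as recall@k and mean reciprocal rank. The following is an illustrative TypeScript sketch of those two metrics, assumed for exposition; see docs/eval.md for what the harness actually reports.

```typescript
// Recall@k: fraction of queries whose expected doc appears in the top k results.
function recallAtK(results: string[][], expected: string[], k: number): number {
  let found = 0;
  results.forEach((ranked, i) => {
    if (ranked.slice(0, k).includes(expected[i])) found++;
  });
  return found / results.length;
}

// Mean Reciprocal Rank: average of 1 / (rank of expected doc), 0 if absent.
function meanReciprocalRank(results: string[][], expected: string[]): number {
  const total = results.reduce((acc, ranked, i) => {
    const rank = ranked.indexOf(expected[i]);
    return acc + (rank === -1 ? 0 : 1 / (rank + 1));
  }, 0);
  return total / results.length;
}

const ranked = [
  ["a", "b", "c"], // expected "a" at rank 1
  ["x", "y", "z"], // expected "y" at rank 2
];
const expected = ["a", "y"];
console.log(recallAtK(ranked, expected, 1));       // 0.5
console.log(meanReciprocalRank(ranked, expected)); // 0.75
```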
