On a realistic ~300-operation API with hand-written guides (~28.8MB corpus, 5 eval categories), benchmarked with [`docs-mcp-eval benchmark`](docs/eval.md):
### Summary
| Metric | none (FTS-only) | openai/text-embedding-3-large |
| --- | ---: | ---: |
| MRR@5 | 0.1803 | 0.2320 |
| NDCG@5 | 0.2136 | 0.2657 |
| Facet Precision | 0.3158 | 0.3684 |
| Search p50 (ms) | 5.2 | 242.6 |
| Search p95 (ms) | 6.6 | 5914.1 |
| Build Time (ms) | 6989 | 20448 |
| Peak RSS (MB) | 247.6 | 313.6 |
| Index Size (28.8MB corpus) | 104.9MB | 356.9MB |
- FTS-only search: 5ms p50 latency, zero embedding cost
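The ranking metrics above can be computed from each query's ranked result list. As a minimal sketch, here is NDCG@5 under binary relevance (an assumption — the eval harness may use graded relevance; `ranked_ids` and `relevant_ids` are illustrative names, not the harness's API):

```python
import math

def ndcg_at_5(ranked_ids, relevant_ids):
    """NDCG@5 with binary relevance: each relevant hit at rank r gains 1/log2(r+1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:5], start=1)
        if doc_id in relevant_ids
    )
    # Ideal DCG: all relevant docs occupy the top ranks.
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant_ids), 5) + 1)
    )
    return dcg / ideal if ideal > 0 else 0.0
```

A relevant document at rank 1 scores 1.0; pushing it to rank 2 drops the score to 1/log2(3) ≈ 0.63, which is why NDCG rewards ranking relevant results early rather than merely retrieving them.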
> MRR@5 (Mean Reciprocal Rank at 5) measures how high the first relevant result appears in the top 5. 1.0 = always ranked first; 0.0 = never appears in top 5.
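To make that definition concrete, a minimal MRR@5 sketch (names are illustrative, not the harness's API; binary relevance assumed):

```python
def mrr_at_5(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant result in the top 5; 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids[:5], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_mrr_at_5(cases):
    """Average over eval cases, each a (ranked_ids, relevant_ids) pair."""
    return sum(mrr_at_5(r, rel) for r, rel in cases) / len(cases)
```

So a corpus-level MRR@5 of 0.23 roughly means the first relevant result tends to land around rank 4-5, or is often missing from the top 5 entirely.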
We recommend starting with FTS-only search. While embeddings improve relevance for conceptual and paraphrased queries, they also add roughly 50x query latency and substantial build overhead. For agents that iterate through multiple searches, the faster cycle time of pure FTS has anecdotally proven more valuable than the per-query relevance lift, particularly with modern models capable of query refinement.
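The cycle-time argument is easy to quantify with a back-of-the-envelope budget using the p50 figures from the table (assuming searches run sequentially, which is typical for an agent refining queries):

```python
# p50 latencies from the benchmark table above.
FTS_P50_MS = 5.2
EMBED_P50_MS = 242.6

def search_budget_ms(n_searches, p50_ms):
    """Total expected search time for n sequential queries at the given p50."""
    return n_searches * p50_ms

# 10 refinement iterations: ~52 ms with FTS vs ~2.4 s with embeddings,
# before counting the embedding backend's much heavier p95 tail.
```

The p95 gap (6.6 ms vs 5914.1 ms) makes the worst case even more lopsided than the medians suggest.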