Local RAG MCP server for Claude Code. Hybrid search (semantic + BM25) over a personal document knowledge base using ChromaDB and Ollama embeddings.
- Python 3.11, 3.12, or 3.13. Python 3.14 and later are not supported due to unresolved ChromaDB compatibility issues.
- Ollama with the nomic-embed-text model.
- Claude Code.
macOS:
brew install python@3.13 ollama
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Linux:
sudo apt install python3.13 python3.13-venv
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Windows:
Use WSL 2 and follow the Linux instructions above.
For native Windows, run install.ps1:
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
.\install.ps1
Add the server to ~/.claude.json. Replace /path/to/knowledge-rag with the actual path.
macOS / Linux:
{
"mcpServers": {
"knowledge-rag": {
"type": "stdio",
"command": "/path/to/knowledge-rag/venv/bin/python3",
"args": ["-m", "mcp_server.server"],
"env": {
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "/path/to/knowledge-rag",
"ANONYMIZED_TELEMETRY": "False"
}
}
}
}
Windows (native):
{
"mcpServers": {
"knowledge-rag": {
"type": "stdio",
"command": "cmd",
"args": ["/c", "cd /d C:\\path\\to\\knowledge-rag && .\\venv\\Scripts\\python.exe -m mcp_server.server"],
"env": {
"PYTHONUNBUFFERED": "1",
"ANONYMIZED_TELEMETRY": "False"
}
}
}
}
Environment variables:
| Variable | Purpose |
|---|---|
| PYTHONUNBUFFERED | Disables stdout buffering. Required: without it, JSON-RPC messages may not be flushed in time. |
| PYTHONPATH | Module search path. Required on macOS/Linux when the venv Python is invoked directly, without first changing into the project directory. |
| ANONYMIZED_TELEMETRY | Disables ChromaDB telemetry. Optional. |
Restart Claude Code after editing the configuration.
Place documents in documents/, organized by category subdirectories. Each subdirectory name becomes a category. New categories are created automatically.
documents/
├── laravel/
│ └── eloquent-tips.md
├── docker/
│ └── compose-patterns.md
├── security/
│ ├── redteam/
│ └── blueteam/
└── general/
└── notes.txt
Supported formats: .md, .txt, .pdf, .py, .json.
Documents are indexed automatically on server startup when the index is empty. Use reindex_documents to rebuild.
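The layout rules above can be illustrated with a short scan of the documents/ tree. This is a minimal sketch, not the server's ingestion code: the function name group_by_category is hypothetical, the top-level subdirectory is assumed to be the category, and files placed directly in documents/ are assumed to fall under "general".

```python
from pathlib import Path

# Extensions listed under "Supported formats".
SUPPORTED = {".md", ".txt", ".pdf", ".py", ".json"}


def group_by_category(root):
    """Group supported files under root by their top-level subdirectory.

    Hypothetical helper: the real ingestion logic may differ, for example
    in how nested directories are mapped to categories.
    """
    root = Path(root)
    groups = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue  # skip directories and unsupported formats
        rel = path.relative_to(root)
        # Assumption: files directly in the root belong to "general".
        category = rel.parts[0] if len(rel.parts) > 1 else "general"
        groups.setdefault(category, []).append(rel.as_posix())
    return groups
```

Calling group_by_category("documents") returns a dict that maps each category name to the relative paths of its files.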
| Tool | Description |
|---|---|
| search_knowledge | Hybrid semantic + BM25 search |
| get_document | Retrieve full document content |
| save_document | Save a new document and index it |
| reindex_documents | Rebuild the search index |
| list_categories | List categories with document counts |
| list_documents | List indexed documents |
| get_index_stats | Index statistics |
search_knowledge:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query text |
| max_results | int | 5 | Maximum results to return (1-20) |
| category | string | null | Category filter |
| hybrid_alpha | float | 0.5 | Search balance: 0.0 = keyword only, 1.0 = semantic only |
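Claude Code normally issues tool calls itself, but it can help to see what a search_knowledge request looks like on the wire. The sketch below builds a JSON-RPC tools/call message as defined by the MCP specification. The helper name build_search_request is hypothetical, and the range checks only mirror the documented limits; how the server itself treats out-of-range values is not stated here.

```python
import json


def build_search_request(query, max_results=5, category=None,
                         hybrid_alpha=0.5, request_id=1):
    """Build an MCP tools/call request for search_knowledge (illustrative)."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0.0 and 1.0")

    arguments = {
        "query": query,
        "max_results": max_results,
        "hybrid_alpha": hybrid_alpha,
    }
    if category is not None:
        arguments["category"] = category  # omit to search all categories

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_knowledge", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_search_request("eloquent eager loading", category="laravel")
    print(json.dumps(request, indent=2))
```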
get_document:
| Parameter | Type | Description |
|---|---|---|
| filepath | string | Path to the document file |
save_document:
| Parameter | Type | Default | Description |
|---|---|---|---|
| title | string | required | Document title (used as filename) |
| content | string | required | Document content in markdown format |
| category | string | "general" | Category subdirectory; new categories are auto-created |
reindex_documents:
| Parameter | Type | Default | Description |
|---|---|---|---|
| force | bool | false | Clear existing index and rebuild from scratch |
list_categories: no parameters.
list_documents:
| Parameter | Type | Description |
|---|---|---|
| category | string | Optional category filter |
get_index_stats: no parameters.
| hybrid_alpha | Behavior | Use case |
|---|---|---|
| 0.0 | Pure BM25 keyword search | Exact terms, CVE IDs, tool names |
| 0.3 | Keyword-heavy hybrid | Technical queries with specific terms |
| 0.5 | Balanced (default) | General queries |
| 0.7 | Semantic-heavy hybrid | Conceptual queries, related topics |
| 1.0 | Pure semantic search | "How to..." questions, understanding intent |
Keyword routing runs before search. When query terms match configured keyword routes (word-boundary regex matching), results are filtered to the matching category. When multiple keywords match different categories, each category is scored by match count and the highest-scoring category wins.
The search pipeline has four stages. First, keyword routing checks the query against configured routes using word-boundary regex. If a route matches, search is scoped to that category. Single-word routes use \b boundaries to prevent false positives (e.g., "api" does not match "RAPID"). Multi-word phrases use exact substring matching.
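The routing rules described above can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: the KEYWORD_ROUTES entries and the function name route_query are hypothetical, matching is assumed to be case-insensitive, and tie-breaking between equally scored categories is not specified in this README (here the first route defined wins).

```python
import re
from collections import Counter

# Hypothetical routes: keyword or phrase -> category.
KEYWORD_ROUTES = {
    "api": "general",
    "eloquent": "laravel",
    "docker compose": "docker",
    "dockerfile": "docker",
}


def route_query(query, routes=KEYWORD_ROUTES):
    """Return the best-matching category, or None when no route matches."""
    text = query.lower()
    scores = Counter()
    for keyword, category in routes.items():
        kw = keyword.lower()
        if " " in kw:
            # Multi-word phrases: plain substring match.
            matched = kw in text
        else:
            # Single words: \b boundaries, so "api" does not match "rapid".
            matched = re.search(rf"\b{re.escape(kw)}\b", text) is not None
        if matched:
            scores[category] += 1
    if not scores:
        return None
    # Highest match count wins.
    return scores.most_common(1)[0][0]
```

With these example routes, a query mentioning both "docker compose" and "dockerfile" alongside "eloquent" is routed to docker, because that category collects two matches against one.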
Second, ChromaDB performs vector similarity search using Ollama nomic-embed-text embeddings (768 dimensions). Third, the BM25 index performs exact term matching via the rank-bm25 library. The BM25 index is built lazily from ChromaDB data on the first query.
Fourth, Reciprocal Rank Fusion (RRF) with k=60 combines both rankings. Each result receives a weighted score: hybrid_alpha * 1/(k + semantic_rank) + (1 - hybrid_alpha) * 1/(k + bm25_rank). Results found by both methods are marked "hybrid" in output. Results from only one method are marked "semantic" or "keyword".
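The weighted fusion formula above translates directly into code. The sketch below is a minimal reading of it, with two assumptions that the text does not confirm: ranks are 1-based, and a result missing from one ranking contributes zero for that term. The function name rrf_fuse is hypothetical.

```python
def rrf_fuse(semantic_ids, bm25_ids, hybrid_alpha=0.5, k=60):
    """Fuse two ranked ID lists with weighted Reciprocal Rank Fusion.

    Returns (doc_id, score, label) tuples sorted by descending score.
    """
    # Assumption: ranks start at 1 for the top result of each list.
    sem_rank = {doc: r for r, doc in enumerate(semantic_ids, start=1)}
    bm_rank = {doc: r for r, doc in enumerate(bm25_ids, start=1)}

    fused = []
    for doc in sem_rank.keys() | bm_rank.keys():
        score = 0.0
        if doc in sem_rank:
            score += hybrid_alpha / (k + sem_rank[doc])
        if doc in bm_rank:
            score += (1 - hybrid_alpha) / (k + bm_rank[doc])

        if doc in sem_rank and doc in bm_rank:
            label = "hybrid"
        elif doc in sem_rank:
            label = "semantic"
        else:
            label = "keyword"
        fused.append((doc, score, label))

    # Sort by score, then by ID so that ties are deterministic.
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused
```

Because k=60 dominates the denominator, a document found by both methods usually outranks one found by a single method, even when its individual ranks are lower.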
Documents are chunked at 1000 characters with 200-character overlap, breaking at paragraph, sentence, or word boundaries. Embeddings are generated in parallel using a ThreadPoolExecutor with 4 workers.
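A rough sketch of this chunking and embedding step follows. It is not the project's ingestion code: the function names are hypothetical, the rule of only accepting a boundary in the second half of the window is an assumption, and embed_fn stands in for whatever function calls the Ollama embedding API.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks, preferring natural boundaries."""
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + chunk_size, n)
        if end < n:
            window = text[start:end]
            # Prefer a paragraph break, then a sentence end, then a space.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                # Assumption: ignore boundaries in the first half of the
                # window so that chunks do not become too short.
                if cut > chunk_size // 2:
                    end = start + cut + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        # Step back by the overlap, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks


def embed_all(chunks, embed_fn, workers=4):
    """Apply embed_fn to every chunk in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed_fn, chunks))
```

Threads suit this step because each embedding request spends most of its time waiting on the HTTP call to Ollama rather than on Python computation.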
Key settings in mcp_server/config.py:
| Setting | Default | Description |
|---|---|---|
| chunk_size | 1000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between consecutive chunks |
| ollama_model | nomic-embed-text | Ollama embedding model name |
| ollama_base_url | http://localhost:11434 | Ollama API endpoint |
| collection_name | knowledge_base | ChromaDB collection name |
| default_results | 5 | Default search result count |
| max_results | 20 | Maximum allowed search results |
Keyword routes and category aliases are also defined in config.py. Add new routes to the keyword_routes dict. Add nested path aliases to category_aliases (e.g., "security/redteam": "redteam" maps the nested directory to a flat category name).
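To make the alias mapping concrete, here is a minimal sketch of how a nested directory could be resolved to a flat category name. Only the "security/redteam": "redteam" entry comes from the example above; the blueteam entry, the function name resolve_category, the longest-prefix lookup, and the "general" fallback are assumptions for illustration, and the actual structure of config.py may differ.

```python
# Illustrative alias table; only the first entry appears in this README.
CATEGORY_ALIASES = {
    "security/redteam": "redteam",
    "security/blueteam": "blueteam",  # hypothetical entry
}


def resolve_category(relative_dir, aliases=CATEGORY_ALIASES):
    """Map a directory path relative to documents/ to a category name."""
    parts = [p for p in relative_dir.replace("\\", "/").split("/") if p]
    # Try the longest prefix first so deeper aliases take precedence.
    for i in range(len(parts), 0, -1):
        prefix = "/".join(parts[:i])
        if prefix in aliases:
            return aliases[prefix]
    # No alias: fall back to the top-level directory name.
    return parts[0] if parts else "general"
```

Under these assumptions, documents/security/redteam/ resolves to redteam, while a directory without an alias such as documents/laravel/ keeps its own name.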
knowledge-rag/
├── mcp_server/
│ ├── __init__.py # Version, exports
│ ├── config.py # Settings, keyword routes, category aliases
│ ├── ingestion.py # Document parsing, chunking
│ └── server.py # MCP tools, ChromaDB, BM25, search engine
├── documents/ # Document storage (by category subdirectory)
├── data/
│ ├── chroma_db/ # ChromaDB vector database
│ └── index_metadata.json # Index state cache
├── install.ps1 # Windows installer
├── requirements.txt # Python dependencies
├── CHANGELOG.md
├── LICENSE
└── README.md
Ollama not running. Start with ollama serve. Verify connectivity:
curl http://localhost:11434/api/tags
Wrong Python version. Python 3.14 and later are not supported. Check the current version:
python3 --version
To target a specific version when creating a venv:
python3.13 -m venv venv
Empty search results. Confirm documents exist in documents/. Rebuild the index:
reindex_documents(force=true)
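The Ollama connectivity check from the first item can also be scripted, which additionally confirms that the embedding model has been pulled. This is a sketch using only the standard library. It assumes the /api/tags response contains a "models" list whose entries have a "name" field such as "nomic-embed-text:latest"; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def model_available(tags_payload, model="nomic-embed-text"):
    """Return True if the model appears in a parsed /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Installed models are usually reported with a tag suffix like ":latest".
    return any(n == model or n.startswith(model + ":") for n in names)


def check_ollama(base_url="http://localhost:11434",
                 model="nomic-embed-text", timeout=3):
    """Return "ok" or a short description of what is wrong."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags",
                                    timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return f"Ollama not reachable at {base_url}: {exc}"
    if not model_available(payload, model):
        return f"Ollama is running but {model} has not been pulled"
    return "ok"


if __name__ == "__main__":
    print(check_ollama())
```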
MCP server not loading. Verify ~/.claude.json is valid JSON. Check that the command path points to the correct venv Python. Run claude mcp list to confirm the connection. On macOS and Linux, ensure venv/bin/python3 has execute permission.
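The configuration checks above can be automated with a short script. This is a sketch, not a tool shipped with the project: the function name check_mcp_config is hypothetical, and it inspects only the fields shown in the configuration examples in this README.

```python
import json
import os
from pathlib import Path


def check_mcp_config(config_path, server_name="knowledge-rag"):
    """Return a list of problems found in the MCP server entry."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return [f"{config_path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]

    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no mcpServers entry named {server_name}"]

    problems = []
    command = server.get("command", "")
    # Bare names such as "cmd" are resolved via PATH and are skipped here.
    if "/" in command or "\\" in command:
        if not Path(command).is_file():
            problems.append(f"command not found: {command}")
        elif not os.access(command, os.X_OK):
            problems.append(f"command is not executable: {command}")
    if server.get("env", {}).get("PYTHONUNBUFFERED") != "1":
        problems.append("PYTHONUNBUFFERED is not set to 1")
    return problems


if __name__ == "__main__":
    for problem in check_mcp_config(Path.home() / ".claude.json"):
        print(problem)
```

An empty list means none of the checked conditions failed. It does not prove that the server starts, so claude mcp list remains the final check.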
ModuleNotFoundError. The MCP configuration must use the venv Python, not the system Python. Activate the venv and install dependencies:
source venv/bin/activate
pip install -r requirements.txt
MIT License. See LICENSE.
Original author: Ailton Rocha (Lyon). Fork maintainer: Andrey Mishchenko.
Version 3.0.0.