
Local RAG MCP server for Claude Code. Hybrid search over a personal document knowledge base.


mvandrew/knowledge-rag

 
 


Knowledge RAG

Local RAG MCP server for Claude Code. Hybrid search (semantic + BM25) over a personal document knowledge base using ChromaDB and Ollama embeddings.

Prerequisites

  • Python 3.11, 3.12, or 3.13. Python 3.14 and later are not supported due to unresolved ChromaDB compatibility issues.
  • Ollama with the nomic-embed-text model.
  • Claude Code.

Installation

macOS

brew install python@3.13 ollama
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Linux (Debian/Ubuntu)

sudo apt install python3.13 python3.13-venv
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull nomic-embed-text
git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Windows

Use WSL 2 and follow the Linux instructions above.

For native Windows, run install.ps1:

git clone https://github.com/mvandrew/knowledge-rag.git
cd knowledge-rag
.\install.ps1

MCP Configuration

Add the server to ~/.claude.json. Replace /path/to/knowledge-rag with the actual path.

macOS / Linux:

{
  "mcpServers": {
    "knowledge-rag": {
      "type": "stdio",
      "command": "/path/to/knowledge-rag/venv/bin/python3",
      "args": ["-m", "mcp_server.server"],
      "env": {
        "PYTHONUNBUFFERED": "1",
        "PYTHONPATH": "/path/to/knowledge-rag",
        "ANONYMIZED_TELEMETRY": "False"
      }
    }
  }
}

Windows (native):

{
  "mcpServers": {
    "knowledge-rag": {
      "type": "stdio",
      "command": "cmd",
      "args": ["/c", "cd /d C:\\path\\to\\knowledge-rag && .\\venv\\Scripts\\python.exe -m mcp_server.server"],
      "env": {
        "PYTHONUNBUFFERED": "1",
        "ANONYMIZED_TELEMETRY": "False"
      }
    }
  }
}

Environment variables:

Variable | Purpose
PYTHONUNBUFFERED | Disables stdout buffering. Required: without it, JSON-RPC messages may not be flushed in time.
PYTHONPATH | Module search path. Required on macOS/Linux, where the venv Python is launched directly without first changing into the project directory.
ANONYMIZED_TELEMETRY | Disables ChromaDB telemetry. Optional.

Restart Claude Code after editing the configuration.
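Because ~/.claude.json also holds other Claude Code state, editing it by hand is easy to get wrong. The following is a minimal sketch, not part of this repository, that adds the macOS/Linux entry shown above to an already-loaded config dict and leaves other keys and servers untouched. The helper names and the example checkout path are hypothetical.

```python
from pathlib import PurePosixPath


def knowledge_rag_entry(repo_path: str) -> dict:
    """Build the macOS/Linux knowledge-rag entry for a given checkout path."""
    repo = PurePosixPath(repo_path)
    return {
        "type": "stdio",
        "command": str(repo / "venv" / "bin" / "python3"),
        "args": ["-m", "mcp_server.server"],
        "env": {
            "PYTHONUNBUFFERED": "1",
            "PYTHONPATH": str(repo),
            "ANONYMIZED_TELEMETRY": "False",
        },
    }


def add_knowledge_rag(config: dict, repo_path: str) -> dict:
    """Return a copy of config with the knowledge-rag server added or replaced."""
    updated = dict(config)  # shallow copy: other top-level keys are kept as-is
    servers = dict(updated.get("mcpServers", {}))  # keep any other MCP servers
    servers["knowledge-rag"] = knowledge_rag_entry(repo_path)
    updated["mcpServers"] = servers
    return updated
```

To use it, load ~/.claude.json with json.loads, pass the result through add_knowledge_rag, and write it back with json.dumps after making a backup copy of the original file.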

Usage

Place documents in documents/, organized by category subdirectories. Each subdirectory name becomes a category. New categories are created automatically.

documents/
├── laravel/
│   └── eloquent-tips.md
├── docker/
│   └── compose-patterns.md
├── security/
│   ├── redteam/
│   └── blueteam/
└── general/
    └── notes.txt

Supported formats: .md, .txt, .pdf, .py, .json.
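A minimal sketch of how the directory layout above could map to categories. It assumes the category is the document's parent directory relative to documents/ (so nested folders such as security/redteam keep their full path) and that files placed directly in documents/ are ignored. The function name is hypothetical, and the actual logic in mcp_server/ may differ.

```python
from pathlib import Path
from typing import List, Tuple

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".pdf", ".py", ".json"}


def discover_documents(root: Path) -> List[Tuple[str, Path]]:
    """Return (category, file) pairs for supported files under root.

    Assumption: the category is the parent directory relative to root,
    joined with "/" (e.g. "security/redteam"); files directly in root
    are skipped.
    """
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        rel_dir = path.parent.relative_to(root)
        if rel_dir == Path("."):
            continue  # no category subdirectory
        found.append((rel_dir.as_posix(), path))
    return found
```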

Documents are indexed automatically on server startup when the index is empty. Use reindex_documents to rebuild.

MCP Tools

Tool | Description
search_knowledge | Hybrid semantic + BM25 search
get_document | Retrieve full document content
save_document | Save a new document and index it
reindex_documents | Rebuild the search index
list_categories | List categories with document counts
list_documents | List indexed documents
get_index_stats | Index statistics

search_knowledge

Parameter | Type | Default | Description
query | string | required | Search query text
max_results | int | 5 | Maximum results to return (1-20)
category | string | null | Category filter
hybrid_alpha | float | 0.5 | Search balance: 0.0 = keyword only, 1.0 = semantic only
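Claude Code issues these calls itself, but when debugging the stdio server by hand it helps to know what a request looks like. The sketch below builds an MCP tools/call JSON-RPC message for search_knowledge and checks the arguments against the ranges in the table above. The helper is hypothetical, the server's own validation may differ, and a real client must complete the MCP initialize handshake before sending tools/call.

```python
import json
from typing import Optional


def build_search_request(request_id: int, query: str, max_results: int = 5,
                         category: Optional[str] = None,
                         hybrid_alpha: float = 0.5) -> str:
    """Build a JSON-RPC tools/call request for search_knowledge."""
    # Checks mirror the parameter table; they are not the server's code.
    if not query:
        raise ValueError("query is required")
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0.0 and 1.0")
    arguments = {"query": query, "max_results": max_results,
                 "hybrid_alpha": hybrid_alpha}
    if category is not None:  # omit the optional filter when unset
        arguments["category"] = category
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_knowledge", "arguments": arguments},
    })
```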

get_document

Parameter | Type | Description
filepath | string | Path to the document file

save_document

Parameter | Type | Default | Description
title | string | required | Document title (used as filename)
content | string | required | Document content in markdown format
category | string | "general" | Category subdirectory; new categories are auto-created

reindex_documents

Parameter | Type | Default | Description
force | bool | false | Clear existing index and rebuild from scratch

list_categories

No parameters.

list_documents

Parameter | Type | Description
category | string | Optional category filter

get_index_stats

No parameters.

Search Tuning

hybrid_alpha | Behavior | Use case
0.0 | Pure BM25 keyword search | Exact terms, CVE IDs, tool names
0.3 | Keyword-heavy hybrid | Technical queries with specific terms
0.5 | Balanced (default) | General queries
0.7 | Semantic-heavy hybrid | Conceptual queries, related topics
1.0 | Pure semantic search | "How to..." questions, understanding intent

Keyword routing runs before search. When query terms match configured keyword routes (word-boundary regex matching), results are filtered to the matching category. When multiple keywords match different categories, each category is scored by match count and the highest-scoring category wins.

How It Works

The search pipeline has four stages. First, keyword routing checks the query against configured routes using word-boundary regex. If a route matches, search is scoped to that category. Single-word routes use \b boundaries to prevent false positives (e.g., "api" does not match "RAPID"). Multi-word phrases use exact substring matching.
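The routing rules above can be sketched as follows. The route table is hypothetical (it includes "api" only to reproduce the RAPID example), matching is assumed to be case-insensitive, and ties between categories go to the first category listed, which is this sketch's choice rather than documented behavior.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical routes; the real ones live in keyword_routes in mcp_server/config.py.
KEYWORD_ROUTES = {
    "redteam": ["exploit", "payload", "privilege escalation"],
    "docker": ["docker", "compose", "container"],
    "laravel": ["eloquent", "artisan", "api"],
}


def route_query(query: str,
                routes: Dict[str, List[str]] = KEYWORD_ROUTES) -> Optional[str]:
    """Return the category whose keywords match the query most often, or None."""
    text = query.lower()
    scores = Counter()
    for category, keywords in routes.items():
        for keyword in keywords:
            kw = keyword.lower()
            if " " in kw:
                hit = kw in text  # multi-word phrase: plain substring match
            else:
                # single word: \b boundaries, so "api" does not match "rapid"
                hit = re.search(rf"\b{re.escape(kw)}\b", text) is not None
            if hit:
                scores[category] += 1
    if not scores:
        return None
    # most_common keeps first-encountered order among equal counts
    return scores.most_common(1)[0][0]
```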

Second, ChromaDB performs vector similarity search using Ollama nomic-embed-text embeddings (768 dimensions). Third, the BM25 index performs exact term matching via the rank-bm25 library. The BM25 index is built lazily from ChromaDB data on the first query.
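A sketch of a lazily built BM25 index, assuming documents come from a loader callable that stands in for reading chunk texts out of ChromaDB. This is a self-contained Okapi BM25 written for illustration, not rank-bm25: it uses rank-bm25's default k1 = 1.5 and b = 0.75 but a common non-negative IDF variant, so its scores will not match rank-bm25's BM25Okapi exactly. The tokenizer (lowercased \w+ runs) and class name are also assumptions.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class LazyBM25:
    """BM25 index that is built on the first query, not at construction."""

    def __init__(self, load_documents: Callable[[], List[str]],
                 k1: float = 1.5, b: float = 0.75):
        self._load = load_documents  # stand-in for reading chunks from ChromaDB
        self.k1, self.b = k1, b
        self._docs: Optional[List[Counter]] = None

    def _build(self) -> None:
        self._docs = [Counter(tokenize(t)) for t in self._load()]
        self._lengths = [sum(d.values()) for d in self._docs]
        self._avgdl = sum(self._lengths) / max(len(self._docs), 1)
        df = Counter(term for d in self._docs for term in d)  # document frequency
        n = len(self._docs)
        self._idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def scores(self, query: str) -> List[float]:
        if self._docs is None:
            self._build()  # lazy: the first query pays the build cost
        result = []
        for doc, length in zip(self._docs, self._lengths):
            s = 0.0
            for term in tokenize(query):
                tf = doc.get(term, 0)
                if tf:
                    norm = tf + self.k1 * (1 - self.b + self.b * length / self._avgdl)
                    s += self._idf[term] * tf * (self.k1 + 1) / norm
            result.append(s)
        return result

    def rank(self, query: str) -> List[int]:
        """Document indices ordered best-first, dropping zero scores."""
        s = self.scores(query)
        return [i for i in sorted(range(len(s)), key=lambda i: -s[i]) if s[i] > 0]
```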

Fourth, Reciprocal Rank Fusion (RRF) with k=60 combines both rankings. Each result receives a weighted score: hybrid_alpha * 1/(k + semantic_rank) + (1 - hybrid_alpha) * 1/(k + bm25_rank). Results found by both methods are marked "hybrid" in output. Results from only one method are marked "semantic" or "keyword".
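The weighted fusion formula above, as a minimal Python sketch. It assumes 1-based ranks and that a chunk missing from one result list simply receives no contribution from that list; ties are broken by chunk ID for deterministic output. The function name is hypothetical.

```python
from typing import List, Tuple


def fuse(semantic_ids: List[str], bm25_ids: List[str],
         hybrid_alpha: float = 0.5, k: int = 60) -> List[Tuple[str, float, str]]:
    """Weighted RRF over two best-first ranked lists of chunk IDs."""
    sem_rank = {cid: r for r, cid in enumerate(semantic_ids, start=1)}
    kw_rank = {cid: r for r, cid in enumerate(bm25_ids, start=1)}
    fused = []
    for cid in sem_rank.keys() | kw_rank.keys():
        score = 0.0
        if cid in sem_rank:
            score += hybrid_alpha / (k + sem_rank[cid])
        if cid in kw_rank:
            score += (1 - hybrid_alpha) / (k + kw_rank[cid])
        # label each result by which method(s) found it
        if cid in sem_rank and cid in kw_rank:
            source = "hybrid"
        elif cid in sem_rank:
            source = "semantic"
        else:
            source = "keyword"
        fused.append((cid, score, source))
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused
```

With hybrid_alpha = 0.5, a chunk ranked 1st by semantic search and 3rd by BM25 scores 0.5/61 + 0.5/63 ≈ 0.0161, roughly double a chunk found by only one method at rank 1 (0.5/61 ≈ 0.0082). Agreement between the two methods therefore outweighs a single top rank.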

Documents are chunked at 1000 characters with 200-character overlap, breaking at paragraph, sentence, or word boundaries. Embeddings are generated in parallel using a ThreadPoolExecutor with 4 workers.
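A sketch of overlapping chunking with boundary preference, plus parallel embedding, using the settings above. The search order (last paragraph break, then sentence end, then space) and the rule that a break must fall past the overlap region, which guarantees forward progress, are assumptions rather than the project's exact algorithm. embed_fn is a placeholder for the Ollama embedding call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer the last paragraph break, then sentence end, then space,
            # searching only past the overlap so the next chunk moves forward.
            window = text[start + overlap:end]
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                if cut != -1:
                    end = start + overlap + cut + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks


def embed_all(chunks: List[str], embed_fn: Callable[[str], T], workers: int = 4) -> List[T]:
    """Embed chunks in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed_fn, chunks))
```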

Configuration

Key settings in mcp_server/config.py:

Setting | Default | Description
chunk_size | 1000 | Characters per chunk
chunk_overlap | 200 | Overlap between consecutive chunks
ollama_model | nomic-embed-text | Ollama embedding model name
ollama_base_url | http://localhost:11434 | Ollama API endpoint
collection_name | knowledge_base | ChromaDB collection name
default_results | 5 | Default search result count
max_results | 20 | Maximum allowed search results

Keyword routes and category aliases are also defined in config.py. Add new routes to the keyword_routes dict. Add nested path aliases to category_aliases (e.g., "security/redteam": "redteam" maps the nested directory to a flat category name).
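A minimal sketch of how a category alias could be applied, using the example alias from this section. The dict layout, the second alias, and resolve_category are assumptions for illustration; check mcp_server/config.py for the actual structure, including that of keyword_routes.

```python
from typing import Dict

# Hypothetical contents; only "security/redteam": "redteam" comes from this README.
category_aliases: Dict[str, str] = {
    "security/redteam": "redteam",
    "security/blueteam": "blueteam",
}


def resolve_category(directory: str,
                     aliases: Dict[str, str] = category_aliases) -> str:
    """Map a directory relative to documents/ to its category name."""
    key = directory.replace("\\", "/").strip("/")  # normalize Windows separators
    return aliases.get(key, key)  # unaliased directories keep their own name
```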

Project Structure

knowledge-rag/
├── mcp_server/
│   ├── __init__.py          # Version, exports
│   ├── config.py            # Settings, keyword routes, category aliases
│   ├── ingestion.py         # Document parsing, chunking
│   └── server.py            # MCP tools, ChromaDB, BM25, search engine
├── documents/               # Document storage (by category subdirectory)
├── data/
│   ├── chroma_db/           # ChromaDB vector database
│   └── index_metadata.json  # Index state cache
├── install.ps1              # Windows installer
├── requirements.txt         # Python dependencies
├── CHANGELOG.md
├── LICENSE
└── README.md

Troubleshooting

Ollama not running. Start with ollama serve. Verify connectivity:

curl http://localhost:11434/api/tags
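The same check from Python, using only the standard library. It also confirms that nomic-embed-text has been pulled, assuming the /api/tags response lists models under "models" with a "name" such as nomic-embed-text:latest. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # default ollama_base_url


def has_model(tags: dict, model: str = "nomic-embed-text") -> bool:
    """True if the /api/tags payload lists the model, with or without a tag."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def fetch_tags(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags; raises urllib.error.URLError if Ollama is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Call has_model(fetch_tags()). A URLError from fetch_tags means Ollama is not reachable at that address; a False result means the model still needs ollama pull nomic-embed-text.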

Wrong Python version. Python 3.14 and later are not supported. Check the current version:

python3 --version

To target a specific version when creating a venv:

python3.13 -m venv venv
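To check from inside an interpreter, for example the venv's own python3, here is a small sketch that encodes the supported range from Prerequisites (3.11 through 3.13). The function name is hypothetical.

```python
import sys

# Lower bound inclusive, upper bound exclusive: 3.11, 3.12 and 3.13 pass.
SUPPORTED = ((3, 11), (3, 14))


def is_supported(version_info=sys.version_info) -> bool:
    """True if the (major, minor) version falls in the supported range."""
    return SUPPORTED[0] <= tuple(version_info[:2]) < SUPPORTED[1]
```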

Empty search results. Confirm documents exist in documents/. Rebuild the index:

reindex_documents(force=true)

MCP server not loading. Verify ~/.claude.json is valid JSON. Check that the command path points to the correct venv Python. Run claude mcp list to confirm the connection. On macOS and Linux, ensure venv/bin/python3 has execute permission.
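A sketch that automates the first checks above. It parses a config file, looks up the knowledge-rag entry under mcpServers, and verifies that the command exists and is executable. It assumes the user-level layout from MCP Configuration and skips the file check for the native Windows cmd launcher; check_config is a hypothetical helper.

```python
import json
import os
from pathlib import Path
from typing import List


def check_config(path: Path) -> List[str]:
    """Return problems with the knowledge-rag entry (empty list if none found)."""
    try:
        config = json.loads(path.read_text())
    except FileNotFoundError:
        return [f"{path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    entry = config.get("mcpServers", {}).get("knowledge-rag")
    if entry is None:
        return ["no mcpServers.knowledge-rag entry"]
    problems = []
    command = entry.get("command", "")
    if command != "cmd":  # the native Windows config launches through cmd
        if not Path(command).is_file():
            problems.append(f"command not found: {command}")
        elif not os.access(command, os.X_OK):
            problems.append(f"command not executable: {command}")
    return problems
```

For example, check_config(Path.home() / ".claude.json") returns an empty list when none of these problems are found.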

ModuleNotFoundError. The MCP configuration must use the venv Python, not the system Python. Activate the venv and install dependencies:

source venv/bin/activate
pip install -r requirements.txt

License

MIT License. See LICENSE.

Authors

Original author: Ailton Rocha (Lyon). Fork maintainer: Andrey Mishchenko.

Version 3.0.0.

Languages

  • Python 65.6%
  • PowerShell 34.4%