
Commit 936b8d8

Merge pull request #29 from m1rl0k/documentation

Documentation

2 parents 083523b + 0b8cbcd

File tree

12 files changed: +1155 −1422 lines

README.md

Lines changed: 118 additions & 1418 deletions
Large diffs are not rendered by default.

deploy/kubernetes/README.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@

# Kubernetes Deployment Guide

**Documentation:** [README](../../README.md) · [Configuration](../../docs/CONFIGURATION.md) · [IDE Clients](../../docs/IDE_CLIENTS.md) · [MCP API](../../docs/MCP_API.md) · [ctx CLI](../../docs/CTX_CLI.md) · [Memory Guide](../../docs/MEMORY_GUIDE.md) · [Architecture](../../docs/ARCHITECTURE.md) · [Multi-Repo](../../docs/MULTI_REPO_COLLECTIONS.md) · Kubernetes · [VS Code Extension](../../docs/vscode-extension.md) · [Troubleshooting](../../docs/TROUBLESHOOTING.md) · [Development](../../docs/DEVELOPMENT.md)

---

## Overview

This directory contains Kubernetes manifests for deploying Context Engine on a remote cluster using **Kustomize**. This enables:

docs/ARCHITECTURE.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@

# Context Engine Architecture

**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)

---

**On this page:**
- [Overview](#overview)
- [Core Principles](#core-principles)
- [System Architecture](#system-architecture)
- [Data Flow](#data-flow)
- [ReFRAG Pipeline](#refrag-pipeline)

---

## Overview

Context Engine is a production-ready MCP (Model Context Protocol) retrieval stack that unifies code indexing, hybrid search, and optional LLM decoding. It enables teams to ship context-aware AI agents by providing sophisticated semantic and lexical search capabilities with dual-transport compatibility.

docs/CONFIGURATION.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@

# Configuration Reference

Complete environment variable reference for Context Engine.

**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)

---

**On this page:**
- [Core Settings](#core-settings)
- [Indexing & Micro-Chunks](#indexing--micro-chunks)
- [Watcher Settings](#watcher-settings)
- [Reranker](#reranker)
- [Decoder (llama.cpp / GLM)](#decoder-llamacpp--glm)
- [ReFRAG](#refrag-micro-chunking--retrieval)
- [Ports](#ports)
- [Search & Expansion](#search--expansion)
- [Memory Blending](#memory-blending)

---

## Core Settings

| Name | Description | Default |
|------|-------------|---------|
| COLLECTION_NAME | Qdrant collection name (unified across all repos) | codebase |
| REPO_NAME | Logical repo tag stored in payload for filtering | auto-detect from git/folder |
| HOST_INDEX_PATH | Host path mounted at /work in containers | current repo (.) |
| QDRANT_URL | Qdrant base URL | container: http://qdrant:6333; local: http://localhost:6333 |
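As a quick orientation, the core settings above can be combined in an environment file. The values below are illustrative only (the repo name is a made-up placeholder; the rest are the documented defaults):

```shell
# Illustrative fragment; every value is optional.
COLLECTION_NAME=codebase              # default unified collection
REPO_NAME=my-repo                     # placeholder; normally auto-detected
HOST_INDEX_PATH=.                     # index the current repo
QDRANT_URL=http://localhost:6333      # local (non-container) default
```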
## Indexing & Micro-Chunks

| Name | Description | Default |
|------|-------------|---------|
| INDEX_MICRO_CHUNKS | Enable token-based micro-chunking | 0 (off) |
| MAX_MICRO_CHUNKS_PER_FILE | Cap micro-chunks per file | 200 |
| TOKENIZER_URL | HF tokenizer.json URL (for Make download) | n/a |
| TOKENIZER_PATH | Local path where tokenizer is saved (Make) | models/tokenizer.json |
| TOKENIZER_JSON | Runtime path for tokenizer (indexer) | models/tokenizer.json |
| USE_TREE_SITTER | Enable tree-sitter parsing (py/js/ts) | 0 (off) |
| INDEX_CHUNK_LINES | Lines per chunk (non-micro mode) | 120 |
| INDEX_CHUNK_OVERLAP | Overlap lines between chunks | 20 |
| INDEX_BATCH_SIZE | Upsert batch size | 64 |
| INDEX_PROGRESS_EVERY | Log progress every N files | 200 |

## Watcher Settings

| Name | Description | Default |
|------|-------------|---------|
| WATCH_DEBOUNCE_SECS | Debounce between FS events | 1.5 |
| INDEX_UPSERT_BATCH | Upsert batch size (watcher) | 128 |
| INDEX_UPSERT_RETRIES | Retry count | 5 |
| INDEX_UPSERT_BACKOFF | Seconds between retries | 0.5 |
| QDRANT_TIMEOUT | HTTP timeout seconds | watcher: 60; search: 20 |
| MCP_TOOL_TIMEOUT_SECS | Max duration for long-running MCP tools | 3600 |
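The retry/backoff pair above implies semantics like the following sketch. This is an illustration of the documented behavior, not the watcher's actual code; the function name and structure are invented:

```python
import time

def upsert_with_retries(do_upsert, retries=5, backoff=0.5, sleep=time.sleep):
    """Retry a flaky upsert, sleeping `backoff` seconds between attempts.

    Mirrors INDEX_UPSERT_RETRIES / INDEX_UPSERT_BACKOFF semantics (sketch).
    """
    for attempt in range(1, retries + 1):
        try:
            return do_upsert()
        except Exception:
            if attempt == retries:  # out of retries: surface the error
                raise
            sleep(backoff)          # wait before the next attempt
```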
## Reranker

| Name | Description | Default |
|------|-------------|---------|
| RERANKER_ONNX_PATH | Local ONNX cross-encoder model path | unset |
| RERANKER_TOKENIZER_PATH | Tokenizer path for reranker | unset |
| RERANKER_ENABLED | Enable reranker by default | 1 (enabled) |

## Decoder (llama.cpp / GLM)

| Name | Description | Default |
|------|-------------|---------|
| REFRAG_DECODER | Enable decoder for context_answer | 1 (enabled) |
| REFRAG_RUNTIME | Decoder backend: llamacpp or glm | llamacpp |
| LLAMACPP_URL | llama.cpp server endpoint | http://llamacpp:8080 or http://host.docker.internal:8081 |
| LLAMACPP_TIMEOUT_SEC | Decoder request timeout | 300 |
| DECODER_MAX_TOKENS | Max tokens for decoder responses | 4000 |
| REFRAG_DECODER_MODE | prompt or soft (soft requires patched llama.cpp) | prompt |
| GLM_API_KEY | API key for GLM provider | unset |
| GLM_MODEL | GLM model name | glm-4.6 |
| USE_GPU_DECODER | Native Metal decoder (1) vs Docker (0) | 0 (docker) |
| LLAMACPP_GPU_LAYERS | Number of layers to offload to GPU; -1 for all | 32 |
## ReFRAG (Micro-Chunking & Retrieval)

| Name | Description | Default |
|------|-------------|---------|
| REFRAG_MODE | Enable micro-chunking and span budgeting | 1 (enabled) |
| REFRAG_GATE_FIRST | Enable mini-vector gating | 1 (enabled) |
| REFRAG_CANDIDATES | Candidates for gate-first filtering | 200 |
| MICRO_BUDGET_TOKENS | Token budget for context_answer | 512 |
| MICRO_OUT_MAX_SPANS | Max spans returned per query | 3 |
| MICRO_CHUNK_TOKENS | Tokens per micro-chunk window | 16 |
| MICRO_CHUNK_STRIDE | Stride between windows | 8 |
| MICRO_MERGE_LINES | Lines to merge adjacent spans | 4 |
| MICRO_TOKENS_PER_LINE | Estimated tokens per line | 32 |
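With the defaults above (16-token windows, stride 8), consecutive windows overlap by half. The windowing arithmetic can be sketched as follows; this is illustrative, and the indexer's actual implementation may differ:

```python
def micro_windows(n_tokens, window=16, stride=8):
    """Sliding token windows per MICRO_CHUNK_TOKENS / MICRO_CHUNK_STRIDE (sketch)."""
    last_start = max(n_tokens - window, 0)
    # One window starting every `stride` tokens, clipped to the chunk length.
    return [(s, min(s + window, n_tokens)) for s in range(0, last_start + 1, stride)]
```

For example, a 40-token chunk yields windows (0, 16), (8, 24), (16, 32), (24, 40), so most tokens are covered by two overlapping windows.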
## Ports

| Name | Description | Default |
|------|-------------|---------|
| FASTMCP_PORT | Memory MCP server port (SSE) | 8000 |
| FASTMCP_INDEXER_PORT | Indexer MCP server port (SSE) | 8001 |
| FASTMCP_HTTP_PORT | Memory RMCP host port mapping | 8002 |
| FASTMCP_INDEXER_HTTP_PORT | Indexer RMCP host port mapping | 8003 |
| FASTMCP_HEALTH_PORT | Health port (memory/indexer) | memory: 18000; indexer: 18001 |

## Search & Expansion

| Name | Description | Default |
|------|-------------|---------|
| HYBRID_EXPAND | Enable heuristic multi-query expansion | 0 (off) |
| LLM_EXPAND_MAX | Max alternate queries via LLM | 0 |

## Memory Blending

| Name | Description | Default |
|------|-------------|---------|
| MEMORY_SSE_ENABLED | Enable SSE memory blending | false |
| MEMORY_MCP_URL | Memory MCP endpoint for blending | http://mcp:8000/sse |
| MEMORY_MCP_TIMEOUT | Timeout for memory queries (seconds) | 6 |
| MEMORY_AUTODETECT | Auto-detect memory collection | 1 |
| MEMORY_COLLECTION_TTL_SECS | Cache TTL for collection detection | 300 |

---
## Exclusions (.qdrantignore)

The indexer supports a `.qdrantignore` file at the repo root (similar to `.gitignore`).

**Default exclusions** (overridable):
- `/models`, `/node_modules`, `/dist`, `/build`
- `/.venv`, `/venv`, `/__pycache__`, `/.git`
- `*.onnx`, `*.bin`, `*.safetensors`, `tokenizer.json`, `*.whl`, `*.tar.gz`

**Override via env or flags:**

```bash
# Disable defaults
QDRANT_DEFAULT_EXCLUDES=0

# Custom ignore file
QDRANT_IGNORE_FILE=.myignore

# Additional excludes
QDRANT_EXCLUDES='tokenizer.json,*.onnx,/third_party'
```

**CLI examples:**

```bash
docker compose run --rm indexer --root /work --ignore-file .qdrantignore
docker compose run --rm indexer --root /work --no-default-excludes --exclude '/vendor' --exclude '*.bin'
```
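A hypothetical `.qdrantignore` might look like this. The entries are invented examples, and the pattern syntax is assumed to follow `.gitignore` conventions as the description above suggests:

```gitignore
# Hypothetical .qdrantignore (entries are examples only)
/third_party
/fixtures
*.ipynb
*.parquet
```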
---

## Scaling Recommendations

| Repo Size | Chunk Lines | Overlap | Batch Size |
|-----------|-------------|---------|------------|
| Small (<100 files) | 80-120 | 16-24 | 32-64 |
| Medium (100s-1k files) | 120-160 | ~20 | 64-128 |
| Large (1k+ files) | 120 (default) | 20 | 128+ |

For large monorepos, set `INDEX_PROGRESS_EVERY=200` to keep indexing progress visible in the logs.
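For instance, a medium-sized repo could be indexed with settings picked from the middle of the table above (values here are illustrative, not prescriptive):

```shell
# Hypothetical settings for a repo with a few hundred files
export INDEX_CHUNK_LINES=140
export INDEX_CHUNK_OVERLAP=20
export INDEX_BATCH_SIZE=128
```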

docs/CTX_CLI.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@

# ctx.py - Prompt Enhancer CLI

A thin CLI that retrieves code context and rewrites your input into a better, context-aware prompt using the local LLM decoder. It works with both questions and commands/instructions.

**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)

---

**On this page:**
- [Basic Usage](#basic-usage)
- [Detail Mode](#detail-mode)
- [Unicorn Mode](#unicorn-mode)
- [Advanced Features](#advanced-features)
- [GPU Acceleration](#gpu-acceleration)
- [Configuration](#configuration)

---

## Basic Usage

```bash
# Questions: enhanced with specific details and multiple aspects
scripts/ctx.py "What is ReFRAG?"

# Commands: enhanced with concrete targets and implementation details
scripts/ctx.py "Refactor ctx.py"

# Via Make target
make ctx Q="Explain the caching logic to me in detail"

# Filter by language/path or adjust tokens
make ctx Q="Hybrid search details" ARGS="--language python --under scripts/ --limit 2 --rewrite-max-tokens 200"
```
## Detail Mode

Include compact code snippets in the retrieved context for richer rewrites (trades speed for quality):

```bash
# Enable detail mode (adds short snippets)
scripts/ctx.py "Explain the caching logic" --detail

# Detail mode with commands
scripts/ctx.py "Add error handling to ctx.py" --detail

# Adjust snippet size (default is 1 line when --detail is used)
make ctx Q="Explain hybrid search" ARGS="--detail --context-lines 2"
```

**Notes:**
- Default behavior is header-only (fastest); `--detail` adds short snippets.
- Detail mode is optimized for speed: it automatically clamps to at most 4 results and 1 result per file.
## Unicorn Mode

Use `--unicorn` for the highest-quality prompt enhancement via a staged two-to-three-pass approach:

```bash
# Unicorn mode with commands
scripts/ctx.py "refactor ctx.py" --unicorn

# Unicorn mode with questions
scripts/ctx.py "what is ReFRAG and how does it work?" --unicorn

# Works with all filters
scripts/ctx.py "add error handling" --unicorn --language python
```

**How it works:**

1. **Pass 1 (Draft)**: Retrieves rich code snippets (8 lines of context) to understand the codebase
2. **Pass 2 (Refine)**: Retrieves even richer snippets (12 lines) to ground the prompt in concrete code
3. **Pass 3 (Polish)**: Optional cleanup pass if the output appears generic or incomplete

**Key features:**
- **Code-grounded**: References actual code behaviors and patterns
- **No hallucinations**: Uses only real code from your indexed repository
- **Multi-paragraph output**: Produces detailed, comprehensive prompts
- **Works with both questions and commands**

**When to use:**
- **Normal mode**: Quick, everyday prompts (fastest)
- **--detail**: Richer context without multi-pass overhead (balanced)
- **--unicorn**: When you need the absolute best prompt quality
## Advanced Features

### Streaming Output (Default)

All modes stream tokens as they arrive for instant feedback:

```bash
scripts/ctx.py "refactor ctx.py" --unicorn
```

To disable streaming, set `"streaming": false` in `~/.ctx_config.json`.

### Memory Blending

ctx.py automatically falls back to `context_search` with memories when repo search returns no hits:

```bash
# If no code matches, ctx.py will search design docs and ADRs
scripts/ctx.py "What is our authentication strategy?"
```

### Adaptive Context Sizing

ctx.py automatically adjusts `limit` and `context_lines` based on query characteristics:
- **Short/vague queries** → more context for richer grounding
- **Queries with file/function names** → lighter settings for speed

### Automatic Quality Assurance

The enhanced `_needs_polish()` heuristic triggers a third polish pass when the output:
- is too short (< 180 chars)
- contains generic or vague language
- is missing concrete code references
- lacks proper paragraph structure
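A rough sketch of what such a heuristic might look like; the real `_needs_polish()` in ctx.py may differ, and the phrase list and regex below are invented for illustration:

```python
import re

# Invented trigger phrases; the actual heuristic's wording is not documented here.
GENERIC_PHRASES = ("as appropriate", "best practices", "in general")

def needs_polish(text: str) -> bool:
    """Return True when a rewritten prompt looks generic or incomplete (sketch)."""
    if len(text) < 180:                              # too short
        return True
    lowered = text.lower()
    if any(p in lowered for p in GENERIC_PHRASES):   # generic/vague language
        return True
    # No concrete code references (backticked identifiers or dotted file names)
    if not re.search(r"`[^`]+`|\b\w+\.\w{1,4}\b", text):
        return True
    if "\n\n" not in text:                           # lacks paragraph structure
        return True
    return False
```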
### Personalized Templates

Create `~/.ctx_config.json` to customize behavior:

```json
{
  "always_include_tests": true,
  "prefer_bullet_commands": false,
  "extra_instructions": "Always consider error handling and edge cases",
  "streaming": true
}
```

**Available preferences:**
- `always_include_tests`: Add testing considerations to all prompts
- `prefer_bullet_commands`: Format commands as bullet points
- `extra_instructions`: Custom instructions added to every rewrite
- `streaming`: Enable/disable streaming output (default: true)

See `ctx_config.example.json` for a template.
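One way such preferences could be merged over defaults is sketched below; the defaults mirror the list above, but the function name and loading logic are assumptions, not ctx.py's actual code:

```python
import json
import os

DEFAULTS = {
    "always_include_tests": False,
    "prefer_bullet_commands": False,
    "extra_instructions": "",
    "streaming": True,  # streaming output is on by default
}

def load_ctx_config(path="~/.ctx_config.json"):
    """Return DEFAULTS overridden by any keys found in the user's config file."""
    cfg = dict(DEFAULTS)
    expanded = os.path.expanduser(path)
    if os.path.exists(expanded):
        with open(expanded) as f:
            cfg.update(json.load(f))  # user keys win over defaults
    return cfg
```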
## GPU Acceleration

For faster prompt rewriting, use the native Metal-accelerated decoder:

```bash
# Start the native llama.cpp server with Metal GPU
scripts/gpu_toggle.sh start

# ctx.py will now automatically use the GPU decoder on port 8081
make ctx Q="Explain the caching logic"

# Stop the native GPU server
scripts/gpu_toggle.sh stop
```

## Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| MCP_INDEXER_URL | Indexer HTTP RMCP endpoint | http://localhost:8003/mcp |
| USE_GPU_DECODER | Auto-detect GPU mode | 0 |
| LLAMACPP_URL | Docker decoder endpoint | http://localhost:8080 |

GPU decoder endpoint (after `gpu_toggle.sh gpu`): http://localhost:8081/completion

docs/DEVELOPMENT.md

Lines changed: 13 additions & 0 deletions
@@ -2,6 +2,19 @@

This guide covers setting up a development environment, understanding the codebase structure, and contributing to Context Engine.

**Documentation:** [README](../README.md) · [Configuration](CONFIGURATION.md) · [IDE Clients](IDE_CLIENTS.md) · [MCP API](MCP_API.md) · [ctx CLI](CTX_CLI.md) · [Memory Guide](MEMORY_GUIDE.md) · [Architecture](ARCHITECTURE.md) · [Multi-Repo](MULTI_REPO_COLLECTIONS.md) · [Kubernetes](../deploy/kubernetes/README.md) · [VS Code Extension](vscode-extension.md) · [Troubleshooting](TROUBLESHOOTING.md) · [Development](DEVELOPMENT.md)

---

**On this page:**
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Project Structure](#project-structure)
- [Testing](#testing)
- [Docker Development](#docker-development)

---

## Prerequisites

### Required Software
