
Commit 1239ef7

Context Aware Prompt Enhancer
1 parent 8dd4c9a commit 1239ef7

File tree

3 files changed: +459 additions, 0 deletions


Makefile

Lines changed: 14 additions & 0 deletions

```diff
@@ -271,3 +271,17 @@ qdrant-prune:
 
 qdrant-index-root:
 	python3 scripts/mcp_router.py --run "reindex repo"
+
+
+# --- ctx CLI helper ---
+# Usage examples (default prints ONLY the improved prompt):
+#   make ctx Q="how does hybrid search work?"
+#   make ctx Q="explain caching" ARGS="--language python --under scripts/"
+# To include Supporting Context:
+#   make ctx Q="explain caching" ARGS="--with-context --limit 2"
+ctx: ## enhance a prompt with repo context: make ctx Q="your question" [ARGS='--language python --under scripts/ --with-context']
+	@if [ -z "$(Q)" ]; then \
+		echo 'Usage: make ctx Q="your question" [ARGS="--language python --under scripts/ --with-context"]'; \
+		exit 1; \
+	fi; \
+	python3 scripts/ctx.py "$(Q)" $(ARGS)
```
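The `scripts/ctx.py` entry point invoked by this target is part of the commit but not shown in the diff above. Purely as an illustration of the command-line interface the Makefile and README examples imply, a hypothetical `argparse` skeleton might look like the following (the flag names come from the examples; the defaults for `--limit` and `--rewrite-max-tokens` are assumptions, not values from this commit):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical skeleton mirroring the flags seen in the Makefile/README
    # examples; the real ctx.py implementation is not shown in this diff.
    p = argparse.ArgumentParser(
        description="Rewrite a question into a context-aware prompt."
    )
    p.add_argument("question", help="the raw prompt to enhance")
    p.add_argument("--language", help="filter retrieved chunks by language")
    p.add_argument("--under", help="restrict retrieval to a path prefix")
    p.add_argument("--limit", type=int, default=3,      # assumed default
                   help="number of context chunks to retrieve")
    p.add_argument("--with-context", action="store_true",
                   help="also print the Supporting Context section")
    p.add_argument("--rewrite-max-tokens", type=int, default=256,  # assumed
                   help="token budget for the rewritten prompt")
    return p

args = build_parser().parse_args(
    ["explain caching", "--language", "python", "--under", "scripts/"]
)
print(args.language, args.under, args.with_context)  # → python scripts/ False
```

The Make target's `-z "$(Q)"` guard only checks that a question was supplied; everything in `ARGS` is passed through to the script unmodified.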

README.md

Lines changed: 41 additions & 0 deletions

```diff
@@ -186,6 +186,47 @@ This re-enables the `llamacpp` container and resets `.env` to `http://llamacpp:8
 - llama-model / tokenizer: Fetch tiny GGUF model and tokenizer.json
 - qdrant-status / qdrant-list / qdrant-prune / qdrant-index-root: Convenience wrappers that route through the MCP bridge to inspect or maintain collections
 
+
+### CLI: ctx prompt enhancer
+
+A thin CLI that retrieves code context and rewrites your input into a better, context-aware prompt using the local LLM decoder. By default it prints ONLY the improved prompt.
+
+Examples:
+````bash
+# Default: print only the improved prompt (uses Docker llama.cpp on port 8080)
+scripts/ctx.py "Explain the caching logic to me in detail"
+
+# Via Make target (default improved prompt only)
+make ctx Q="Explain the caching logic to me in detail"
+
+# Filter by language/path or adjust tokens
+make ctx Q="Hybrid search details" ARGS="--language python --under scripts/ --limit 2 --rewrite-max-tokens 200"
+````
+
+GPU Acceleration (Apple Silicon):
+For faster prompt rewriting, use the native Metal-accelerated decoder:
+````bash
+# 1. Set USE_GPU_DECODER=1 in your .env file (already set by default)
+# 2. Start the native llama.cpp server with Metal GPU
+scripts/gpu_toggle.sh start
+
+# Now ctx.py will automatically use the GPU decoder on port 8081
+make ctx Q="Explain the caching logic to me in detail"
+
+# Stop the native GPU server
+scripts/gpu_toggle.sh stop
+
+# To use Docker decoder instead, set USE_GPU_DECODER=0 in .env and restart:
+docker compose up -d llamacpp
+````
+
+Notes:
+- Defaults to the Indexer HTTP RMCP endpoint at http://localhost:8003/mcp (override with MCP_INDEXER_URL)
+- Decoder endpoint: automatically detects GPU mode via USE_GPU_DECODER env var (set by gpu_toggle.sh)
+- Docker decoder (default): http://localhost:8080/completion
+- GPU decoder (after gpu_toggle.sh gpu): http://localhost:8081/completion
+- See also: `make ctx`
+
 ## Index another codebase (outside this repo)
 
 You can index any local folder by mounting it at /work. Three easy ways:
```
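The decoder-selection rule described in the README notes (USE_GPU_DECODER picks the native server on port 8081 over the Docker decoder on 8080) can be sketched as a small helper. This is a minimal illustration of the documented behavior, not the actual `ctx.py` code; treating any value other than `"1"` as "use Docker" is an assumption:

```python
import os

DOCKER_DECODER = "http://localhost:8080/completion"  # default Docker llama.cpp
GPU_DECODER = "http://localhost:8081/completion"     # native Metal server

def decoder_endpoint(env=None):
    """Pick the completion endpoint per the README notes:
    USE_GPU_DECODER=1 selects the native GPU server; anything else
    (including unset, an assumption here) falls back to Docker."""
    env = os.environ if env is None else env
    return GPU_DECODER if env.get("USE_GPU_DECODER") == "1" else DOCKER_DECODER

print(decoder_endpoint({"USE_GPU_DECODER": "1"}))  # http://localhost:8081/completion
print(decoder_endpoint({"USE_GPU_DECODER": "0"}))  # http://localhost:8080/completion
```

Since `gpu_toggle.sh` writes USE_GPU_DECODER into `.env`, `ctx.py` can switch decoders without any command-line flag.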
