This doc explains the biggest performance levers in ctxeng, what they cost, and how to keep runs fast on large repositories.
Most runtime is spent in one of these stages:
- Discovery: walking the filesystem + reading files
- Scoring: keyword/path/AST/git signals per file
- (Optional) semantic scoring: embeddings compute + cache IO
- (Optional) RAG: chunking + retrieval over top-ranked candidates
- Budgeting: token counting + truncation decisions
The single best optimization is shrinking the candidate set early.
- Use
.ctxengignoreto exclude heavy directories:node_modules/,dist/,build/,coverage/,venv/,.venv/, logs, generated assets
- Prefer
--git-difffor PR reviews (small candidate set). - Disable slow signals if you don’t need them:
--no-gitif git commands are slow/unavailable--no-import-graphif import expansion isn’t useful for your task
- Use
--ragfor big repos when whole-file context is too large or too noisy.
ctxeng build "Review these changes. Focus on security and correctness." --git-diff --fmt markdown --output ctx.mdWhy it’s fast: discovery is limited to the diff.
ctxeng build "Explain the authentication flow end-to-end" --rag --trace --fmt markdown --output ctx.mdWhy it works: rank files first, then retrieve only the best chunks.
ctxeng build "Give me a high-level architecture overview" --skeleton --fmt markdown --output ctx.mdWhy it works: skeleton mode reduces token cost dramatically for Python.
For large repositories, ctxeng can parallelize file scoring to improve throughput (scoring is CPU + IO bound).
Tips:
- Parallelism helps most when:
- you have many medium-sized text files
- your disk is reasonably fast (SSD)
- Parallelism helps least when:
- the run is dominated by a few huge files (excluded by default size guard)
- the repo is on a slow network filesystem
Semantic scoring uses local embeddings and is optional:
pip install "ctxeng[semantic]"Notes:
- Default semantic model is
all-mpnet-base-v2(good quality, heavier than mini models). - Embeddings are cached under
.ctxeng_cache/keyed by(content hash + model name). - On very large repos, semantic scoring can dominate runtime; consider:
- using
--git-diff - enabling
--ragso only top candidates are embedded/chunked - disabling semantic for quick iterations
- using
RAG performance is mostly controlled by:
- candidate set size: internally bounded (top-ranked files)
- chunk size:
--rag-chunk-max-lines - overlap:
--rag-chunk-overlap - max chunks output:
--rag-max-chunks
Rules of thumb:
- If output is too big/noisy: reduce
--rag-max-chunks - If retrieval misses key details: increase
--rag-max-chunksor reduce--rag-chunk-max-lines - If chunks lack local context: increase
--rag-chunk-context-lines
Token counting uses tiktoken when installed (recommended):
pip install "ctxeng[tiktoken]"Without it, ctxeng uses a fast heuristic; it’s cheaper but less accurate.
- Use
--traceto see stage-level decisions and counts. - If a run feels slow, first answer:
- How many files were discovered?
- Is semantic scoring enabled?
- Is git recency enabled?
- Is RAG enabled?
Target: < 5 seconds for 10K files on a modern laptop (varies with disk + git + semantic).
Suggested benchmark scenarios:
- PR review:
--git-diff - Large repo explain:
--rag - Overview:
--skeleton
Record:
- repo size (files scanned)
- flags used
- wall-clock time
- output size (included files/chunks, tokens)