Overview
The chatbot currently injects the full documentation into the system prompt via INCLUDE_DOCS_CONTEXT. This works for testing, but it is inefficient: every request pays the latency and token cost of the entire doc set.
Goal
Replace full-doc injection with a RAG setup to reduce tokens and improve response time.
Plan
- Parse all .md files from user_docs
- Split content into logical chunks
- Generate embeddings per chunk
- Store them in a simple local vector index
- On each query:
  - Run semantic search against the index
  - Fetch the top 3–5 relevant chunks
  - Inject only those chunks into the prompt
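The steps above can be sketched end to end. This is a minimal stdlib-only illustration, not a proposed implementation: the hashing bag-of-words `embed` is a deterministic stand-in for a real embedding model, and the names (`chunk_markdown`, `VectorIndex`, `build_index`) are hypothetical, as is the assumption that `user_docs` is a local directory of .md files.

```python
import math
import re
from pathlib import Path

def chunk_markdown(text, max_chars=800):
    """Split markdown into logical chunks, breaking on blank lines."""
    paragraphs = re.split(r"\n\s*\n", text)
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def embed(text, dim=256):
    """Toy hashing bag-of-words vector -- placeholder for a real model."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorIndex:
    """Simple in-memory vector index with cosine-similarity search."""
    def __init__(self):
        self.entries = []  # (vector, chunk) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def search(self, query, k=3):
        qv = embed(query)
        scored = [(sum(a * b for a, b in zip(qv, v)), c)
                  for v, c in self.entries]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]

def build_index(docs_dir="user_docs"):
    """Parse every .md file under docs_dir, chunk it, and index the chunks."""
    index = VectorIndex()
    for path in Path(docs_dir).rglob("*.md"):
        for chunk in chunk_markdown(path.read_text(encoding="utf-8")):
            index.add(chunk)
    return index
```

At query time, the prompt would be assembled from `index.search(user_query, k=5)` instead of the full docs; swapping `embed` for a real embedding API and persisting the index to disk are the obvious next steps.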
Acceptance
- INCLUDE_DOCS_CONTEXT is removed or rerouted through the RAG path
- Token usage per request drops noticeably versus full-doc injection
- Project-specific answers still work
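To make the token-usage criterion checkable, a rough proxy can compare the old and new prompt sizes. The ~1.3 tokens-per-word ratio below is an assumed heuristic; a real check should count with the model's own tokenizer.

```python
def approx_tokens(text):
    # Assumed heuristic: ~1.3 tokens per whitespace-separated word.
    # Replace with the serving model's tokenizer for a real measurement.
    return int(len(text.split()) * 1.3)

# Stand-in sizes: full docs vs. the top 3-5 retrieved chunks.
full_prompt = " ".join(["word"] * 10_000)
rag_prompt = " ".join(["word"] * 500)

assert approx_tokens(rag_prompt) < approx_tokens(full_prompt)
```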