A hands-on sandbox to explore Retrieval-Augmented Generation (RAG): ingest, index, inspect, visualize, and query your documents.
Educational only. These scripts favor clarity over resilience, security, and edge-case handling; they are not intended for production deployment.
RAG systems can feel opaque. This repo breaks the pipeline into small, runnable scripts so you can see exactly what happens at each step:
- Ingest documents → split → embed → store in ChromaDB
- Inspect what got stored (documents, metadatas, vectors)
- Visualize embeddings with UMAP to build intuition
- Query the store and view retrieved context for LLM prompting
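The mechanics of those steps can be sketched end to end with a toy index. The hashed bag-of-words embedder below is a stand-in for a real embedding model (e.g. nomic-embed-text), and the sample chunks are invented; only the split → embed → store → query flow matches the pipeline above:

```python
import hashlib
import math

def embed(text: str, dim: int = 256) -> list[float]:
    """Deterministic hashed bag-of-words embedder (illustrative only)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.strip(".,?!").encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # inputs are unit-normalized

# "Ingest": store (id, chunk, vector) rows, like a Chroma upsert.
chunks = [
    "The Senate is composed of two Senators from each State.",
    "Rust enforces memory safety through ownership and borrowing.",
    "Check tire pressure monthly while the tires are cold.",
]
store = [(i, c, embed(c)) for i, c in enumerate(chunks)]

# "Query": embed the question, rank stored chunks by similarity.
q = embed("How does Rust guarantee memory safety?")
ranked = sorted(store, key=lambda row: cosine(q, row[2]), reverse=True)
print(ranked[0][1])  # best-matching chunk
```

A real pipeline swaps in a learned embedding model and a persistent vector store, but retrieval is still this: embed the query, rank stored vectors by similarity, return the nearest chunks.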
This repo is:
- A learn-by-doing toolkit to demystify RAG mechanics
- Small, focused Python scripts with clear flags and outputs
- A starting point for your own experiments
This repo isn’t:
- Production ready (no auth, no robust error handling, minimal tests)
- Optimized for performance or cost
- A full RAG framework or service
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
hf download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --local-dir .
hf download nomic-ai/nomic-embed-text-v1.5-GGUF nomic-embed-text-v1.5.f16.gguf --local-dir .
wget -O the-us-constitution.pdf https://www.lexisnexis.com/supp/lawschool/resources/the-us-constitution.pdf
wget -O comprehensive-rust.pdf https://google.github.io/comprehensive-rust/comprehensive-rust.pdf
wget -O honda_2020_fit_manual.pdf https://techinfo.honda.com/rjanisis/pubs/OM/AH/AT5A2020OM/enu/AT5A2020OM.PDF
ingest_chroma.py
Splits input docs, computes embeddings, and upserts into a persistent Chroma collection.
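The splitting step can be sketched as a fixed-size character splitter with overlap (a minimal stand-in — the script's actual splitter and sizes may differ, and production splitters often work on tokens rather than characters):

```python
# Minimal fixed-size character splitter with overlap. Overlap keeps a
# sentence that straddles a boundary retrievable from either chunk.
def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    assert 0 <= overlap < chunk_size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance by the non-overlapping part
    return chunks

doc = "word " * 200  # stand-in for text extracted from a PDF
chunks = split_text(doc)
print(len(chunks), len(chunks[0]))
```

Each chunk is then embedded and upserted with its metadata (e.g. source filename), so retrieval can later report where a chunk came from.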
python ingest_chroma.py --rebuild --db ./database/chroma_db1 --embed-model nomic-embed-text-v1.5.f16.gguf
inspect_chroma.py
Lists collections, counts, and sample items (without pulling giant embedding vectors).
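"Without pulling giant vectors" means reporting counts, dimensionality, and truncated document previews rather than dumping full embeddings. A toy illustration (the records below are invented stand-ins for a Chroma collection's ids, documents, metadatas, and embeddings):

```python
# Stand-in for rows stored in a collection: each has an id, the chunk
# text, metadata (source file), and a 768-dim embedding.
records = [
    {"id": f"doc-{i}",
     "document": "We the People of the United States, in Order to form a more perfect Union... " * 3,
     "metadata": {"source": "the-us-constitution.pdf"},
     "embedding": [0.1] * 768}
    for i in range(3)
]

def summarize(recs, preview_chars=60):
    """Return human-readable summary lines; never print raw vectors."""
    lines = [f"count={len(recs)} dim={len(recs[0]['embedding'])}"]
    for r in recs:
        lines.append(f"{r['id']}  {r['metadata']['source']}  "
                     f"{r['document'][:preview_chars]}...")
    return lines

for line in summarize(records):
    print(line)
```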
python inspect_chroma.py --db ./database/chroma_db1 --limit 9999 --include-embeddings
visualize_umap.py
Projects high-dimensional vectors to 2D with UMAP and saves a PNG. You can color points by a metadata key (e.g., source file).
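The script itself relies on UMAP (umap-learn). As a dependency-light sketch of the same idea — squashing high-dimensional embeddings down to 2D coordinates you can scatter-plot — here is a PCA projection using only NumPy, on synthetic two-cluster data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for stored embeddings: two clusters in 768-D space
# (as if chunks came from two different source documents).
emb = np.vstack([rng.normal(0.0, 1.0, (50, 768)),
                 rng.normal(3.0, 1.0, (50, 768))])

# PCA to 2D: center the data, then project onto the top-2 principal axes.
centered = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T  # shape (100, 2), ready for a scatter plot
print(coords.shape)
```

UMAP differs from PCA in that it preserves local neighborhood structure rather than global variance, which is why it tends to show tighter, more separated clusters of related chunks; the plotting step (scatter `coords`, colored by a metadata key like source file) is the same either way.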
python visualize_umap.py --db ./database/chroma_db1 --out plot.png --label-key source
query_rag.py
Retrieves top-K chunks and then asks an LLM to draft an answer.
python inference_rag.py --db ./database/chroma_db1 --llm-model ./Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --embed-model ./nomic-embed-text-v1.5.f16.gguf --threads 24 --ctx 2048 --max-tokens 1024 --show-retrieved --use-reranker --reranker-top-n 4 --gpu-layers 99
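After retrieval, the remaining step is stitching the top-K chunks into a prompt for the LLM. A minimal sketch — the template wording and sample chunks are assumptions, not copied from the script:

```python
# Retrieved (source, chunk) pairs, as a top-K search might return them.
retrieved = [
    ("the-us-constitution.pdf",
     "The Senate of the United States shall be composed of two Senators from each State."),
    ("the-us-constitution.pdf", "Each Senator shall have one Vote."),
]
question = "How many Senators does each state have?"

# Tag each chunk with its source so the model (and reader) can trace claims.
context = "\n\n".join(f"[{src}] {text}" for src, text in retrieved)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
print(prompt)
```

The `--show-retrieved` flag exists precisely so you can see this assembled context, and a reranker (as enabled above with `--use-reranker`) reorders the candidates before the top N are pasted into the prompt.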