A Retrieval-Augmented Generation (RAG) chatbot built in Python using FAISS for vector similarity search and Ollama for embeddings and LLM inference.
Retrieval-Augmented Generation (RAG) combines classical information retrieval with large language models. Instead of relying solely on the LLM’s internal knowledge, the system retrieves relevant chunks from an external corpus and injects them as context for generation.
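As a sketch of that retrieve-then-generate loop, here is a toy pipeline: a stand-in bag-of-words "embedding" takes the place of a real embedding model, and brute-force cosine search takes the place of FAISS. Everything here is illustrative, not the project's actual code:

```python
import math

# Toy corpus, already split into chunks (in the real system these come
# from the ingestion step and are embedded via Ollama).
chunks = [
    "Cats sleep for around 13 to 16 hours a day.",
    "FAISS indexes dense vectors for fast similarity search.",
    "The Eiffel Tower is in Paris.",
]

def tokenize(text: str) -> list[str]:
    return [w.strip(".,?!").lower() for w in text.split()]

# Vocabulary built from the corpus; each chunk becomes a word-count vector.
vocab = sorted({w for c in chunks for w in tokenize(c)})

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: word counts over the vocabulary.
    words = tokenize(text)
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Brute-force nearest-neighbour search; FAISS does this efficiently at scale.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Inject the retrieved chunks as context for the LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The real system swaps `embed` for an Ollama embedding call and `retrieve` for a FAISS index lookup; the overall shape of the loop stays the same.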
This project includes:
- Dataset ingestion and text chunking
- Embedding generation via Ollama
- Vector indexing and similarity search using FAISS
- Persistent FAISS index with manifest-based validation
- Interactive terminal-based chatbot
- Unit tests for each module
- Continuous Integration (CI) workflow
- Pre-commit hooks
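The ingestion step splits the corpus into chunks before embedding. A minimal character-window chunker with overlap might look like this; the window size, overlap, and fixed-window strategy are assumptions, not necessarily what the project uses:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk is then embedded and added to the FAISS index; at query time the same embedding model maps the question into the same vector space.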
Built with:

- FAISS — vector similarity search
- Ollama — embeddings and LLM inference
- UV — dependency and environment management
- Pytest — testing
- Ruff — linting and formatting
- Pre-commit — local enforcement of quality checks
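The manifest-based validation mentioned in the feature list can be sketched as a content hash stored alongside the persisted index: if the dataset or embedding model changes, the saved index is treated as stale and rebuilt. The file name `manifest.json` and its fields below are hypothetical, not the project's actual layout:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    # Content hash of the dataset file.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def write_manifest(manifest_path: Path, dataset: Path, model: str) -> None:
    # Record what the persisted index was built from.
    manifest = {
        "dataset_sha256": file_sha256(dataset),
        "embedding_model": model,
    }
    manifest_path.write_text(json.dumps(manifest))

def index_is_valid(manifest_path: Path, dataset: Path, model: str) -> bool:
    # The index is only reusable if the dataset and model are unchanged.
    if not manifest_path.exists():
        return False
    manifest = json.loads(manifest_path.read_text())
    return (
        manifest.get("dataset_sha256") == file_sha256(dataset)
        and manifest.get("embedding_model") == model
    )
```

On startup the loader checks `index_is_valid`; on a mismatch it re-embeds the dataset and rewrites both the index and the manifest.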
Clone the repository and install dependencies:

```sh
git clone https://github.com/peterschenk01/rag-system.git
cd rag-system
uv sync
```

Install Ollama: https://ollama.com/download
Pull the models used by the system:
```sh
ollama pull hf.co/CompendiumLabs/bge-base-en-v1.5-gguf
ollama pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```

Download an example dataset (cat facts):

```sh
mkdir -p data
curl -L -o data/cat-facts.txt https://huggingface.co/ngxson/demo_simple_rag_py/resolve/main/cat-facts.txt
```

Start the chatbot:

```sh
uv run rag-system
```

For development, install the dev dependencies and the pre-commit hooks:

```sh
uv sync --dev
uv run pre-commit install
```

Pre-commit runs formatting, linting, and other checks automatically before each commit.
Lint and format:

```sh
uv run ruff check .
uv run ruff format .
```

Run the test suite:

```sh
uv run pytest
```

A CI workflow is included to ensure:
- Tests pass
- Ruff linting succeeds
- Code quality matches local pre-commit checks
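Such a workflow might look roughly like this; the path `.github/workflows/ci.yml`, the action versions, and the step details are assumptions, not the project's exact configuration:

```yaml
# .github/workflows/ci.yml (illustrative)
name: CI
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync --dev
      - run: uv run ruff check .
      - run: uv run pytest
```

Running the same `uv run` commands locally and in CI keeps the two environments consistent.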
This project is licensed under the MIT License.