A Retrieval-Augmented Generation (RAG) chatbot built in Python using FAISS for vector similarity search and Ollama for embeddings and LLM inference.
Retrieval-Augmented Generation (RAG) combines classical information retrieval with large language models. Instead of relying solely on the LLM’s internal knowledge, the system retrieves relevant chunks from an external corpus and injects them as context for generation.
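As a sketch of that retrieve-then-generate loop, here is a toy pipeline: a stand-in bag-of-words "embedding" takes the place of a real embedding model, and brute-force cosine search takes the place of FAISS. Everything here is illustrative, not the project's actual code:

```python
import math

# Toy corpus, already split into chunks (in the real system these come
# from the ingestion step and are embedded via Ollama).
chunks = [
    "Cats sleep for around 13 to 16 hours a day.",
    "FAISS indexes dense vectors for fast similarity search.",
    "The Eiffel Tower is in Paris.",
]

def tokenize(text: str) -> list[str]:
    return [w.strip(".,?!").lower() for w in text.split()]

# Vocabulary built from the corpus; each chunk becomes a word-count vector.
vocab = sorted({w for c in chunks for w in tokenize(c)})

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: word counts over the vocabulary.
    words = tokenize(text)
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Brute-force nearest-neighbour search; FAISS does this efficiently at scale.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Inject the retrieved chunks as context for the LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The real system swaps `embed` for an Ollama embedding call and `retrieve` for a FAISS index lookup; the overall shape of the loop stays the same.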
This project includes:
- Dataset ingestion and text chunking
- Embedding generation via Ollama
- Vector indexing and similarity search using FAISS
- Persistent FAISS index with manifest-based validation
- Interactive terminal-based chatbot
- Unit tests for each module
- Continuous Integration (CI) workflow
- Pre-commit hooks
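The ingestion step splits the corpus into chunks before embedding. A minimal character-window chunker with overlap might look like this; the window size, overlap, and fixed-window strategy are assumptions, not necessarily what the project uses:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk is then embedded and added to the FAISS index; at query time the same embedding model maps the question into the same vector space.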
Built with:

- FAISS — vector similarity search
- Ollama — embeddings and LLM inference
- UV — dependency and environment management
- Pytest — testing
- Ruff — linting and formatting
- Pre-commit — local enforcement of quality checks
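The manifest-based validation mentioned in the feature list can be sketched as a content hash stored alongside the persisted index: if the dataset or embedding model changes, the saved index is treated as stale and rebuilt. The file name `manifest.json` and its fields below are hypothetical, not the project's actual layout:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    # Content hash of the dataset file.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def write_manifest(manifest_path: Path, dataset: Path, model: str) -> None:
    # Record what the persisted index was built from.
    manifest = {
        "dataset_sha256": file_sha256(dataset),
        "embedding_model": model,
    }
    manifest_path.write_text(json.dumps(manifest))

def index_is_valid(manifest_path: Path, dataset: Path, model: str) -> bool:
    # The index is only reusable if the dataset and model are unchanged.
    if not manifest_path.exists():
        return False
    manifest = json.loads(manifest_path.read_text())
    return (
        manifest.get("dataset_sha256") == file_sha256(dataset)
        and manifest.get("embedding_model") == model
    )
```

On startup the loader checks `index_is_valid`; on a mismatch it re-embeds the dataset and rewrites both the index and the manifest.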
Clone the repository and install dependencies:

```sh
git clone https://github.com/peterschenk01/rag-system.git
cd rag-system
uv sync
```

Install Ollama: https://ollama.com/download
Pull the models used by the system:
```sh
ollama pull hf.co/CompendiumLabs/bge-base-en-v1.5-gguf
ollama pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```

Download an example dataset (cat facts):

```sh
mkdir -p data
curl -L -o data/cat-facts.txt https://huggingface.co/ngxson/demo_simple_rag_py/resolve/main/cat-facts.txt
```

Start the chatbot:

```sh
uv run rag-system
```

For development, install the dev dependencies and the pre-commit hooks:

```sh
uv sync --dev
uv run pre-commit install
```

Pre-commit runs formatting, linting, and other checks automatically before each commit.
Lint and format:

```sh
uv run ruff check .
uv run ruff format .
```

Run the test suite:

```sh
uv run pytest
```

A CI workflow is included to ensure:
- Tests pass
- Ruff linting succeeds
- Code quality matches local pre-commit checks
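Such a workflow might look roughly like this; the path `.github/workflows/ci.yml`, the action versions, and the step details are assumptions, not the project's exact configuration:

```yaml
# .github/workflows/ci.yml (illustrative)
name: CI
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync --dev
      - run: uv run ruff check .
      - run: uv run pytest
```

Running the same `uv run` commands locally and in CI keeps the two environments consistent.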
This project is licensed under the MIT License.