Clone the repo, install dependencies, and run:

```bash
pip install -r requirements.txt
python3 -m uvicorn main:app --reload --port 8002
```

Then open:

http://127.0.0.1:8002/docs
A minimal, production-style Retrieval-Augmented Generation (RAG) backend.
This project demonstrates how to build a complete RAG pipeline from scratch using:
- document chunking
- embeddings
- vector similarity search
- LLM-based answer generation
All running locally with a simple API.
RAG (Retrieval-Augmented Generation) lets AI systems answer questions using external knowledge rather than relying solely on what the model learned during training.
This repository provides a clean reference implementation of:
- Ingesting documents
- Converting them into embeddings
- Storing them persistently
- Retrieving relevant context
- Generating answers using that context
```
User Query
    ↓
Embed Query
    ↓
Retrieve Relevant Chunks (cosine similarity)
    ↓
Inject Context into LLM
    ↓
Generated Answer
```
- FastAPI backend
- Chunking with overlap
- OpenAI embeddings
- Cosine similarity retrieval
- SQLite-based persistent storage
- Clean modular structure
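Chunking with overlap can be sketched in a few lines. The chunk size and overlap values below are illustrative defaults, not necessarily what the repo uses:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so that context is not cut at chunk boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

The overlap means a sentence straddling two chunks still appears intact in at least one of them, which improves retrieval quality at the cost of some duplicated storage.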
```
main.py     → API routes
rag.py      → RAG pipeline (embedding, retrieval, generation)
storage.py  → SQLite storage layer
models.py   → request/response schemas
```
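A persistent storage layer along the lines of `storage.py` might look like the sketch below. The table schema and function names are assumptions for illustration, not the repo's actual code; embeddings are serialized to JSON text since SQLite has no native vector type:

```python
import json
import sqlite3

def init_db(path: str = "rag.db") -> sqlite3.Connection:
    # Create the chunks table if it does not exist yet.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "document_id TEXT, chunk_text TEXT, embedding TEXT)"
    )
    return conn

def save_chunk(conn: sqlite3.Connection, document_id: str,
               chunk_text: str, embedding: list[float]) -> None:
    # Store the embedding as a JSON string alongside the chunk text.
    conn.execute(
        "INSERT INTO chunks VALUES (?, ?, ?)",
        (document_id, chunk_text, json.dumps(embedding)),
    )
    conn.commit()

def load_chunks(conn: sqlite3.Connection) -> list[tuple[str, list[float]]]:
    # Return (text, embedding) pairs for retrieval.
    rows = conn.execute("SELECT chunk_text, embedding FROM chunks").fetchall()
    return [(text, json.loads(emb)) for text, emb in rows]
```

Because the data lives in a SQLite file on disk, it survives server restarts, which is why stored documents remain available after a reload.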
- Python 3.9+
- OpenAI API key
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
```
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4.1-mini
EMBEDDING_MODEL=text-embedding-3-small
REQUEST_TIMEOUT=20
```
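One way the application might read these variables, shown as a sketch; the `load_settings` helper is hypothetical, and the defaults simply mirror the values above:

```python
import os

def load_settings(env: dict = os.environ) -> dict:
    # Read configuration from environment variables, falling back to
    # the defaults documented above when a variable is unset.
    return {
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("OPENAI_MODEL", "gpt-4.1-mini"),
        "embedding_model": env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        "timeout": float(env.get("REQUEST_TIMEOUT", "20")),
    }
```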
```bash
python3 -m uvicorn main:app --reload --port 8002
```

Open:

http://127.0.0.1:8002/docs
POST /documents/add
Example:
```json
{
  "document_id": "doc1",
  "text": "My favorite food is sushi. I also like pizza."
}
```
POST /chat/ask
Example:
```json
{
  "query": "What do I like to eat?"
}
```
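Both endpoints can be called from Python with only the standard library. This client sketch assumes the server is running locally on port 8002 as configured above:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8002"

def _post(path: str, payload: dict) -> dict:
    # POST a JSON payload and decode the JSON response.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def add_document(document_id: str, text: str) -> dict:
    return _post("/documents/add", {"document_id": document_id, "text": text})

def ask(query: str) -> dict:
    return _post("/chat/ask", {"query": query})
```

Usage: `add_document("doc1", "My favorite food is sushi.")` followed by `ask("What do I like to eat?")`. The response schemas are defined in `models.py`.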
- Documents are split into chunks
- Each chunk is converted into an embedding
- Chunks are stored in SQLite
- Queries are embedded
- Similar chunks are retrieved using cosine similarity
- Retrieved context is sent to the LLM
- The model generates a grounded response
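The retrieval step in the list above reduces to ranking stored chunks by cosine similarity against the query embedding. A minimal sketch (not the repo's exact implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(θ) = (a · b) / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    # Rank (text, embedding) pairs by similarity to the query, best first.
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The top-k chunk texts are then concatenated into the LLM prompt so the model answers from retrieved context rather than from memory alone. A brute-force scan like this is fine at small scale; the vector-database integration on the roadmap would replace it for larger corpora.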
- Data is stored locally in rag.db
- Restarting the server does not erase stored data
- This is a starter implementation and not optimized for scale
- Vector database integration
- Metadata filtering
- Batch embedding
- Streaming responses
- File upload support (PDF, text)
MIT