
RAG Starter

Quick Start

Clone the repo, install dependencies, and run:

pip install -r requirements.txt
python3 -m uvicorn main:app --reload --port 8002

Then open:

http://127.0.0.1:8002/docs

A minimal, production-style Retrieval-Augmented Generation (RAG) backend.

This project demonstrates how to build a complete RAG pipeline from scratch using:

  • document chunking
  • embeddings
  • vector similarity search
  • LLM-based answer generation

All running locally with a simple API.


Overview

RAG (Retrieval-Augmented Generation) lets an AI system answer questions from external knowledge sources instead of relying only on what the model learned during training.

This repository provides a clean reference implementation of:

  1. Ingesting documents
  2. Converting them into embeddings
  3. Storing them persistently
  4. Retrieving relevant context
  5. Generating answers using that context

Architecture

User Query
    ↓
Embed Query
    ↓
Retrieve Relevant Chunks (cosine similarity)
    ↓
Inject Context into LLM
    ↓
Generated Answer


Features

  • FastAPI backend
  • Chunking with overlap
  • OpenAI embeddings
  • Cosine similarity retrieval
  • SQLite-based persistent storage
  • Clean modular structure
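
The "chunking with overlap" feature can be sketched as a character-window splitter. The real rag.py may split by tokens or sentences instead, so the function name and sizes here are placeholders, not the project's actual parameters:

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Slide a window of `size` characters across the text, stepping by
    # size - overlap, so consecutive chunks share `overlap` characters.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.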

Project Structure

main.py      → API routes
rag.py       → RAG pipeline (embedding, retrieval, generation)
storage.py   → SQLite storage layer
models.py    → request/response schemas
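
The request/response schemas in models.py are presumably Pydantic models (FastAPI's convention). A rough stdlib sketch of the same shapes, with request field names taken from the Usage examples and `answer` as a guessed response field:

```python
from dataclasses import dataclass

# Stdlib stand-ins for the schemas in models.py; field names for the
# requests match the Usage examples, the response field is a guess.

@dataclass
class AddDocumentRequest:
    document_id: str
    text: str

@dataclass
class AskRequest:
    query: str

@dataclass
class AskResponse:
    answer: str
```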

Requirements

  • Python 3.9+
  • OpenAI API key

Setup

1. Create virtual environment

python3 -m venv venv
source venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Create .env

OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4.1-mini
EMBEDDING_MODEL=text-embedding-3-small
REQUEST_TIMEOUT=20
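
These variables might be read with a small helper like the one below (plain os.environ with the documented defaults; the project itself may use python-dotenv to load the .env file first):

```python
import os

# Defaults mirror the .env example above; OPENAI_API_KEY has no default.
DEFAULTS = {
    "OPENAI_MODEL": "gpt-4.1-mini",
    "EMBEDDING_MODEL": "text-embedding-3-small",
    "REQUEST_TIMEOUT": "20",
}

def load_config(env=os.environ) -> dict:
    # Fail fast if the required key is missing; fall back to defaults otherwise.
    if "OPENAI_API_KEY" not in env:
        raise RuntimeError("OPENAI_API_KEY is required")
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    cfg["REQUEST_TIMEOUT"] = int(cfg["REQUEST_TIMEOUT"])
    cfg["OPENAI_API_KEY"] = env["OPENAI_API_KEY"]
    return cfg
```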

Run

python3 -m uvicorn main:app --reload --port 8002

Open:

http://127.0.0.1:8002/docs

Usage

1. Add a document

POST /documents/add

Example:

{
  "document_id": "doc1",
  "text": "My favorite food is sushi. I also like pizza."
}

2. Ask a question

POST /chat/ask

Example:

{
  "query": "What do I like to eat?"
}
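
Both endpoints can also be called from Python using only the standard library. This sketch assumes the server from the Run section is up on port 8002; the example calls are commented out so the snippet reads without a live server:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:8002"

def post_json(path: str, payload: dict) -> dict:
    # Send a JSON POST to the running RAG server and decode the JSON reply.
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With the server running (see Run above):
# post_json("/documents/add", {"document_id": "doc1",
#                              "text": "My favorite food is sushi. I also like pizza."})
# post_json("/chat/ask", {"query": "What do I like to eat?"})
```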

How It Works

  1. Documents are split into chunks
  2. Each chunk is converted into an embedding
  3. Chunks are stored in SQLite
  4. Queries are embedded
  5. Similar chunks are retrieved using cosine similarity
  6. Retrieved context is sent to the LLM
  7. The model generates a grounded response
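
The steps above can be sketched end-to-end in plain Python. Here `fake_embed` is a toy stand-in for the OpenAI embeddings call (a bag-of-letters vector, illustration only), and step 7 is shown as prompt assembly rather than an actual LLM call:

```python
from math import sqrt

def fake_embed(text: str) -> list[float]:
    # Stand-in embedder: counts letters a-z. Real code calls the
    # OpenAI embeddings API and gets a dense semantic vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: chunk, embed, store (an in-memory list stands in for SQLite).
store = []
for chunk in ["My favorite food is sushi.", "I also like pizza."]:
    store.append((chunk, fake_embed(chunk)))

# Steps 4-5: embed the query, rank stored chunks by cosine similarity.
query = "What food do I like?"
query_vec = fake_embed(query)
ranked = sorted(store, key=lambda c: cosine(query_vec, c[1]), reverse=True)
context = ranked[0][0]

# Steps 6-7: inject the retrieved context into the prompt sent to the LLM.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
```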

Notes

  • Data is stored locally in rag.db
  • Restarting the server does not erase stored data
  • This is a starter implementation and not optimized for scale
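
The role of storage.py can be sketched with the stdlib sqlite3 module. The table and column names below are illustrative guesses, not the project's actual schema, and an in-memory database stands in for rag.db:

```python
import json
import sqlite3

# In-memory stand-in for rag.db; schema names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS chunks (
           id INTEGER PRIMARY KEY,
           document_id TEXT NOT NULL,
           text TEXT NOT NULL,
           embedding TEXT NOT NULL  -- JSON-encoded list of floats
       )"""
)

def save_chunk(document_id: str, text: str, embedding: list[float]) -> None:
    # Persist one chunk; the embedding is serialized to JSON text.
    conn.execute(
        "INSERT INTO chunks (document_id, text, embedding) VALUES (?, ?, ?)",
        (document_id, text, json.dumps(embedding)),
    )
    conn.commit()

def load_chunks(document_id: str) -> list[tuple[str, list[float]]]:
    # Read back (text, embedding) pairs for one document.
    rows = conn.execute(
        "SELECT text, embedding FROM chunks WHERE document_id = ?",
        (document_id,),
    ).fetchall()
    return [(text, json.loads(emb)) for text, emb in rows]

save_chunk("doc1", "My favorite food is sushi.", [0.1, 0.2])
```

Because rows are committed to disk in the real setup, stored chunks survive a server restart, as noted above.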

Future Improvements

  • Vector database integration
  • Metadata filtering
  • Batch embedding
  • Streaming responses
  • File upload support (PDF, text)

License

MIT
