Clone the repo, install dependencies, and run:

```bash
pip install -r requirements.txt
python3 -m uvicorn main:app --reload --port 8002
```

Then open:

http://127.0.0.1:8002/docs
A minimal, production-style Retrieval-Augmented Generation (RAG) backend.
This project demonstrates how to build a complete RAG pipeline from scratch using:
- document chunking
- embeddings
- vector similarity search
- LLM-based answer generation
All running locally with a simple API.
RAG (Retrieval-Augmented Generation) lets AI systems answer questions using external knowledge rather than relying solely on what the model learned during training.
This repository provides a clean reference implementation of:
- Ingesting documents
- Converting them into embeddings
- Storing them persistently
- Retrieving relevant context
- Generating answers using that context
```
User Query
    ↓
Embed Query
    ↓
Retrieve Relevant Chunks (cosine similarity)
    ↓
Inject Context into LLM
    ↓
Generated Answer
```
- FastAPI backend
- Chunking with overlap
- OpenAI embeddings
- Cosine similarity retrieval
- SQLite-based persistent storage
- Clean modular structure
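Chunking with overlap can be sketched in a few lines. The chunk size and overlap values below are illustrative defaults, not necessarily what the repo uses:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so that context is not cut at chunk boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

The overlap means a sentence straddling two chunks still appears intact in at least one of them, which improves retrieval quality at the cost of some duplicated storage.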
```
main.py     → API routes
rag.py      → RAG pipeline (embedding, retrieval, generation)
storage.py  → SQLite storage layer
models.py   → request/response schemas
```
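A persistent storage layer along the lines of `storage.py` might look like the sketch below. The table schema and function names are assumptions for illustration, not the repo's actual code; embeddings are serialized to JSON text since SQLite has no native vector type:

```python
import json
import sqlite3

def init_db(path: str = "rag.db") -> sqlite3.Connection:
    # Create the chunks table if it does not exist yet.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "document_id TEXT, chunk_text TEXT, embedding TEXT)"
    )
    return conn

def save_chunk(conn: sqlite3.Connection, document_id: str,
               chunk_text: str, embedding: list[float]) -> None:
    # Store the embedding as a JSON string alongside the chunk text.
    conn.execute(
        "INSERT INTO chunks VALUES (?, ?, ?)",
        (document_id, chunk_text, json.dumps(embedding)),
    )
    conn.commit()

def load_chunks(conn: sqlite3.Connection) -> list[tuple[str, list[float]]]:
    # Return (text, embedding) pairs for retrieval.
    rows = conn.execute("SELECT chunk_text, embedding FROM chunks").fetchall()
    return [(text, json.loads(emb)) for text, emb in rows]
```

Because the data lives in a SQLite file on disk, it survives server restarts, which is why stored documents remain available after a reload.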
- Python 3.9+
- OpenAI API key
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
```
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4.1-mini
EMBEDDING_MODEL=text-embedding-3-small
REQUEST_TIMEOUT=20
```
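One way the application might read these variables, shown as a sketch; the `load_settings` helper is hypothetical, and the defaults simply mirror the values above:

```python
import os

def load_settings(env: dict = os.environ) -> dict:
    # Read configuration from environment variables, falling back to
    # the defaults documented above when a variable is unset.
    return {
        "api_key": env.get("OPENAI_API_KEY", ""),
        "model": env.get("OPENAI_MODEL", "gpt-4.1-mini"),
        "embedding_model": env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        "timeout": float(env.get("REQUEST_TIMEOUT", "20")),
    }
```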
```bash
python3 -m uvicorn main:app --reload --port 8002
```

Open:

http://127.0.0.1:8002/docs
POST /documents/add
Example:
```json
{
  "document_id": "doc1",
  "text": "My favorite food is sushi. I also like pizza."
}
```
POST /chat/ask
Example:
```json
{
  "query": "What do I like to eat?"
}
```
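Both endpoints can be called from Python with only the standard library. This client sketch assumes the server is running locally on port 8002 as configured above:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8002"

def _post(path: str, payload: dict) -> dict:
    # POST a JSON payload and decode the JSON response.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def add_document(document_id: str, text: str) -> dict:
    return _post("/documents/add", {"document_id": document_id, "text": text})

def ask(query: str) -> dict:
    return _post("/chat/ask", {"query": query})
```

Usage: `add_document("doc1", "My favorite food is sushi.")` followed by `ask("What do I like to eat?")`. The response schemas are defined in `models.py`.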
- Documents are split into chunks
- Each chunk is converted into an embedding
- Chunks are stored in SQLite
- Queries are embedded
- Similar chunks are retrieved using cosine similarity
- Retrieved context is sent to the LLM
- The model generates a grounded response
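The retrieval step in the list above reduces to ranking stored chunks by cosine similarity against the query embedding. A minimal sketch (not the repo's exact implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(θ) = (a · b) / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    # Rank (text, embedding) pairs by similarity to the query, best first.
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The top-k chunk texts are then concatenated into the LLM prompt so the model answers from retrieved context rather than from memory alone. A brute-force scan like this is fine at small scale; the vector-database integration on the roadmap would replace it for larger corpora.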
- Data is stored locally in rag.db
- Restarting the server does not erase stored data
- This is a starter implementation and not optimized for scale
- Vector database integration
- Metadata filtering
- Batch embedding
- Streaming responses
- File upload support (PDF, text)
MIT