hasnainyaqub/END-to-END-RAG-System


🚀 End-to-End RAG API

A production-ready, scalable Retrieval-Augmented Generation (RAG) API built with FastAPI, LangChain, Groq, and FAISS.


📐 Architecture

User Query
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│                       FastAPI App                        │
│  POST /api/v1/query                                      │
│  ┌────────────────────────────────────────────────────┐  │
│  │             RAGService (Service Layer)             │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │                 RAGPipeline                  │  │  │
│  │  │                                              │  │  │
│  │  │  ┌────────────────┐    ┌─────────────────┐   │  │  │
│  │  │  │ FAISSRetriever │    │    ChatGroq     │   │  │  │
│  │  │  │   (k=4 docs)   │    │  groq/compound  │   │  │  │
│  │  │  └───────┬────────┘    └────────┬────────┘   │  │  │
│  │  │          │                      │            │  │  │
│  │  │      Documents               Answer          │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
    │
    ▼
QueryResponse { answer, sources, query }

Component Map

app/
├── main.py              ← FastAPI factory, middleware, exception handlers, lifespan
├── config.py            ← Pydantic v2 Settings (env-file based)
├── dependencies.py      ← FastAPI DI providers (Annotated types)
│
├── api/
│   ├── routes.py        ← POST /api/v1/query
│   └── health.py        ← GET /health
│
├── rag/
│   ├── pipeline.py      ← End-to-end orchestration (retrieve → format → prompt → generate)
│   ├── retriever.py     ← FAISSRetriever (sync + async)
│   ├── generator.py     ← ChatGroq singleton + async invocation
│   └── prompts.py       ← RAG prompt templates (strict grounding)
│
├── vectorstore/
│   └── vectordb.py      ← FAISS load/create with thread-safe singleton
│
├── core/
│   ├── logging.py       ← Structured production logging
│   └── exceptions.py    ← Domain exception hierarchy
│
├── schemas/
│   ├── request.py       ← QueryRequest (Pydantic v2)
│   └── response.py      ← QueryResponse, HealthResponse, ErrorResponse
│
└── services/
    └── rag_service.py   ← Business logic + schema translation
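
The retrieve → format → prompt → generate flow orchestrated by `pipeline.py` can be sketched with stubbed components. This is an illustrative sketch of the data flow, not the repository's actual code; the class and method names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str
    metadata: dict

class RAGPipeline:
    """Minimal sketch of the retrieve → format → prompt → generate flow."""

    def __init__(self, retriever, llm, k: int = 4):
        self.retriever = retriever  # callable: query -> list[Document]
        self.llm = llm              # callable: prompt -> str
        self.k = k

    def format_context(self, docs: list[Document]) -> str:
        # Join retrieved chunks into a single grounded context block.
        return "\n\n".join(d.page_content for d in docs)

    def run(self, query: str) -> dict:
        docs = self.retriever(query)[: self.k]        # retrieve
        context = self.format_context(docs)           # format
        prompt = (                                    # prompt (strict grounding)
            "Answer ONLY from the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        answer = self.llm(prompt)                     # generate
        return {"answer": answer, "sources": docs, "query": query}
```

In the real app, `retriever` wraps FAISS similarity search and the LLM call goes through ChatGroq's async API; the stubs above only show how data moves between the two.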

⚡ Tech Stack

| Layer | Technology |
|---|---|
| Web Framework | FastAPI + Uvicorn |
| Data Validation | Pydantic v2 |
| LLM | Groq groq/compound |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | FAISS (CPU) |
| Orchestration | LangChain (modular packages) |
| Config | pydantic-settings + .env |
| Containerisation | Docker + docker-compose |

🛠️ Setup & Installation

Prerequisites

  • Python 3 (with venv and pip)
  • A Groq API key (for GROQ_API_KEY)
  • Docker (optional, for the containerised setup)

1. Clone & enter the project

git clone <repo-url>
cd "End-to-End RAG System"

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate          # Linux/macOS
# .venv\Scripts\activate           # Windows

3. Install dependencies

pip install -r requirements.txt

4. Configure environment

cp .env.example .env
# Edit .env and set your GROQ_API_KEY
GROQ_API_KEY=gsk_your_actual_api_key_here

5. Build the FAISS index

# Index the built-in sample documents (FastAPI, LangChain, FAISS, Groq, RAG, etc.)
python ingest.py

# Or index your own .txt files:
python ingest.py --source /path/to/your/docs
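
Conceptually, ingest.py has to split each .txt file into overlapping chunks before embedding them into FAISS, so that sentences cut at a chunk boundary still appear whole in at least one chunk. A stdlib-only sketch of that splitting step (the real script presumably uses a LangChain text splitter; this function is illustrative):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping an overlap
    return chunks
```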

6. Start the API

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API will be available at:

  • API: http://localhost:8000
  • Interactive docs (Swagger UI): http://localhost:8000/docs
  • Health check: http://localhost:8000/health

🐳 Docker

Build & run with Docker

# Build the image
docker build -t rag-api .

# Run ingest to create the FAISS index
docker run --rm -v $(pwd)/faiss_index:/app/faiss_index --env-file .env rag-api python ingest.py

# Start the API
docker run -d \
  --name rag-api \
  -p 8000:8000 \
  -v $(pwd)/faiss_index:/app/faiss_index \
  --env-file .env \
  rag-api

Build & run with docker-compose

# 1. Build the FAISS index first
docker compose run --rm rag-api python ingest.py

# 2. Start the service
docker compose up -d

# Check logs
docker compose logs -f rag-api

🔌 API Reference

POST /api/v1/query

Query the RAG system with a natural-language question.

Request Body

{
  "query": "What is LangChain?"
}

Response (200 OK)

{
  "answer": "LangChain is a framework for developing applications powered by large language models (LLMs)...",
  "sources": [
    {
      "page_content": "LangChain is a framework for developing applications powered by...",
      "metadata": {
        "source": "langchain_docs.txt",
        "topic": "LangChain"
      }
    }
  ],
  "query": "What is LangChain?"
}

Error Responses

| Status | Condition |
|---|---|
| 400 | Empty or invalid query |
| 500 | Internal pipeline error |
| 502 | Groq LLM failure (e.g. invalid API key, rate limit) |
| 503 | FAISS index not found or vector store unavailable |
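
This status mapping follows from the domain exception hierarchy in core/exceptions.py being translated to HTTP codes in the app's exception handlers. A hedged sketch of that idea (the class names here are illustrative, not necessarily the ones the repo uses):

```python
class RAGError(Exception):
    """Base class for domain errors raised by the RAG pipeline."""

class InvalidQueryError(RAGError): ...
class LLMError(RAGError): ...
class VectorStoreError(RAGError): ...

# Exception type -> HTTP status, mirroring the table above.
STATUS_MAP = {
    InvalidQueryError: 400,
    LLMError: 502,
    VectorStoreError: 503,
}

def http_status(exc: Exception) -> int:
    for exc_type, status in STATUS_MAP.items():
        if isinstance(exc, exc_type):
            return status
    return 500  # anything unmapped is an internal pipeline error
```

In FastAPI this mapping would live in exception handlers registered on the app, so route code only raises domain exceptions and never builds HTTP responses itself.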

GET /health

Liveness probe for orchestrators.

Response (200 OK)

{
  "status": "healthy",
  "app_name": "End-to-End RAG API",
  "version": "1.0.0"
}

📡 Example Requests

curl

# Health check
curl http://localhost:8000/health

# RAG query
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FAISS and how does it work?"}'

Python (httpx)

import httpx

response = httpx.post(
    "http://localhost:8000/api/v1/query",
    json={"query": "Explain RAG in simple terms."},
)
print(response.json())

⚙️ Configuration Reference

All settings are loaded from environment variables or .env:

| Variable | Default | Description |
|---|---|---|
| GROQ_API_KEY | (required) | Your Groq API key |
| LLM_MODEL | groq/compound | Groq model name |
| LLM_TEMPERATURE | 0.0 | LLM temperature (0 = deterministic) |
| LLM_MAX_TOKENS | 1024 | Maximum tokens in the LLM response |
| EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | HuggingFace embedding model |
| VECTOR_DB_PATH | faiss_index | Path to the FAISS index directory |
| RETRIEVER_K | 4 | Number of documents retrieved per query |
| API_PREFIX | /api/v1 | URL prefix for API routes |
| DEBUG | false | Enable debug logging |
| APP_NAME | End-to-End RAG API | Application name |
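
The app loads these via Pydantic v2 BaseSettings, but the defaulting behaviour can be sketched with the stdlib alone. Field names follow the table above; this is not the repo's actual config.py:

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    return os.getenv(name, default)

@dataclass
class Settings:
    # Required: raises KeyError at construction time if unset.
    groq_api_key: str = field(default_factory=lambda: os.environ["GROQ_API_KEY"])
    llm_model: str = field(default_factory=lambda: _env("LLM_MODEL", "groq/compound"))
    llm_temperature: float = field(default_factory=lambda: float(_env("LLM_TEMPERATURE", "0.0")))
    llm_max_tokens: int = field(default_factory=lambda: int(_env("LLM_MAX_TOKENS", "1024")))
    retriever_k: int = field(default_factory=lambda: int(_env("RETRIEVER_K", "4")))
    vector_db_path: str = field(default_factory=lambda: _env("VECTOR_DB_PATH", "faiss_index"))
```

pydantic-settings adds what this sketch lacks: .env file loading, type validation with helpful errors, and case-insensitive variable matching.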

🏗️ Scalability Design

| Concern | Solution |
|---|---|
| Cold start | RAGService eagerly initialised at startup via lifespan |
| LLM reuse | @lru_cache singleton for ChatGroq |
| Vector store reuse | Thread-safe singleton with double-checked locking |
| Async I/O | Async FastAPI handlers + ainvoke for Groq calls |
| Modularity | Clean separation: api / rag / vectorstore / services / core |
| Error isolation | Domain exception hierarchy → HTTP response mapping |
| Observability | Structured logging with request-ID and timing headers |
| Config | 12-factor app: all config via environment variables |
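
The "thread-safe singleton with double-checked locking" used for the vector store can be sketched like this (illustrative; see vectorstore/vectordb.py for the real version, where the loader is the FAISS load/create call):

```python
import threading

_vectorstore = None
_lock = threading.Lock()

def get_vectorstore(loader):
    """Return the shared vector store, calling `loader` at most once.

    The first, unlocked check keeps the hot path lock-free once the store is
    initialised; the second check inside the lock guards against two threads
    racing past the first check simultaneously.
    """
    global _vectorstore
    if _vectorstore is None:          # fast path: no lock after initialisation
        with _lock:
            if _vectorstore is None:  # re-check under the lock
                _vectorstore = loader()
    return _vectorstore
```

The @lru_cache singleton for ChatGroq achieves the same "create once, reuse everywhere" effect with less ceremony, since functools handles the locking internally.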

🔒 Security

  • All secrets stored in .env only — never committed to VCS
  • .env listed in .gitignore
  • Docker image runs as non-root raguser
  • CORS configurable via CORS_ORIGINS env var
  • Request IDs in headers for tracing

📁 Project Structure

End-to-End RAG System/
├── app/
│   ├── __init__.py
│   ├── main.py              ← App factory + middleware + exception handlers
│   ├── config.py            ← Pydantic Settings
│   ├── dependencies.py      ← FastAPI DI wiring
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py        ← POST /api/v1/query
│   │   └── health.py        ← GET /health
│   │
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── pipeline.py      ← Orchestration
│   │   ├── retriever.py     ← FAISS retrieval
│   │   ├── generator.py     ← Groq LLM
│   │   └── prompts.py       ← Prompt templates
│   │
│   ├── vectorstore/
│   │   ├── __init__.py
│   │   └── vectordb.py      ← FAISS management
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── logging.py       ← Structured logging
│   │   └── exceptions.py    ← Domain exceptions
│   │
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── request.py       ← QueryRequest
│   │   └── response.py      ← QueryResponse, HealthResponse
│   │
│   └── services/
│       ├── __init__.py
│       └── rag_service.py   ← Business logic
│
├── faiss_index/             ← Generated by ingest.py (gitignored)
├── ingest.py                ← Data ingestion script
├── .env                     ← Secrets (gitignored)
├── .env.example             ← Template
├── .gitignore
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── README.md

👨‍💻 Developer

Built by Hasnain Yaqoob - AI Engineer
