A production-ready, scalable Retrieval Augmented Generation (RAG) API built with FastAPI, LangChain, Groq, and FAISS.
```
User Query
     │
     ▼
┌─────────────────────────────────────────────────────────┐
│                       FastAPI App                       │
│                   POST /api/v1/query                    │
│  ┌──────────────────────────────────────────────────┐   │
│  │            RAGService (Service Layer)            │   │
│  │  ┌─────────────────────────────────────────┐     │   │
│  │  │               RAGPipeline               │     │   │
│  │  │                                         │     │   │
│  │  │  ┌───────────────┐   ┌───────────────┐  │     │   │
│  │  │  │ FAISSRetriever│   │   ChatGroq    │  │     │   │
│  │  │  │  (k=4 docs)   │   │ groq/compound │  │     │   │
│  │  │  └───────┬───────┘   └───────┬───────┘  │     │   │
│  │  │          │                   │          │     │   │
│  │  │      Documents            Answer        │     │   │
│  │  └─────────────────────────────────────────┘     │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
     │
     ▼
QueryResponse { answer, sources, query }
```
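The flow above can be sketched framework-free. This is a minimal sketch, not the project's actual code: `Document`, `FakeRetriever`, and `fake_llm` are illustrative stand-ins for the LangChain document type, `FAISSRetriever`, and `ChatGroq`.

```python
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str
    metadata: dict

class FakeRetriever:
    """Stand-in for FAISSRetriever: returns the top-k stored docs."""
    def __init__(self, docs, k=4):
        self.docs, self.k = docs, k

    def retrieve(self, query: str):
        return self.docs[: self.k]

def fake_llm(prompt: str) -> str:
    """Stand-in for ChatGroq: echoes the question line back."""
    return "Grounded answer to: " + prompt.splitlines()[-1]

def rag_pipeline(query: str, retriever, llm) -> dict:
    docs = retriever.retrieve(query)                                  # 1. retrieve
    context = "\n---\n".join(d.page_content for d in docs)            # 2. format context
    prompt = f"Use only this context:\n{context}\nQuestion: {query}"  # 3. build prompt
    return {"answer": llm(prompt), "sources": docs, "query": query}   # 4. generate

docs = [
    Document("LangChain is a framework for LLM apps.", {"source": "a.txt"}),
    Document("FAISS does vector similarity search.", {"source": "b.txt"}),
]
result = rag_pipeline("What is LangChain?", FakeRetriever(docs), fake_llm)
```

The real pipeline swaps the stand-ins for FAISS similarity search and an async Groq call, but the four-step shape is the same.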
```
app/
├── main.py             ← FastAPI factory, middleware, exception handlers, lifespan
├── config.py           ← Pydantic v2 Settings (env-file based)
├── dependencies.py     ← FastAPI DI providers (Annotated types)
│
├── api/
│   ├── routes.py       ← POST /api/v1/query
│   └── health.py       ← GET /health
│
├── rag/
│   ├── pipeline.py     ← End-to-end orchestration (retrieve → format → prompt → generate)
│   ├── retriever.py    ← FAISSRetriever (sync + async)
│   ├── generator.py    ← ChatGroq singleton + async invocation
│   └── prompts.py      ← RAG prompt templates (strict grounding)
│
├── vectorstore/
│   └── vectordb.py     ← FAISS load/create with thread-safe singleton
│
├── core/
│   ├── logging.py      ← Structured production logging
│   └── exceptions.py   ← Domain exception hierarchy
│
├── schemas/
│   ├── request.py      ← QueryRequest (Pydantic v2)
│   └── response.py     ← QueryResponse, HealthResponse, ErrorResponse
│
└── services/
    └── rag_service.py  ← Business logic + schema translation
```
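A strict-grounding template of the kind `prompts.py` holds can be illustrated as below. The exact wording is an assumption for illustration, not the project's actual prompt:

```python
# Illustrative strict-grounding prompt: the model may only use the
# retrieved context, and must refuse when the context is insufficient.
RAG_PROMPT = """\
Answer the question using ONLY the context below.
If the context does not contain the answer, say "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return RAG_PROMPT.format(context=context, question=question)

prompt = build_prompt(
    context="FAISS is a library for efficient similarity search.",
    question="What is FAISS?",
)
```

The "say I don't know" instruction is what keeps answers grounded in the retrieved documents rather than the model's parametric memory.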
| Layer | Technology |
|---|---|
| Web Framework | FastAPI + Uvicorn |
| Data Validation | Pydantic v2 |
| LLM | Groq (`groq/compound`) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | FAISS (CPU) |
| Orchestration | LangChain (modular packages) |
| Config | pydantic-settings + .env |
| Containerisation | Docker + docker-compose |
- Python 3.11+
- A Groq API key (free tier available)
```bash
git clone <repo-url>
cd "End-to-End RAG System"

python -m venv .venv
source .venv/bin/activate    # Linux/macOS
# .venv\Scripts\activate     # Windows

pip install -r requirements.txt

cp .env.example .env
# Edit .env and set your GROQ_API_KEY
```

In `.env`:

```bash
GROQ_API_KEY=gsk_your_actual_api_key_here
```

```bash
# Index the built-in sample documents (FastAPI, LangChain, FAISS, Groq, RAG, etc.)
python ingest.py

# Or index your own .txt files:
python ingest.py --source /path/to/your/docs
```

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The API will be available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
```bash
# Build the image
docker build -t rag-api .

# Run ingest to create the FAISS index
docker run --rm -v $(pwd)/faiss_index:/app/faiss_index --env-file .env rag-api python ingest.py

# Start the API
docker run -d \
  --name rag-api \
  -p 8000:8000 \
  -v $(pwd)/faiss_index:/app/faiss_index \
  --env-file .env \
  rag-api
```

With Docker Compose:

```bash
# 1. Build the FAISS index first
docker compose run --rm rag-api python ingest.py

# 2. Start the service
docker compose up -d

# Check logs
docker compose logs -f rag-api
```

`POST /api/v1/query` queries the RAG system with a natural-language question.
Request Body

```json
{
  "query": "What is LangChain?"
}
```

Response (200 OK)
```json
{
  "answer": "LangChain is a framework for developing applications powered by large language models (LLMs)...",
  "sources": [
    {
      "page_content": "LangChain is a framework for developing applications powered by...",
      "metadata": {
        "source": "langchain_docs.txt",
        "topic": "LangChain"
      }
    }
  ],
  "query": "What is LangChain?"
}
```

Error Responses
| Status | Condition |
|---|---|
| 400 | Empty or invalid query |
| 500 | Internal pipeline error |
| 502 | Groq LLM failure (e.g. invalid API key, rate limit) |
| 503 | FAISS index not found or vector store unavailable |
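The table above reflects the domain-exception hierarchy mapped onto HTTP statuses. A minimal sketch of that mapping; the class and function names here are assumptions, not the project's actual `exceptions.py`:

```python
class RAGError(Exception):
    """Base class for domain errors; maps to 500 by default."""
    status_code = 500

class InvalidQueryError(RAGError):
    """Empty or invalid query."""
    status_code = 400

class LLMError(RAGError):
    """Groq failure, e.g. invalid API key or rate limit."""
    status_code = 502

class VectorStoreError(RAGError):
    """FAISS index not found or vector store unavailable."""
    status_code = 503

def to_http_response(exc: Exception) -> tuple[int, dict]:
    """Translate a domain exception into an (HTTP status, JSON body) pair."""
    status = getattr(exc, "status_code", 500)
    return status, {"detail": str(exc) or exc.__class__.__name__}

status, body = to_http_response(VectorStoreError("FAISS index not found"))
```

In the real app this translation happens once, in the FastAPI exception handlers registered in `main.py`, so route code raises domain exceptions and never builds HTTP errors directly.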
`GET /health` is a liveness probe for orchestrators.

Response (200 OK)
```json
{
  "status": "healthy",
  "app_name": "End-to-End RAG API",
  "version": "1.0.0"
}
```

```bash
# Health check
curl http://localhost:8000/health

# RAG query
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FAISS and how does it work?"}'
```

```python
import httpx

response = httpx.post(
    "http://localhost:8000/api/v1/query",
    json={"query": "Explain RAG in simple terms."},
)
print(response.json())
```

All settings are loaded from environment variables or `.env`:
| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | *Required* | Your Groq API key |
| `LLM_MODEL` | `groq/compound` | Groq model name |
| `LLM_TEMPERATURE` | `0.0` | LLM temperature (0 = deterministic) |
| `LLM_MAX_TOKENS` | `1024` | Maximum tokens in LLM response |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace embedding model |
| `VECTOR_DB_PATH` | `faiss_index` | Path to FAISS index directory |
| `RETRIEVER_K` | `4` | Number of documents to retrieve per query |
| `API_PREFIX` | `/api/v1` | URL prefix for API routes |
| `DEBUG` | `false` | Enable debug logging |
| `APP_NAME` | `End-to-End RAG API` | Application name |
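The project itself uses pydantic-settings for this; the same resolution order (environment first, table defaults second, `GROQ_API_KEY` mandatory) can be sketched with the standard library alone:

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "LLM_MODEL": "groq/compound",
    "LLM_TEMPERATURE": "0.0",
    "LLM_MAX_TOKENS": "1024",
    "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
    "VECTOR_DB_PATH": "faiss_index",
    "RETRIEVER_K": "4",
    "API_PREFIX": "/api/v1",
    "DEBUG": "false",
    "APP_NAME": "End-to-End RAG API",
}

def load_settings(env=None) -> dict:
    """Resolve each setting from the environment, falling back to defaults.

    GROQ_API_KEY has no default: a missing key is a configuration error.
    """
    env = os.environ if env is None else env
    if "GROQ_API_KEY" not in env:
        raise RuntimeError("GROQ_API_KEY is required")
    settings = {k: env.get(k, default) for k, default in DEFAULTS.items()}
    settings["GROQ_API_KEY"] = env["GROQ_API_KEY"]
    # Coerce the numeric settings, as pydantic-settings would.
    settings["RETRIEVER_K"] = int(settings["RETRIEVER_K"])
    settings["LLM_TEMPERATURE"] = float(settings["LLM_TEMPERATURE"])
    return settings

cfg = load_settings({"GROQ_API_KEY": "gsk_demo", "RETRIEVER_K": "8"})
```

pydantic-settings adds type validation and `.env` parsing on top of this, but the precedence is the same.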
| Concern | Solution |
|---|---|
| Cold start | RAGService eagerly initialised at startup via lifespan |
| LLM reuse | @lru_cache singleton for ChatGroq |
| Vector store reuse | Thread-safe singleton with double-checked locking |
| Async I/O | Async FastAPI handlers + ainvoke for Groq calls |
| Modularity | Clean separation: api / rag / vectorstore / services / core |
| Error isolation | Domain exception hierarchy → HTTP response mapping |
| Observability | Structured logging with request-ID and timing headers |
| Config | 12-factor app: all config via environment variables |
- All secrets stored in `.env` only, never committed to VCS
- `.env` listed in `.gitignore`
- Docker image runs as non-root user `raguser`
- CORS configurable via the `CORS_ORIGINS` env var
- Request IDs in response headers for tracing
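Request-ID tracing can be sketched with the standard library alone: a `ContextVar` carries the ID so every log line emitted while handling a request is correlatable. The names below are illustrative, not the project's actual `core/logging.py`:

```python
import logging
import uuid
from contextvars import ContextVar

# Holds the ID of the request currently being handled.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Inject the current request ID into every log record."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

logger = logging.getLogger("rag")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(query: str) -> str:
    rid = uuid.uuid4().hex[:8]      # would also be returned in a response header
    request_id.set(rid)
    logger.info("query received: %s", query)
    return rid

rid = handle_request("What is RAG?")
```

In the real app the ID would be set by middleware and echoed in a response header, so a client-reported failure can be grepped straight out of the logs.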
```
End-to-End RAG System/
├── app/
│   ├── __init__.py
│   ├── main.py             ← App factory + middleware + exception handlers
│   ├── config.py           ← Pydantic Settings
│   ├── dependencies.py     ← FastAPI DI wiring
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py       ← POST /api/v1/query
│   │   └── health.py       ← GET /health
│   │
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── pipeline.py     ← Orchestration
│   │   ├── retriever.py    ← FAISS retrieval
│   │   ├── generator.py    ← Groq LLM
│   │   └── prompts.py      ← Prompt templates
│   │
│   ├── vectorstore/
│   │   ├── __init__.py
│   │   └── vectordb.py     ← FAISS management
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── logging.py      ← Structured logging
│   │   └── exceptions.py   ← Domain exceptions
│   │
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── request.py      ← QueryRequest
│   │   └── response.py     ← QueryResponse, HealthResponse
│   │
│   └── services/
│       ├── __init__.py
│       └── rag_service.py  ← Business logic
│
├── faiss_index/            ← Generated by ingest.py (gitignored)
├── ingest.py               ← Data ingestion script
├── .env                    ← Secrets (gitignored)
├── .env.example            ← Template
├── .gitignore
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── README.md
```
Built by Hasnain Yaqoob - AI Engineer