A production-ready, scalable Retrieval Augmented Generation (RAG) API built with FastAPI, LangChain, Groq, and FAISS.
```
User Query
     │
     ▼
┌─────────────────────────────────────────────────────────┐
│                       FastAPI App                       │
│                   POST /api/v1/query                    │
│  ┌──────────────────────────────────────────────────┐   │
│  │            RAGService (Service Layer)            │   │
│  │  ┌─────────────────────────────────────────┐     │   │
│  │  │               RAGPipeline               │     │   │
│  │  │                                         │     │   │
│  │  │  ┌───────────────┐   ┌───────────────┐  │     │   │
│  │  │  │ FAISSRetriever│   │   ChatGroq    │  │     │   │
│  │  │  │  (k=4 docs)   │   │ groq/compound │  │     │   │
│  │  │  └───────┬───────┘   └───────┬───────┘  │     │   │
│  │  │          │                   │          │     │   │
│  │  │      Documents            Answer        │     │   │
│  │  └─────────────────────────────────────────┘     │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
     │
     ▼
QueryResponse { answer, sources, query }
```
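The flow above can be sketched framework-free. This is a minimal sketch, not the project's actual code: `Document`, `FakeRetriever`, and `fake_llm` are illustrative stand-ins for the LangChain document type, `FAISSRetriever`, and `ChatGroq`.

```python
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str
    metadata: dict

class FakeRetriever:
    """Stand-in for FAISSRetriever: returns the top-k stored docs."""
    def __init__(self, docs, k=4):
        self.docs, self.k = docs, k

    def retrieve(self, query: str):
        return self.docs[: self.k]

def fake_llm(prompt: str) -> str:
    """Stand-in for ChatGroq: echoes the question line back."""
    return "Grounded answer to: " + prompt.splitlines()[-1]

def rag_pipeline(query: str, retriever, llm) -> dict:
    docs = retriever.retrieve(query)                                  # 1. retrieve
    context = "\n---\n".join(d.page_content for d in docs)            # 2. format context
    prompt = f"Use only this context:\n{context}\nQuestion: {query}"  # 3. build prompt
    return {"answer": llm(prompt), "sources": docs, "query": query}   # 4. generate

docs = [
    Document("LangChain is a framework for LLM apps.", {"source": "a.txt"}),
    Document("FAISS does vector similarity search.", {"source": "b.txt"}),
]
result = rag_pipeline("What is LangChain?", FakeRetriever(docs), fake_llm)
```

The real pipeline swaps the stand-ins for FAISS similarity search and an async Groq call, but the four-step shape is the same.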
```
app/
├── main.py             ← FastAPI factory, middleware, exception handlers, lifespan
├── config.py           ← Pydantic v2 Settings (env-file based)
├── dependencies.py     ← FastAPI DI providers (Annotated types)
│
├── api/
│   ├── routes.py       ← POST /api/v1/query
│   └── health.py       ← GET /health
│
├── rag/
│   ├── pipeline.py     ← End-to-end orchestration (retrieve → format → prompt → generate)
│   ├── retriever.py    ← FAISSRetriever (sync + async)
│   ├── generator.py    ← ChatGroq singleton + async invocation
│   └── prompts.py      ← RAG prompt templates (strict grounding)
│
├── vectorstore/
│   └── vectordb.py     ← FAISS load/create with thread-safe singleton
│
├── core/
│   ├── logging.py      ← Structured production logging
│   └── exceptions.py   ← Domain exception hierarchy
│
├── schemas/
│   ├── request.py      ← QueryRequest (Pydantic v2)
│   └── response.py     ← QueryResponse, HealthResponse, ErrorResponse
│
└── services/
    └── rag_service.py  ← Business logic + schema translation
```
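A strict-grounding template of the kind `prompts.py` holds can be illustrated as below. The exact wording is an assumption for illustration, not the project's actual prompt:

```python
# Illustrative strict-grounding prompt: the model may only use the
# retrieved context, and must refuse when the context is insufficient.
RAG_PROMPT = """\
Answer the question using ONLY the context below.
If the context does not contain the answer, say "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return RAG_PROMPT.format(context=context, question=question)

prompt = build_prompt(
    context="FAISS is a library for efficient similarity search.",
    question="What is FAISS?",
)
```

The "say I don't know" instruction is what keeps answers grounded in the retrieved documents rather than the model's parametric memory.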
| Layer | Technology |
|---|---|
| Web Framework | FastAPI + Uvicorn |
| Data Validation | Pydantic v2 |
| LLM | Groq (`groq/compound`) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | FAISS (CPU) |
| Orchestration | LangChain (modular packages) |
| Config | pydantic-settings + .env |
| Containerisation | Docker + docker-compose |
- Python 3.11+
- A Groq API key (free tier available)
```bash
git clone <repo-url>
cd "End-to-End RAG System"

python -m venv .venv
source .venv/bin/activate    # Linux/macOS
# .venv\Scripts\activate     # Windows

pip install -r requirements.txt

cp .env.example .env
# Edit .env and set your GROQ_API_KEY
```

In `.env`:

```bash
GROQ_API_KEY=gsk_your_actual_api_key_here
```

```bash
# Index the built-in sample documents (FastAPI, LangChain, FAISS, Groq, RAG, etc.)
python ingest.py

# Or index your own .txt files:
python ingest.py --source /path/to/your/docs
```

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The API will be available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
```bash
# Build the image
docker build -t rag-api .

# Run ingest to create the FAISS index
docker run --rm -v $(pwd)/faiss_index:/app/faiss_index --env-file .env rag-api python ingest.py

# Start the API
docker run -d \
  --name rag-api \
  -p 8000:8000 \
  -v $(pwd)/faiss_index:/app/faiss_index \
  --env-file .env \
  rag-api
```

With Docker Compose:

```bash
# 1. Build the FAISS index first
docker compose run --rm rag-api python ingest.py

# 2. Start the service
docker compose up -d

# Check logs
docker compose logs -f rag-api
```

`POST /api/v1/query` queries the RAG system with a natural-language question.
Request Body

```json
{
  "query": "What is LangChain?"
}
```

Response (200 OK)
```json
{
  "answer": "LangChain is a framework for developing applications powered by large language models (LLMs)...",
  "sources": [
    {
      "page_content": "LangChain is a framework for developing applications powered by...",
      "metadata": {
        "source": "langchain_docs.txt",
        "topic": "LangChain"
      }
    }
  ],
  "query": "What is LangChain?"
}
```

Error Responses
| Status | Condition |
|---|---|
| 400 | Empty or invalid query |
| 500 | Internal pipeline error |
| 502 | Groq LLM failure (e.g. invalid API key, rate limit) |
| 503 | FAISS index not found or vector store unavailable |
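The table above reflects the domain-exception hierarchy mapped onto HTTP statuses. A minimal sketch of that mapping; the class and function names here are assumptions, not the project's actual `exceptions.py`:

```python
class RAGError(Exception):
    """Base class for domain errors; maps to 500 by default."""
    status_code = 500

class InvalidQueryError(RAGError):
    """Empty or invalid query."""
    status_code = 400

class LLMError(RAGError):
    """Groq failure, e.g. invalid API key or rate limit."""
    status_code = 502

class VectorStoreError(RAGError):
    """FAISS index not found or vector store unavailable."""
    status_code = 503

def to_http_response(exc: Exception) -> tuple[int, dict]:
    """Translate a domain exception into an (HTTP status, JSON body) pair."""
    status = getattr(exc, "status_code", 500)
    return status, {"detail": str(exc) or exc.__class__.__name__}

status, body = to_http_response(VectorStoreError("FAISS index not found"))
```

In the real app this translation happens once, in the FastAPI exception handlers registered in `main.py`, so route code raises domain exceptions and never builds HTTP errors directly.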
`GET /health` is a liveness probe for orchestrators.

Response (200 OK)
```json
{
  "status": "healthy",
  "app_name": "End-to-End RAG API",
  "version": "1.0.0"
}
```

```bash
# Health check
curl http://localhost:8000/health

# RAG query
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FAISS and how does it work?"}'
```

```python
import httpx

response = httpx.post(
    "http://localhost:8000/api/v1/query",
    json={"query": "Explain RAG in simple terms."},
)
print(response.json())
```

All settings are loaded from environment variables or `.env`:
| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | *Required* | Your Groq API key |
| `LLM_MODEL` | `groq/compound` | Groq model name |
| `LLM_TEMPERATURE` | `0.0` | LLM temperature (0 = deterministic) |
| `LLM_MAX_TOKENS` | `1024` | Maximum tokens in LLM response |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HuggingFace embedding model |
| `VECTOR_DB_PATH` | `faiss_index` | Path to FAISS index directory |
| `RETRIEVER_K` | `4` | Number of documents to retrieve per query |
| `API_PREFIX` | `/api/v1` | URL prefix for API routes |
| `DEBUG` | `false` | Enable debug logging |
| `APP_NAME` | `End-to-End RAG API` | Application name |
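The project itself uses pydantic-settings for this; the same resolution order (environment first, table defaults second, `GROQ_API_KEY` mandatory) can be sketched with the standard library alone:

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "LLM_MODEL": "groq/compound",
    "LLM_TEMPERATURE": "0.0",
    "LLM_MAX_TOKENS": "1024",
    "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
    "VECTOR_DB_PATH": "faiss_index",
    "RETRIEVER_K": "4",
    "API_PREFIX": "/api/v1",
    "DEBUG": "false",
    "APP_NAME": "End-to-End RAG API",
}

def load_settings(env=None) -> dict:
    """Resolve each setting from the environment, falling back to defaults.

    GROQ_API_KEY has no default: a missing key is a configuration error.
    """
    env = os.environ if env is None else env
    if "GROQ_API_KEY" not in env:
        raise RuntimeError("GROQ_API_KEY is required")
    settings = {k: env.get(k, default) for k, default in DEFAULTS.items()}
    settings["GROQ_API_KEY"] = env["GROQ_API_KEY"]
    # Coerce the numeric settings, as pydantic-settings would.
    settings["RETRIEVER_K"] = int(settings["RETRIEVER_K"])
    settings["LLM_TEMPERATURE"] = float(settings["LLM_TEMPERATURE"])
    return settings

cfg = load_settings({"GROQ_API_KEY": "gsk_demo", "RETRIEVER_K": "8"})
```

pydantic-settings adds type validation and `.env` parsing on top of this, but the precedence is the same.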
| Concern | Solution |
|---|---|
| Cold start | RAGService eagerly initialised at startup via lifespan |
| LLM reuse | @lru_cache singleton for ChatGroq |
| Vector store reuse | Thread-safe singleton with double-checked locking |
| Async I/O | Async FastAPI handlers + ainvoke for Groq calls |
| Modularity | Clean separation: api / rag / vectorstore / services / core |
| Error isolation | Domain exception hierarchy → HTTP response mapping |
| Observability | Structured logging with request-ID and timing headers |
| Config | 12-factor app: all config via environment variables |
- All secrets stored in `.env` only, never committed to VCS
- `.env` listed in `.gitignore`
- Docker image runs as non-root user `raguser`
- CORS configurable via the `CORS_ORIGINS` env var
- Request IDs in response headers for tracing
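Request-ID tracing can be sketched with the standard library alone: a `ContextVar` carries the ID so every log line emitted while handling a request is correlatable. The names below are illustrative, not the project's actual `core/logging.py`:

```python
import logging
import uuid
from contextvars import ContextVar

# Holds the ID of the request currently being handled.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Inject the current request ID into every log record."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

logger = logging.getLogger("rag")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(query: str) -> str:
    rid = uuid.uuid4().hex[:8]      # would also be returned in a response header
    request_id.set(rid)
    logger.info("query received: %s", query)
    return rid

rid = handle_request("What is RAG?")
```

In the real app the ID would be set by middleware and echoed in a response header, so a client-reported failure can be grepped straight out of the logs.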
```
End-to-End RAG System/
├── app/
│   ├── __init__.py
│   ├── main.py             ← App factory + middleware + exception handlers
│   ├── config.py           ← Pydantic Settings
│   ├── dependencies.py     ← FastAPI DI wiring
│   │
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py       ← POST /api/v1/query
│   │   └── health.py       ← GET /health
│   │
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── pipeline.py     ← Orchestration
│   │   ├── retriever.py    ← FAISS retrieval
│   │   ├── generator.py    ← Groq LLM
│   │   └── prompts.py      ← Prompt templates
│   │
│   ├── vectorstore/
│   │   ├── __init__.py
│   │   └── vectordb.py     ← FAISS management
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── logging.py      ← Structured logging
│   │   └── exceptions.py   ← Domain exceptions
│   │
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── request.py      ← QueryRequest
│   │   └── response.py     ← QueryResponse, HealthResponse
│   │
│   └── services/
│       ├── __init__.py
│       └── rag_service.py  ← Business logic
│
├── faiss_index/            ← Generated by ingest.py (gitignored)
├── ingest.py               ← Data ingestion script
├── .env                    ← Secrets (gitignored)
├── .env.example            ← Template
├── .gitignore
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── README.md
```
Built by Hasnain Yaqoob - AI Engineer