AI system design guide for engineers building production AI systems and evals.

ombharatiya/ai-system-design-guide

🧠 AI System Design Guide

The Complete Interview & Production Reference

GitHub Twitter LinkedIn

The living reference for production AI systems. Continuously updated. Interview-ready depth.


📚 Quick Navigation

| I want to... | Start here |
| --- | --- |
| Prepare for interviews | Question Bank, Answer Frameworks |
| Learn AI systems fast | LLM Internals, RAG Fundamentals |
| Build production RAG | Chunking, Vector DBs, Reranking |
| Design multi-tenant AI | Isolation Patterns, Case Study |
| Build agents | Agent Fundamentals, MCP, LangGraph |

🎯 Why This Guide

Traditional books are outdated before they ship. This is a living document: it is updated when new models are released and when patterns evolve.

| This Guide | Printed Books |
| --- | --- |
| December 2025 models (GPT-5.2, Claude Opus 4.5, Gemini 3) | Stuck on GPT-4 |
| MCP, Agentic RAG, Flow Engineering | Does not exist |
| Real pricing with verification dates | Already wrong |
| Staff-level interview Q&A | Generic questions |

📖 Guide Structure

├── 00-interview-prep/           # Questions, frameworks, exercises
├── 01-foundations/              # Transformers, attention, embeddings
├── 02-model-landscape/          # GPT-5.2, Claude Opus 4.5, Gemini 3, o3, DeepSeek
├── 03-training-and-adaptation/  # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/   # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/    # CoT, DSPy, prompt injection defense
├── 06-retrieval-systems/        # RAG, chunking, GraphRAG, Agentic RAG
├── 07-agentic-systems/          # MCP, multi-agent, swarms, evaluation
├── 08-memory-and-state/         # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/     # LangGraph, DSPy, LlamaIndex
├── 10-document-processing/      # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/      # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/   # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/       # Pattern catalog, anti-patterns
├── 16-case-studies/             # Real-world architectures with diagrams
└── GLOSSARY.md                  # Every term defined

🔥 Featured Case Studies

Real interview problems with complete solutions and diagrams:

| Case Study | Problem | Key Patterns |
| --- | --- | --- |
| Real-Time Search | 5-minute data freshness at scale | Streaming + Hybrid Search |
| Coding Agent | Autonomous multi-file changes | Sandboxing + Self-Correction |
| Multi-Tenant SaaS | Coca-Cola and Pepsi on same infra | Defense-in-Depth Isolation |
| Customer Support | 60% auto-resolution rate | Tiered Routing + Escalation |
| Document Intelligence | 50K contracts/month extraction | Vision-LLM + Parallel Extractors |
| Recommendation Engine | Personalized explanations at 50M users | ML Ranking + LLM Explanations |
| Compliance Automation | FDA regulation pre-screening | Claim Extraction + Precedent DB |
| Voice Healthcare | Real-time clinical note generation | On-Prem ASR + HIPAA |
| Fraud Detection | 100ms decision with explainability | ML + Rules Hybrid |
| Knowledge Management | 2M docs with access control | Permission-Aware RAG |

🎓 For Interview Prep

AI system design interviews ask questions like:

"Design a multi-tenant RAG system where competitors cannot see each other's data."

"Your agent takes 15 steps for a 3-step task. How do you debug it?"

This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.

➡️ Start with Interview Prep


🔄 Living Book

This guide tracks:

  • New model releases and real-world performance
  • Emerging patterns (MCP, Agentic RAG, Flow Engineering)
  • Updated pricing and rate limits
  • Deprecations and best practice changes

⭐ Star and Watch to get notified when updates are pushed.


🤝 Contributing

Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.


📄 License

MIT License. See LICENSE.


Built by Om Bharatiya

Last updated: December 2025