🧠 AI System Design Guide

The Complete Interview & Production Reference

The living reference for production AI systems. Continuously updated. Interview-ready depth.

📚 Quick Navigation

I want to...	Start here
Prepare for interviews	Question Bank → Answer Frameworks
Learn AI systems fast	LLM Internals → RAG Fundamentals
Build production RAG	Chunking → Vector DBs → Reranking
Design multi-tenant AI	Isolation Patterns → Case Study
Build agents	Agent Fundamentals → MCP → LangGraph

🎯 Why This Guide

Traditional books are outdated before they ship. This is a living document: it is updated when new models are released and when patterns evolve.

This Guide	Printed Books
December 2025 models (GPT-5.2, Claude Opus 4.5, Gemini 3)	Stuck on GPT-4
MCP, Agentic RAG, Flow Engineering	Not covered
Real pricing with verification dates	Already wrong
Staff-level interview Q&A	Generic questions

📖 Guide Structure

├── 00-interview-prep/           # Questions, frameworks, exercises
├── 01-foundations/              # Transformers, attention, embeddings
├── 02-model-landscape/          # GPT-5.2, Claude Opus 4.5, Gemini 3, o3, DeepSeek
├── 03-training-and-adaptation/  # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/   # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/    # CoT, DSPy, prompt injection defense
├── 06-retrieval-systems/        # RAG, chunking, GraphRAG, Agentic RAG
├── 07-agentic-systems/          # MCP, multi-agent, swarms, evaluation
├── 08-memory-and-state/         # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/     # LangGraph, DSPy, LlamaIndex
├── 10-document-processing/      # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/      # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/   # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/       # Pattern catalog, anti-patterns
├── 16-case-studies/             # Real-world architectures with diagrams
└── GLOSSARY.md                  # Every term defined

🔥 Featured Case Studies

Real interview problems with complete solutions and diagrams:

Case Study	Problem	Key Patterns
Real-Time Search	5-minute data freshness at scale	Streaming + Hybrid Search
Coding Agent	Autonomous multi-file changes	Sandboxing + Self-Correction
Multi-Tenant SaaS	Coca-Cola and Pepsi on same infra	Defense-in-Depth Isolation
Customer Support	60% auto-resolution rate	Tiered Routing + Escalation
Document Intelligence	50K contracts/month extraction	Vision-LLM + Parallel Extractors
Recommendation Engine	Personalized explanations at 50M users	ML Ranking + LLM Explanations
Compliance Automation	FDA regulation pre-screening	Claim Extraction + Precedent DB
Voice Healthcare	Real-time clinical note generation	On-Prem ASR + HIPAA
Fraud Detection	100ms decision with explainability	ML + Rules Hybrid
Knowledge Management	2M docs with access control	Permission-Aware RAG

🎓 For Interview Prep

AI system design interviews ask questions like:

"Design a multi-tenant RAG system where competitors cannot see each other's data."

"Your agent takes 15 steps for a 3-step task. How do you debug it?"

This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.

➡️ Start with Interview Prep

🔄 Living Book

This guide tracks:

New model releases and real-world performance
Emerging patterns (MCP, Agentic RAG, Flow Engineering)
Updated pricing and rate limits
Deprecations and best practice changes

⭐ Star and Watch to get notified when updates are pushed.

🤝 Contributing

Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.

📄 License

MIT License. See LICENSE.

Built by Om Bharatiya

Last updated: December 2025
