The living reference for production AI systems. Continuously updated. Interview-ready depth.
| I want to... | Start here |
|---|---|
| Prepare for interviews | Question Bank → Answer Frameworks |
| Learn AI systems fast | LLM Internals → RAG Fundamentals |
| Build production RAG | Chunking → Vector DBs → Reranking |
| Design multi-tenant AI | Isolation Patterns → Case Study |
| Build agents | Agent Fundamentals → MCP → LangGraph |
Traditional books are outdated before they ship. This is a living document: it is updated when new models are released and when patterns evolve.
| This Guide | Printed Books |
|---|---|
| December 2025 models (GPT-5.2, Claude Opus 4.5, Gemini 3) | Stuck on GPT-4 |
| MCP, Agentic RAG, Flow Engineering | Does not exist |
| Real pricing with verification dates | Already wrong |
| Staff-level interview Q&A | Generic questions |
├── 00-interview-prep/ # Questions, frameworks, exercises
├── 01-foundations/ # Transformers, attention, embeddings
├── 02-model-landscape/ # GPT-5.2, Claude Opus 4.5, Gemini 3, o3, DeepSeek
├── 03-training-and-adaptation/ # Fine-tuning, LoRA, DPO, distillation
├── 04-inference-optimization/ # KV cache, PagedAttention, vLLM
├── 05-prompting-and-context/ # CoT, DSPy, prompt injection defense
├── 06-retrieval-systems/ # RAG, chunking, GraphRAG, Agentic RAG
├── 07-agentic-systems/ # MCP, multi-agent, swarms, evaluation
├── 08-memory-and-state/ # L1-L3 memory tiers, Mem0, caching
├── 09-frameworks-and-tools/ # LangGraph, DSPy, LlamaIndex
├── 10-document-processing/ # Vision-LLM OCR, multimodal parsing
├── 11-infrastructure-and-mlops/ # GPU clusters, LLMOps, cost management
├── 12-security-and-access/ # RBAC, ABAC, multi-tenant isolation
├── 13-reliability-and-safety/ # Guardrails, red-teaming
├── 14-evaluation-and-observability/ # RAGAS, LangSmith, drift detection
├── 15-ai-design-patterns/ # Pattern catalog, anti-patterns
├── 16-case-studies/ # Real-world architectures with diagrams
└── GLOSSARY.md # Every term defined
Real interview problems with complete solutions and diagrams:
| Case Study | Problem | Key Patterns |
|---|---|---|
| Real-Time Search | 5-minute data freshness at scale | Streaming + Hybrid Search |
| Coding Agent | Autonomous multi-file changes | Sandboxing + Self-Correction |
| Multi-Tenant SaaS | Coca-Cola and Pepsi on same infra | Defense-in-Depth Isolation |
| Customer Support | 60% auto-resolution rate | Tiered Routing + Escalation |
| Document Intelligence | 50K contracts/month extraction | Vision-LLM + Parallel Extractors |
| Recommendation Engine | Personalized explanations at 50M users | ML Ranking + LLM Explanations |
| Compliance Automation | FDA regulation pre-screening | Claim Extraction + Precedent DB |
| Voice Healthcare | Real-time clinical note generation | On-Prem ASR + HIPAA |
| Fraud Detection | 100ms decision with explainability | ML + Rules Hybrid |
| Knowledge Management | 2M docs with access control | Permission-Aware RAG |
AI system design interviews ask questions like:
"Design a multi-tenant RAG system where competitors cannot see each other's data."
"Your agent takes 15 steps for a 3-step task. How do you debug it?"
This guide gives you concrete patterns, real tradeoffs, and production failure modes: the depth interviewers expect at senior levels.
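As a small illustration of the first question above, here is a minimal sketch of tenant-scoped retrieval. It is an in-memory toy, not the guide's reference design: the `Chunk` type, the `search` function, the tenant names, and the two-dimensional embeddings are all hypothetical and exist only to show the idea of filtering by tenant before ranking.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stored chunk: every record carries its owning tenant."""
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, tenant_id, query_embedding, k=3):
    """Return the top-k chunks for one tenant.

    The tenant filter runs BEFORE ranking, so another tenant's chunks
    never enter the candidate set, however similar they are to the query.
    """
    candidates = [c for c in index if c.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data (assumed): two tenants hold near-identical documents.
index = [
    Chunk("tenant_a", "A: Q3 pricing plan", (1.0, 0.0)),
    Chunk("tenant_b", "B: Q3 pricing plan", (1.0, 0.0)),
    Chunk("tenant_a", "A: holiday schedule", (0.0, 1.0)),
]

results = search(index, "tenant_a", (1.0, 0.1))
for chunk in results:
    print(chunk.tenant_id, chunk.text)
```

In an interview answer this filter would typically be one layer among several (for example, separate indexes or namespaces per tenant, plus authorization checks at the API boundary), since a single application-level filter is easy to bypass by mistake.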
➡️ Start with Interview Prep
This guide tracks:
- New model releases and real-world performance
- Emerging patterns (MCP, Agentic RAG, Flow Engineering)
- Updated pricing and rate limits
- Deprecations and best practice changes
⭐ Star and Watch this repository to get notified when updates are pushed.
Found outdated info? Have production experience to share? PRs welcome. See Contributing Guide.
MIT License. See LICENSE.
Built by Om Bharatiya
Last updated: December 2025