A structured path through 37 papers that matter. Each paper links to both the original PDF and our multi-perspective analysis.
Before the papers, build visual intuition:
| Resource | What You'll Learn |
|---|---|
| 3Blue1Brown Neural Networks | How neural nets actually work |
| Andrej Karpathy: Zero to Hero | Build intuition through code |
Everything builds on this.
| Paper | Analysis | Why It Matters |
|---|---|---|
| Transformers ⭐ | Read → | The architecture behind GPT, Claude, Gemini |
The insight that drove the AI explosion.
| Paper | Analysis | Why It Matters |
|---|---|---|
| Scaling Laws | Read → | Why bigger models keep getting better |
| GPT-3 ⭐ | Read → | The paper that proved scaling works |
| Training Compute-Optimal LLMs | Read → | Chinchilla: balancing model size and training data for a fixed compute budget |
Making models helpful, not just capable.
| Paper | Analysis | Why It Matters |
|---|---|---|
| RLHF ⭐ | Read → | How ChatGPT became ChatGPT |
| DPO | Read → | Simpler alternative to RLHF |
Teaching models to think.
| Paper | Analysis | Why It Matters |
|---|---|---|
| Chain of Thought ⭐ | Read → | "Let's think step by step" |
| Tree of Thoughts | Read → | Exploring multiple reasoning paths |
| Graph of Thoughts | Read → | Network-based reasoning |
| Meta-CoT | Read → | Modeling the reasoning process behind a chain of thought |
| Self-Refine | Read → | Iterative self-improvement |
| Let's Verify Step by Step | Read → | Process supervision beats outcome supervision |
| DeepSeek R1 ⭐ | Read → | Pure RL for reasoning |
Models that take action.
| Paper | Analysis | Why It Matters |
|---|---|---|
| ReAct | Read → | Reasoning + Acting interleaved |
| Toolformer | Read → | Teaching LLMs to use tools |
| SWE-Agent | Read → | AI that fixes real GitHub issues |
| OpenHands | Read → | Open-source coding agent |
Inside frontier models.
| Paper | Analysis | Why It Matters |
|---|---|---|
| GPT-4 | Read → | OpenAI's multimodal flagship |
| Llama 3 ⭐ | Read → | Most transparent frontier model |
| Gemini 1.5 | Read → | Up to 10M-token context, multimodal |
| MoE | Read → | Mixture of Experts architecture |
How we measure progress.
| Paper | Analysis | Why It Matters |
|---|---|---|
| BIG-Bench | Read → | 200+ diverse capability tasks |
| SWE-Bench | Read → | Real-world software engineering |
| Chatbot Arena | Read → | Live human preference rankings |
| ARC-Prize | Read → | Testing general reasoning |
| Survey | Analysis | Coverage |
|---|---|---|
| Foundations of LLMs | Read → | Comprehensive theoretical foundations |
| LLM Survey | Read → | Complete landscape of LLMs |
| Agent Survey | Read → | Autonomous AI agents |
| Prompt Engineering Survey | Read → | Every prompting technique |
| Paper | Analysis | Why It Matters |
|---|---|---|
| AlphaZero | Read → | Self-play mastery |
| MuZero ⭐ | Read → | Planning without being given the rules |
| Paper | Analysis | Why It Matters |
|---|---|---|
| LoRA | Read → | Efficient fine-tuning |
| LLM-as-Judge | Read → | AI evaluating AI |
| Paper | Analysis | Why It Matters |
|---|---|---|
| Vision Transformer | Read → | Transformers for images |
| Latent Diffusion | Read → | Foundation of Stable Diffusion |
| Resource | Analysis | What It Offers |
|---|---|---|
| History of Deep Learning | Read → | Timeline of breakthroughs |
| Resource | Best For |
|---|---|
| Yannic Kilcher | Paper walkthroughs |
| Stanford: Building LLMs | Academic depth |
| Noam Brown on Planning | OpenAI o1 researcher on AI planning |
| Resource | Best For |
|---|---|
| Build an LLM from Scratch | Hands-on understanding |
| Full Stack Deep Learning | Production AI systems |