
Commit b653cc3

Add Lec 22-24 - Robot Learning Module
1 parent 3398886 commit b653cc3

File tree

7 files changed: +218 -4 lines

README.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -27,7 +27,7 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
 [![Detailed Schedule + Topics](https://img.shields.io/badge/View%20Detailed%20Schedule%20%2B%20Topics-Google%20Sheets-34A853?logo=google-sheets&logoColor=white&style=flat-square)](https://docs.google.com/spreadsheets/d/1qjU-zWitD6S8JJlbWS90PVDoHJdfmojjqB4BuxkT4w8/edit?usp=sharing)
 
 
-| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments |
+| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments/Demos |
 |----|--------------|-----------------------------------|------------------------------------|---------------|-------------|
 | 1 | May 17, 2025 | Introduction | Prof. Madhava Krishna | -- | -- |
 | 2 | May 19, 2025 | Linear Algebra & Probability | Vishal |[Linear Algebra Resources](lectures/02-linear-algebra-probability/README.md) | [Linear Algebra Problem Set](lectures/02-linear-algebra-probability/lec-02-linear-algebra-problems.pdf) |
```
```diff
@@ -50,9 +50,9 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
 | 19 | Jun 10, 2025 (PM) | Motion Planning - III | Meet | [Motion Planning - III Resources](lectures/19-motion-planning-3/README.md) | [🚀 Collision Cones & Velocity Obstacles Interactive Demo](https://roboticsiiith.github.io/summer-school-2025/demos/lec-19-collision-cones-vo/)|
 | 20 | Jun 11, 2025 | ROS - I | Tarun, Soham | [ROS Deployment - I Resources](lectures/20-ros-deployment-1/README.md) | 🎓 Capstone 1/2 <br> Robot Tele-operation <br> [![Start Project](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/20-ros-deployment-1/README.md#-capstone-project---part-1)|
 | 21 | Jun 12, 2025 | ROS - II | Tarun, Soham | [ROS Deployment - II Resources](lectures/21-ros-deployment-2/README.md) | 🎓 Capstone 2/2 <br> Autonomous Navigation <br> [![Launch](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/21-ros-deployment-2/README.md#-capstone-project---part-2) |
-| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal | | |
-| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | | 🚧 WIP |
-| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | | |
+| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal, Tejas | [Reinforcement Learning Resources](lectures/22-reinforcement-learning/README.md) | [🧠 Policy Gradient & Actor-Critic Colab Walkthrough](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing) |
+| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | [Diffusion Models - Basics Resources](lectures/23-diffusion-basics/README.md) | [DDPM & Stable Diffusion Walkthroughs](lectures/23-diffusion-basics/README.md#-assignment) |
+| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | [Diffusion Models for Robotics Resources](lectures/24-diffusion-robotics/README.md) | [Diffusion Policy for Robot Manipulation Hands-On Colab](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing) <br> [🤗 HF Push Task Demo](https://huggingface.co/lerobot/diffusion_pusht)|
 
 📌 **Note:**
 The schedule will be regularly updated with slides, reference materials, and coding assignments as sessions conclude. Stay tuned by clicking on **Watch** for this repository or subscribing to its RSS feed.
```

lectures/06-dynamics-control-2/README.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -47,4 +47,7 @@ Please raise doubts or engage in discussion on the **`#module-2-dynamics-control
 |----------------------------------|----------------------------------------------------------------------------------------|
 | Lecture Slides (Sarthak) - Controls - Introduction | [lec-06-controls-introduction.pdf](./lec-06-controls-introduction.pdf) |
 | Lecture Slides (Astik) - Controls - PID, LQR | [lec-06-controls-pid-lqr.pdf](./lec-06-controls-pid-lqr.pdf) |
+| **Modern Robotics: Mechanics, Planning, and Control** – Kevin M. Lynch & Frank C. Park (Northwestern University) | [![Textbook](https://img.shields.io/badge/Open-Textbook-blue?logo=readthedocs)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics)<br>[![Videos](https://img.shields.io/badge/Watch-Lecture_Videos-red?logo=youtube&logoColor=white)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics_Videos) |
+
+
 ---
```
lectures/22-reinforcement-learning/README.md (new file)

Lines changed: 100 additions & 0 deletions

# Lecture 22: Reinforcement Learning
**Instructors:** Vishal, Tejas
**Date:** June 13, 2025
## 📖 Topics Covered:

- **1. Why Reinforcement Learning?**
  - Why is it hard to generate data for robots with frequently changing morphologies?
  - Why are traditional approaches (e.g., explicit physics models, controllers) inefficient for skill learning?
  - How does RL (and supervised learning) help bridge this gap?
  - What is an example where RL enabled fast adaptation (e.g., quadrupeds using rapid motor adaptation)?

- **2. RL Notation and Terminology**
  - What are stochastic processes and the Markovian property?
  - What is a Markov Decision Process (MDP), and how is it defined?

- **3. Anatomy of the Reinforcement Learning Pipeline**
  - How do we collect samples from the environment using the current policy?
  - What does model fitting or sample evaluation involve?
  - How is the policy improved based on evaluation?
  - How do modern simulators and sim-to-real transfer help overcome sample collection bottlenecks?

- **4. Policy Gradient Methods**

  **4.1 Goal of RL**
  - What is the objective function \( J(\theta) \) in RL?
  - How does the formulation differ in finite vs. infinite horizon settings?
  - Why is the goal to maximize expected return?

  **4.2 Policy Gradient**
  - How do we compute the gradient of the objective function?
  - What is the REINFORCE trick and algorithm?

- **5. Reducing Variance in REINFORCE**
  - Why does REINFORCE have high variance despite being unbiased?
  - How does the reward-to-go trick exploit causality to reduce variance?
  - What are baseline methods for variance reduction?
  - How do we choose an optimal baseline to minimize variance?
  - What are actor-critic methods, and how do they combine value estimation with policy updates?

- **6. Value-Based Methods**
  - Value function and Q-function
  - What are SARSA and Q-learning?
  - How does Deep Q-Learning extend traditional Q-learning?

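The score-function (REINFORCE) estimator from Topic 4.2 fits in very little code. Below is an illustrative numpy-only sketch on a two-armed bandit (a one-step MDP), not the lecture's PyTorch/CartPole code; all names and hyperparameters are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Toy 2-armed bandit: arm 1 pays +1, arm 0 pays 0.
theta = np.zeros(2)   # policy logits (the "network" is just two numbers)
lr = 0.5

for episode in range(200):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)     # sample an action from pi_theta
    r = 1.0 if a == 1 else 0.0     # return of this one-step episode
    grad_logp = -probs             # grad of log pi(a) w.r.t. logits: -probs ...
    grad_logp[a] += 1.0            # ... plus 1 at the taken action
    theta += lr * r * grad_logp    # REINFORCE: gradient ascent on E[return]

probs = softmax(theta)
print(probs[1])  # probability of the rewarded arm climbs toward 1
```

Each update nudges the logits along \( \nabla_\theta \log \pi_\theta(a) \) scaled by the return; over multi-step trajectories the same update is applied with reward-to-go in place of \( r \).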
## 📄 Assignment

- 🧠 **Policy Gradient & Actor-Critic Walkthrough:**
  Open the following Colab notebook to implement and experiment with Policy Gradient methods from scratch:
  [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing)

  This walkthrough is designed to help you implement a working **Policy Gradient agent** using PyTorch on environments like *CartPole*.

---

**📚 What You'll Learn**
- Core ideas behind Policy Gradient algorithms
- How to implement and train a neural network policy
- How to collect rollouts and compute returns
- Policy updates using gradient ascent
- (Optional) Baseline methods & Generalized Advantage Estimation (GAE)

**🛠 Prerequisites**
- Python + PyTorch basics
- Key RL concepts: Policy, Reward, Return, Advantage, Value Function

**🗂 Notebook Structure**
- **Environment Setup**: Logging and configuration
- **Policy Network**: Implementation and sampling
- **Training Loop**: Computing returns and updating the policy
- **Variance Reduction (Optional)**: Baselines and GAE for stability

**👨‍🏫 Tips for Students**
- Run cells in order — don't skip!
- Print out observations, actions, and rewards to debug.
- Try different hyperparameters and Gym environments.
- Use TensorBoard or video logs to visualize progress.

> 📘 Inspired by [CS285: Deep RL (Berkeley)](https://rail.eecs.berkeley.edu/deeprlcourse/)

_Courtesy: Tejas_

📢 Do post doubts on the `#module-7-robot-learning` Slack channel!

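The value-based methods from Topic 6 are easiest to grasp in tabular form before going deep. A self-contained sketch of Q-learning on a toy corridor MDP (illustrative only; the environment, constants, and episode count are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 4-state corridor: start in state 0, +1 reward on reaching terminal
# state 3. Actions: 0 = left, 1 = right. The behavior policy is uniformly
# random; Q-learning is off-policy, so the greedy policy w.r.t. Q still
# converges to the optimal one.
N_S, N_A = 4, 2
Q = np.zeros((N_S, N_A))
alpha, gamma = 0.5, 0.9

for _ in range(500):
    s = 0
    while s != N_S - 1:
        a = int(rng.integers(N_A))
        s_next = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == N_S - 1 else 0.0
        done = s_next == N_S - 1
        # Bootstrap off the *greedy* next-state value (this is what makes it
        # Q-learning rather than SARSA, which would use the next sampled action).
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:-1])  # greedy policy in states 0..2: always go right
```

Replacing the table with a neural network, adding a replay buffer, and freezing a target copy of it for the bootstrap term is precisely the step from this sketch to Deep Q-Learning.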
## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|---------|
| Lecture Slides – Reinforcement Learning | See Lectures 4–7 of the RAIL course (linked below) |
| 🎓 Deep Reinforcement Learning – Sergey Levine (RAIL, Berkeley) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=googleclassroom)](https://rail.eecs.berkeley.edu/deeprlcourse/) |
| 🧠 Policy Gradient Algorithms – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/) |
| ⚙️ PPO Implementation Details – ICLR Blog Track | [![Blog](https://img.shields.io/badge/Read-PPO_Insights-orange?logo=readthedocs)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) |
| 📘 Mathematical Foundations of RL – Shiyu Zhao (Westlake University) | [![GitHub](https://img.shields.io/badge/View-on_GitHub-181717?logo=github)](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning) |
| ⚡ RL Quickstart Guide – Joseph Suarez (PufferLib Creator) | [![X/Twitter](https://img.shields.io/badge/View-Quickstart_Guide-1DA1F2?logo=x)](https://x.com/jsuarez5341/status/1854855861295849793) |
| 📦 Stable Baselines3 – RL Library (DLR-RM) | [![GitHub](https://img.shields.io/badge/View-Stable--Baselines3-181717?logo=github)](https://github.com/DLR-RM/stable-baselines3) |
| 🧼 CleanRL – Minimal RL Implementations | [![GitHub](https://img.shields.io/badge/View-CleanRL-181717?logo=github)](https://github.com/vwxyzjn/cleanrl) |
| 🐉 Decisions & Dragons – FAQs About RL | [![Website](https://img.shields.io/badge/Explore-Decisions_&_Dragons-blueviolet?logo=readthedocs)](https://www.decisionsanddragons.com/) |

---
lectures/23-diffusion-basics/README.md (new file)

Lines changed: 59 additions & 0 deletions

# Lecture 23: Diffusion Models Basics
**Instructor:** Anant Garg
**Date:** June 14, 2025

## Topics Covered:

- Noise, Gaussians + Setup
- Autoencoders, VAE, Reparameterization Trick
- The Forward Process: Adding Noise Step-by-Step
- The Reverse Process: Learning to Denoise
- DDPM: Predicting Noise to Reconstruct Data
- Guidance: Making Diffusion Outputs Useful
  - Classifier-Based
  - Classifier-Free
- Score Matching
- Latent Diffusion

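A useful identity behind the forward process above: rather than adding noise one step at a time, you can jump directly to any step \( t \) via \( x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon \). A numpy sketch using the linear beta schedule from the DDPM paper (the schedule endpoints and toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (values as in Ho et al.'s DDPM setup).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # \bar{alpha}_t = prod of alpha_s up to t

def q_sample(x0, t, eps):
    """Jump straight to step t: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(4)                  # toy "clean" sample
eps = rng.standard_normal(4)
x_early = q_sample(x0, 10, eps)  # still mostly signal
x_late = q_sample(x0, T - 1, eps)  # nearly pure noise
print(alpha_bar[10], alpha_bar[-1])
```

The printed values show why the schedule works: \( \bar{\alpha}_t \) is close to 1 at early steps (signal preserved) and close to 0 at the final step (pure Gaussian noise), which is exactly the pair of endpoints the reverse process needs.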
## 📄 Assignment

- 🎨 **Diffusion Models – DDPM & Stable Diffusion Walkthroughs:**
  Clone and work through the following two PyTorch implementations to understand the fundamentals of diffusion models:

  - 📦 **DDPM (Denoising Diffusion Probabilistic Models):**
    [explainingai-code/DDPM-PyTorch](https://github.com/explainingai-code/DDPM-Pytorch)

    This repo walks through the original DDPM algorithm in PyTorch. Run the code, visualize the forward and reverse diffusion processes, and study how noise schedules influence generation.

    🔁 **Extension Task – Implement DDIM:**
    Read the [DDIM paper (arXiv:2010.02502)](https://arxiv.org/abs/2010.02502) and extend the code to include deterministic sampling via DDIM.
    Suggested steps:
    - Modify the sampling loop to use DDIM's non-Markovian formulation
    - Add support for fewer inference steps (fast sampling)
    - Compare image quality vs. sampling speed against DDPM

  - 🎨 **Stable Diffusion (from scratch):**
    [explainingai-code/StableDiffusion-PyTorch](https://github.com/explainingai-code/StableDiffusion-PyTorch)

    This repo walks through a simplified but faithful re-implementation of Stable Diffusion.
    Explore how text prompts are encoded, how the UNet denoiser operates, and how the latent diffusion process differs from vanilla DDPM.

💡 Feel free to experiment with prompts, noise schedules, and decoder resolutions! Post all your findings and doubts on the `#module-7-robot-learning` Slack channel.

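For the DDIM extension task, the core change is the sampling update. With \( \eta = 0 \) the step is deterministic: predict \( x_0 \) from \( x_t \) and the model's noise estimate, then re-noise to an earlier step, which may be arbitrarily far away (this is what enables few-step sampling). A minimal numpy sketch of that update, using an oracle noise value in place of a trained predictor (the schedule and shapes are illustrative):

```python
import numpy as np

# Same linear schedule as DDPM; DDIM reuses the trained model's alpha_bar.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def ddim_step(x_t, t, t_prev, eps_pred):
    """One deterministic (eta = 0) DDIM jump from step t to t_prev (t_prev < t,
    possibly skipping many intermediate steps)."""
    # Predict the clean sample implied by the noise estimate ...
    x0_pred = (x_t - np.sqrt(1 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])
    # ... then re-noise it to the earlier step along the same direction.
    return np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1 - alpha_bar[t_prev]) * eps_pred

# Sanity check with an oracle: if eps_pred is the exact noise used to corrupt
# x0, a single big DDIM jump lands on the correctly re-noised sample.
rng = np.random.default_rng(0)
x0, eps = np.ones(3), rng.standard_normal(3)
x_T = np.sqrt(alpha_bar[999]) * x0 + np.sqrt(1 - alpha_bar[999]) * eps
x_50 = ddim_step(x_T, 999, 50, eps)
expected = np.sqrt(alpha_bar[50]) * x0 + np.sqrt(1 - alpha_bar[50]) * eps
print(np.allclose(x_50, expected))  # True
```

In the real extension the oracle is replaced by the trained noise predictor, and the sampling loop iterates `ddim_step` over a short subsequence of timesteps instead of all 1000.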
## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|--------|
| 📑 Lecture Slides – Diffusion Basics | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-23-diffusion-basics.pdf) |
| 🧠 From Autoencoder to Beta-VAE – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-08-12-vae/) |
| 🌫️ What Are Diffusion Models? – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) |
| 📄 DDPM – Denoising Diffusion Probabilistic Models (Ho et al.) | [![PDF](https://img.shields.io/badge/Open-Paper-blue?logo=readthedocs)](https://hojonathanho.github.io/diffusion/) |
| 🧬 Latent Diffusion Models – High-Res Image Synthesis | [![arXiv](https://img.shields.io/badge/arXiv-2112.10752-b31b1b?logo=arxiv)](https://arxiv.org/pdf/2112.10752) |
| 🎥 Explaining Diffusion – YouTube Playlist | [![YouTube](https://img.shields.io/badge/Watch-Playlist-red?logo=youtube&logoColor=white)](https://www.youtube.com/playlist?list=PL8VDJoEXIjpo2S7X-1YKZnbHyLGyESDCe) |
| 🧪 Stable Diffusion (from scratch) – PyTorch Codebase | [![GitHub](https://img.shields.io/badge/View-Code-181717?logo=github)](https://github.com/explainingai-code/StableDiffusion-PyTorch) |
| 🌊 Introduction to Flow Matching & Diffusion Models – MIT 6.S184 (Generative AI with SDEs) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=mit&logoColor=white)](https://diffusion.csail.mit.edu/) |
| 🎥 Diffusion Models – Paper Explanation & Math | [![YouTube](https://img.shields.io/badge/Watch-Video-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=HoKDTa5jHvg) |
| 🎓 CS 198-126: Lecture 12 – Diffusion Models (ML@Berkeley) | [![YouTube](https://img.shields.io/badge/Watch-Lecture-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=687zEGODmHA&t=23s) |

---
lectures/23-diffusion-basics/lec-23-diffusion-basics.pdf (3.9 MB): binary file not shown.
lectures/24-diffusion-robotics/README.md (new file)

Lines changed: 52 additions & 0 deletions

# Lecture 24: Diffusion Models for Robotics
**Instructor:** Jayaram Reddy
**Date:** June 14, 2025

## Topics Covered:
- Why Diffusion for Control
- Diffusion Policies
- Diffusion for Motion Planning, EDMP
- Diffusion for World-Modeling
- Tradeoffs: Autoregressive vs. Diffusion Models
- Latent Diffusion

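A diffusion policy generates an action trajectory by iterative denoising conditioned on observations, then executes only the first few actions before re-planning (receding horizon). The sketch below mimics the shape of that inference loop; `fake_denoiser`, the update rule, and all constants are illustrative stand-ins, not the actual Diffusion Policy architecture (which uses a trained UNet/transformer noise predictor and a proper noise schedule):

```python
import numpy as np

rng = np.random.default_rng(0)

HORIZON, ACT_DIM, STEPS = 8, 2, 50

def fake_denoiser(actions, obs):
    # Stand-in for a trained noise predictor: reports the deviation of the
    # trajectory from a straight line between current position and goal.
    target = np.linspace(obs["pos"], obs["goal"], HORIZON)
    return actions - target      # "predicted noise" = deviation from target

obs = {"pos": np.zeros(ACT_DIM), "goal": np.array([1.0, -1.0])}
actions = rng.standard_normal((HORIZON, ACT_DIM))  # start from pure noise

for _ in range(STEPS):
    eps_hat = fake_denoiser(actions, obs)
    actions = actions - 0.2 * eps_hat    # toy denoising update

# After denoising, the trajectory ends near the goal; a robot would execute
# only the first few actions, observe again, and repeat.
print(actions[-1])
```

The point of the sketch is the control flow: noise in, conditioned denoising iterations, a full action horizon out. Everything robot-specific lives in the (here faked) denoiser.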
## 📄 Assignments

- 🤖 **Diffusion Policy for Robot Manipulation:**
  Explore how diffusion models can be applied to learn robotic manipulation behaviors, such as pushing, directly from demonstrations.

  - 📓 **Official Colab Notebook:**
    [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing)

  - 🤗 **Hugging Face Playground – Push Task Demo:**
    [![Open on Hugging Face](https://img.shields.io/badge/Launch-HF_Notebook-blueviolet?logo=huggingface&logoColor=white)](https://huggingface.co/lerobot/diffusion_pusht)

  **What to Do:**
  - Run the official notebook to understand the structure of the Diffusion Policy model and how it leverages conditional generation for trajectory prediction.
  - Try out the push environment on Hugging Face to see the learned policy in action.
  - Reflect on how diffusion-based imitation compares with classical behavioral cloning.
  - (Optional) Try swapping out the dataset or varying the number of inference steps to observe differences in performance.

> 🧠 This exercise builds intuition for how generative models can drive robotic agents with flexibility and generalization.

📢 Please feel free to post all questions on the `#module-7-robot-learning` Slack channel.

## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|--------|
| 📑 Lecture Slides – Diffusion for Robotics | [![PDF](https://img.shields.io/badge/Open-PDF-red?logo=adobeacrobatreader&logoColor=white)](./lec-24-diffusion-robotics.pdf) [![Slides](https://img.shields.io/badge/Open-Google_Slides-yellow?logo=googleslides&logoColor=white)](https://docs.google.com/presentation/d/1YjRIxj32OXhiaPgXihWKW40aPGhIcgBFr4CPkJLY19Q/edit?usp=sharing) |
| 🤖 Diffusion Policy – Columbia University | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://diffusion-policy.cs.columbia.edu/) |
| 🧠 Imitating Human Behavior with Diffusion Models (2023) | [![arXiv](https://img.shields.io/badge/arXiv-2301.10677-b31b1b?logo=arxiv)](https://arxiv.org/abs/2301.10677) |
| 📐 Geometry of Diffusion Models for Robotics – Sander Dieleman | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://sander.ai/2023/08/28/geometry.html) |
| 🧩 Ensemble of Costs for Diffusion Planning | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://ensemble-of-costs-diffusion.github.io/) |
| 💎 DIAMOND – Diffusion Models for Diverse Robot Behavior | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://diamond-wm.github.io/) |
| 🚗 Imagine2Drive – Open Vocabulary Driving Skills | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://anantagrg.github.io/Imagine-2-Drive.github.io/) |
| 🧞 GENIE (Diffusion + LLMs) – Google DeepMind | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://sites.google.com/view/genie-2024/home) |
| 🌌 DreamGen – Scene-Level Robot Imagination (NVIDIA) | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://research.nvidia.com/labs/gear/dreamgen/) |
| 🌀 Diffusion Forcing – Next-Token Prediction Meets Full-Sequence Diffusion | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://boyuan.space/diffusion-forcing/) |
| 🔄 Flow Matching – Machine Learning Group, University of Cambridge | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html) |

---
lectures/24-diffusion-robotics/lec-24-diffusion-robotics.pdf (2.82 MB): binary file not shown.
