
Commit b653cc3

Add Lec 22-24 - Robot Learning Module
1 parent 3398886 commit b653cc3

File tree

7 files changed: +218 -4 lines

README.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -27,7 +27,7 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
 [![Detailed Schedule + Topics](https://img.shields.io/badge/View%20Detailed%20Schedule%20%2B%20Topics-Google%20Sheets-34A853?logo=google-sheets&logoColor=white&style=flat-square)](https://docs.google.com/spreadsheets/d/1qjU-zWitD6S8JJlbWS90PVDoHJdfmojjqB4BuxkT4w8/edit?usp=sharing)
 
 
-| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments |
+| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments/Demos |
 |----|--------------|-----------------------------------|------------------------------------|---------------|-------------|
 | 1 | May 17, 2025 | Introduction | Prof. Madhava Krishna | -- | -- |
 | 2 | May 19, 2025 | Linear Algebra & Probability | Vishal |[Linear Algebra Resources](lectures/02-linear-algebra-probability/README.md) | [Linear Algebra Problem Set](lectures/02-linear-algebra-probability/lec-02-linear-algebra-problems.pdf) |
```
```diff
@@ -50,9 +50,9 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
 | 19 | Jun 10, 2025 (PM) | Motion Planning - III | Meet | [Motion Planning - III Resources](lectures/19-motion-planning-3/README.md) | [🚀 Collision Cones & Velocity Obstacles Interactive Demo](https://roboticsiiith.github.io/summer-school-2025/demos/lec-19-collision-cones-vo/)|
 | 20 | Jun 11, 2025 | ROS - I | Tarun, Soham | [ROS Deployment - I Resources](lectures/20-ros-deployment-1/README.md) | 🎓 Capstone 1/2 <br> Robot Tele-operation <br> [![Start Project](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/20-ros-deployment-1/README.md#-capstone-project---part-1)|
 | 21 | Jun 12, 2025 | ROS - II | Tarun, Soham | [ROS Deployment - II Resources](lectures/21-ros-deployment-2/README.md) | 🎓 Capstone 2/2 <br> Autonomous Navigation <br> [![Launch](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/21-ros-deployment-2/README.md#-capstone-project---part-2) |
-| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal | | |
-| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | | 🚧 WIP |
-| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | | |
+| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal, Tejas | [Reinforcement Learning Resources](lectures/22-reinforcement-learning/README.md) | [🧠 Policy Gradient & Actor-Critic Colab Walkthrough](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing) |
+| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | [Diffusion Models - Basics Resources](lectures/23-diffusion-basics/README.md) | [DDPM & Stable Diffusion Walkthroughs](lectures/23-diffusion-basics/README.md#-assignment) |
+| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | [Diffusion Models for Robotics Resources](lectures/24-diffusion-robotics/README.md) | [Diffusion Policy for Robot Manipulation Hands-On Colab](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing) <br> [🤗 HF Push Task Demo](https://huggingface.co/lerobot/diffusion_pusht)|
 
 📌 **Note:**
 The schedule will be regularly updated with slides, reference materials, and coding assignments as sessions conclude. Stay tuned by clicking on **Watch** for this repository or subscribing to its RSS feed.
```

lectures/06-dynamics-control-2/README.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -47,4 +47,7 @@ Please raise doubts or engage in discussion on the **`#module-2-dynamics-control
 |----------------------------------|----------------------------------------------------------------------------------------|
 | Lecture Slides (Sarthak) - Controls - Introduction | [lec-06-controls-introduction.pdf](./lec-06-controls-introduction.pdf) |
 | Lecture Slides (Astik) - Controls - PID, LQR | [lec-06-controls-pid-lqr.pdf](./lec-06-controls-pid-lqr.pdf) |
+| **Modern Robotics: Mechanics, Planning, and Control** – Kevin M. Lynch & Frank C. Park (Northwestern University) | [![Textbook](https://img.shields.io/badge/Open-Textbook-blue?logo=readthedocs)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics)<br>[![Videos](https://img.shields.io/badge/Watch-Lecture_Videos-red?logo=youtube&logoColor=white)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics_Videos) |
+
+
 ---
```
lectures/22-reinforcement-learning/README.md (new file)

Lines changed: 100 additions & 0 deletions

# Lecture 22: Reinforcement Learning
**Instructors:** Vishal, Tejas
**Date:** June 13, 2025
## 📖 Topics Covered:

- **1. Why Reinforcement Learning?**
  - Why is it hard to generate data for robots with frequently changing morphologies?
  - Why are traditional approaches (e.g., explicit physics models, controllers) inefficient for skill learning?
  - How does RL (and supervised learning) help bridge this gap?
  - What is an example where RL enabled fast adaptation (e.g., quadrupeds using rapid motor adaptation)?

- **2. RL Notation and Terminology**
  - What are stochastic processes and the Markovian property?
  - What is a Markov Decision Process (MDP), and how is it defined?

- **3. Anatomy of the Reinforcement Learning Pipeline**
  - How do we collect samples from the environment using the current policy?
  - What does model fitting or sample evaluation involve?
  - How is the policy improved based on evaluation?
  - How do modern simulators and sim-to-real transfer help overcome sample collection bottlenecks?

- **4. Policy Gradient Methods**

  **4.1 Goal of RL**
  - What is the objective function \( J(\theta) \) in RL?
  - How does the formulation differ in finite vs. infinite horizon settings?
  - Why is the goal to maximize expected return?

  **4.2 Policy Gradient**
  - How do we compute the gradient of the objective function?
  - What is the REINFORCE trick and algorithm?

- **5. Reducing Variance in REINFORCE**
  - Why does REINFORCE have high variance despite being unbiased?
  - How does the reward-to-go trick exploit causality to reduce variance?
  - What are baseline methods for variance reduction?
  - How do we choose an optimal baseline to minimize variance?
  - What are actor-critic methods, and how do they combine value estimation with policy updates?

- **6. Value-Based Methods**
  - Value function and Q-function
  - What are SARSA and Q-learning?
  - How does Deep Q-Learning extend traditional Q-learning?

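The score-function (REINFORCE) estimator from Topic 4.2 fits in very little code. Below is an illustrative numpy-only sketch on a two-armed bandit (a one-step MDP), not the lecture's PyTorch/CartPole code; all names and hyperparameters are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Toy 2-armed bandit: arm 1 pays +1, arm 0 pays 0.
theta = np.zeros(2)   # policy logits (the "network" is just two numbers)
lr = 0.5

for episode in range(200):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)     # sample an action from pi_theta
    r = 1.0 if a == 1 else 0.0     # return of this one-step episode
    grad_logp = -probs             # grad of log pi(a) w.r.t. logits: -probs ...
    grad_logp[a] += 1.0            # ... plus 1 at the taken action
    theta += lr * r * grad_logp    # REINFORCE: gradient ascent on E[return]

probs = softmax(theta)
print(probs[1])  # probability of the rewarded arm climbs toward 1
```

Each update nudges the logits along \( \nabla_\theta \log \pi_\theta(a) \) scaled by the return; over multi-step trajectories the same update is applied with reward-to-go in place of \( r \).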
## 📄 Assignment

- 🧠 **Policy Gradient & Actor-Critic Walkthrough:**
  Open the following Colab notebook to implement and experiment with Policy Gradient methods from scratch:
  [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing)

  This walkthrough is designed to help you implement a working **Policy Gradient agent** using PyTorch on environments like *CartPole*.

---

**📚 What You'll Learn**
- Core ideas behind Policy Gradient algorithms
- How to implement and train a neural network policy
- How to collect rollouts and compute returns
- Policy updates using gradient ascent
- (Optional) Baseline methods & Generalized Advantage Estimation (GAE)

**🛠 Prerequisites**
- Python + PyTorch basics
- Key RL concepts: Policy, Reward, Return, Advantage, Value Function

**🗂 Notebook Structure**
- **Environment Setup**: Logging and configuration
- **Policy Network**: Implementation and sampling
- **Training Loop**: Computing returns and updating the policy
- **Variance Reduction (Optional)**: Baselines and GAE for stability

**👨‍🏫 Tips for Students**
- Run cells in order — don't skip!
- Print out observations, actions, and rewards to debug.
- Try different hyperparameters and Gym environments.
- Use TensorBoard or video logs to visualize progress.

> 📘 Inspired by [CS285: Deep RL (Berkeley)](https://rail.eecs.berkeley.edu/deeprlcourse/)

_Courtesy: Tejas_

📢 Do post doubts on the `#module-7-robot-learning` Slack channel!

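The value-based methods from Topic 6 are easiest to grasp in tabular form before going deep. A self-contained sketch of Q-learning on a toy corridor MDP (illustrative only; the environment, constants, and episode count are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 4-state corridor: start in state 0, +1 reward on reaching terminal
# state 3. Actions: 0 = left, 1 = right. The behavior policy is uniformly
# random; Q-learning is off-policy, so the greedy policy w.r.t. Q still
# converges to the optimal one.
N_S, N_A = 4, 2
Q = np.zeros((N_S, N_A))
alpha, gamma = 0.5, 0.9

for _ in range(500):
    s = 0
    while s != N_S - 1:
        a = int(rng.integers(N_A))
        s_next = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == N_S - 1 else 0.0
        done = s_next == N_S - 1
        # Bootstrap off the *greedy* next-state value (this is what makes it
        # Q-learning rather than SARSA, which would use the next sampled action).
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:-1])  # greedy policy in states 0..2: always go right
```

Replacing the table with a neural network, adding a replay buffer, and freezing a target copy of it for the bootstrap term is precisely the step from this sketch to Deep Q-Learning.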
## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|---------|
| Lecture Slides – Reinforcement Learning | See Lectures 4–7 of the RAIL course (linked below) |
| 🎓 Deep Reinforcement Learning – Sergey Levine (RAIL, Berkeley) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=googleclassroom)](https://rail.eecs.berkeley.edu/deeprlcourse/) |
| 🧠 Policy Gradient Algorithms – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/) |
| ⚙️ PPO Implementation Details – ICLR Blog Track | [![Blog](https://img.shields.io/badge/Read-PPO_Insights-orange?logo=readthedocs)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) |
| 📘 Mathematical Foundations of RL – Shiyu Zhao (Westlake University) | [![GitHub](https://img.shields.io/badge/View-on_GitHub-181717?logo=github)](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning) |
| ⚡ RL Quickstart Guide – Joseph Suarez (PufferLib Creator) | [![X/Twitter](https://img.shields.io/badge/View-Quickstart_Guide-1DA1F2?logo=x)](https://x.com/jsuarez5341/status/1854855861295849793) |
| 📦 Stable Baselines3 – RL Library (DLR-RM) | [![GitHub](https://img.shields.io/badge/View-Stable--Baselines3-181717?logo=github)](https://github.com/DLR-RM/stable-baselines3) |
| 🧼 CleanRL – Minimal RL Implementations | [![GitHub](https://img.shields.io/badge/View-CleanRL-181717?logo=github)](https://github.com/vwxyzjn/cleanrl) |
| 🐉 Decisions & Dragons – FAQs About RL | [![Website](https://img.shields.io/badge/Explore-Decisions_&_Dragons-blueviolet?logo=readthedocs)](https://www.decisionsanddragons.com/) |

---
lectures/23-diffusion-basics/README.md (new file)

Lines changed: 59 additions & 0 deletions

# Lecture 23: Diffusion Models Basics
**Instructor:** Anant Garg
**Date:** June 14, 2025

## Topics Covered:

- Noise, Gaussians + Setup
- Autoencoders, VAE, Reparameterization Trick
- The Forward Process: Adding Noise Step-by-Step
- The Reverse Process: Learning to Denoise
- DDPM: Predicting Noise to Reconstruct Data
- Guidance: Making Diffusion Outputs Useful
  - Classifier-Based
  - Classifier-Free
- Score Matching
- Latent Diffusion

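A useful identity behind the forward process above: rather than adding noise one step at a time, you can jump directly to any step \( t \) via \( x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon \). A numpy sketch using the linear beta schedule from the DDPM paper (the schedule endpoints and toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (values as in Ho et al.'s DDPM setup).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # \bar{alpha}_t = prod of alpha_s up to t

def q_sample(x0, t, eps):
    """Jump straight to step t: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(4)                  # toy "clean" sample
eps = rng.standard_normal(4)
x_early = q_sample(x0, 10, eps)  # still mostly signal
x_late = q_sample(x0, T - 1, eps)  # nearly pure noise
print(alpha_bar[10], alpha_bar[-1])
```

The printed values show why the schedule works: \( \bar{\alpha}_t \) is close to 1 at early steps (signal preserved) and close to 0 at the final step (pure Gaussian noise), which is exactly the pair of endpoints the reverse process needs.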
## 📄 Assignment

- 🎨 **Diffusion Models – DDPM & Stable Diffusion Walkthroughs:**
  Clone and work through the following two PyTorch implementations to understand the fundamentals of diffusion models:

  - 📦 **DDPM (Denoising Diffusion Probabilistic Models):**
    [explainingai-code/DDPM-PyTorch](https://github.com/explainingai-code/DDPM-Pytorch)

    This repo walks through the original DDPM algorithm in PyTorch. Run the code, visualize the forward and reverse diffusion processes, and study how noise schedules influence generation.

    🔁 **Extension Task – Implement DDIM:**
    Read the [DDIM paper (arXiv:2010.02502)](https://arxiv.org/abs/2010.02502) and extend the code to include deterministic sampling via DDIM.
    Suggested steps:
    - Modify the sampling loop to use DDIM's non-Markovian formulation
    - Add support for fewer inference steps (fast sampling)
    - Compare image quality vs. sampling speed against DDPM

  - 🎨 **Stable Diffusion (from scratch):**
    [explainingai-code/StableDiffusion-PyTorch](https://github.com/explainingai-code/StableDiffusion-PyTorch)

    This repo walks through a simplified but faithful re-implementation of Stable Diffusion.
    Explore how text prompts are encoded, how the UNet denoiser operates, and how the latent diffusion process differs from vanilla DDPM.

💡 Feel free to experiment with prompts, noise schedules, and decoder resolutions! Post all your findings and doubts on the `#module-7-robot-learning` Slack channel.

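For the DDIM extension task, the core change is the sampling update. With \( \eta = 0 \) the step is deterministic: predict \( x_0 \) from \( x_t \) and the model's noise estimate, then re-noise to an earlier step, which may be arbitrarily far away (this is what enables few-step sampling). A minimal numpy sketch of that update, using an oracle noise value in place of a trained predictor (the schedule and shapes are illustrative):

```python
import numpy as np

# Same linear schedule as DDPM; DDIM reuses the trained model's alpha_bar.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def ddim_step(x_t, t, t_prev, eps_pred):
    """One deterministic (eta = 0) DDIM jump from step t to t_prev (t_prev < t,
    possibly skipping many intermediate steps)."""
    # Predict the clean sample implied by the noise estimate ...
    x0_pred = (x_t - np.sqrt(1 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])
    # ... then re-noise it to the earlier step along the same direction.
    return np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1 - alpha_bar[t_prev]) * eps_pred

# Sanity check with an oracle: if eps_pred is the exact noise used to corrupt
# x0, a single big DDIM jump lands on the correctly re-noised sample.
rng = np.random.default_rng(0)
x0, eps = np.ones(3), rng.standard_normal(3)
x_T = np.sqrt(alpha_bar[999]) * x0 + np.sqrt(1 - alpha_bar[999]) * eps
x_50 = ddim_step(x_T, 999, 50, eps)
expected = np.sqrt(alpha_bar[50]) * x0 + np.sqrt(1 - alpha_bar[50]) * eps
print(np.allclose(x_50, expected))  # True
```

In the real extension the oracle is replaced by the trained noise predictor, and the sampling loop iterates `ddim_step` over a short subsequence of timesteps instead of all 1000.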
## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|--------|
| 📑 Lecture Slides – Diffusion Basics | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-23-diffusion-basics.pdf) |
| 🧠 From Autoencoder to Beta-VAE – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-08-12-vae/) |
| 🌫️ What Are Diffusion Models? – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) |
| 📄 DDPM – Denoising Diffusion Probabilistic Models (Ho et al.) | [![PDF](https://img.shields.io/badge/Open-Paper-blue?logo=readthedocs)](https://hojonathanho.github.io/diffusion/) |
| 🧬 Latent Diffusion Models – High-Res Image Synthesis | [![arXiv](https://img.shields.io/badge/arXiv-2112.10752-b31b1b?logo=arxiv)](https://arxiv.org/pdf/2112.10752) |
| 🎥 Explaining Diffusion – YouTube Playlist | [![YouTube](https://img.shields.io/badge/Watch-Playlist-red?logo=youtube&logoColor=white)](https://www.youtube.com/playlist?list=PL8VDJoEXIjpo2S7X-1YKZnbHyLGyESDCe) |
| 🧪 Stable Diffusion (from scratch) – PyTorch Codebase | [![GitHub](https://img.shields.io/badge/View-Code-181717?logo=github)](https://github.com/explainingai-code/StableDiffusion-PyTorch) |
| 🌊 Introduction to Flow Matching & Diffusion Models – MIT 6.S184 (Generative AI with SDEs) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=mit&logoColor=white)](https://diffusion.csail.mit.edu/) |
| 🎥 Diffusion Models – Paper Explanation & Math | [![YouTube](https://img.shields.io/badge/Watch-Video-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=HoKDTa5jHvg) |
| 🎓 CS 198-126: Lecture 12 – Diffusion Models (ML@Berkeley) | [![YouTube](https://img.shields.io/badge/Watch-Lecture-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=687zEGODmHA&t=23s) |

---
lectures/23-diffusion-basics/lec-23-diffusion-basics.pdf (3.9 MB): binary file not shown.
lectures/24-diffusion-robotics/README.md (new file)

Lines changed: 52 additions & 0 deletions

# Lecture 24: Diffusion Models for Robotics
**Instructor:** Jayaram Reddy
**Date:** June 14, 2025

## Topics Covered:
- Why Diffusion for Control
- Diffusion Policies
- Diffusion for Motion Planning, EDMP
- Diffusion for World-Modeling
- Tradeoffs: Autoregressive vs. Diffusion Models
- Latent Diffusion

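A diffusion policy generates an action trajectory by iterative denoising conditioned on observations, then executes only the first few actions before re-planning (receding horizon). The sketch below mimics the shape of that inference loop; `fake_denoiser`, the update rule, and all constants are illustrative stand-ins, not the actual Diffusion Policy architecture (which uses a trained UNet/transformer noise predictor and a proper noise schedule):

```python
import numpy as np

rng = np.random.default_rng(0)

HORIZON, ACT_DIM, STEPS = 8, 2, 50

def fake_denoiser(actions, obs):
    # Stand-in for a trained noise predictor: reports the deviation of the
    # trajectory from a straight line between current position and goal.
    target = np.linspace(obs["pos"], obs["goal"], HORIZON)
    return actions - target      # "predicted noise" = deviation from target

obs = {"pos": np.zeros(ACT_DIM), "goal": np.array([1.0, -1.0])}
actions = rng.standard_normal((HORIZON, ACT_DIM))  # start from pure noise

for _ in range(STEPS):
    eps_hat = fake_denoiser(actions, obs)
    actions = actions - 0.2 * eps_hat    # toy denoising update

# After denoising, the trajectory ends near the goal; a robot would execute
# only the first few actions, observe again, and repeat.
print(actions[-1])
```

The point of the sketch is the control flow: noise in, conditioned denoising iterations, a full action horizon out. Everything robot-specific lives in the (here faked) denoiser.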
## 📄 Assignments

- 🤖 **Diffusion Policy for Robot Manipulation:**
  Explore how diffusion models can be applied to learn robotic manipulation behaviors, such as pushing, directly from demonstrations.

  - 📓 **Official Colab Notebook:**
    [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing)

  - 🤗 **Hugging Face Playground – Push Task Demo:**
    [![Open on Hugging Face](https://img.shields.io/badge/Launch-HF_Notebook-blueviolet?logo=huggingface&logoColor=white)](https://huggingface.co/lerobot/diffusion_pusht)

  **What to Do:**
  - Run the official notebook to understand the structure of the Diffusion Policy model and how it leverages conditional generation for trajectory prediction.
  - Try out the push environment on Hugging Face to see the learned policy in action.
  - Reflect on how diffusion-based imitation compares with classical behavioral cloning.
  - (Optional) Try swapping out the dataset or varying the number of inference steps to observe differences in performance.

> 🧠 This exercise builds intuition for how generative models can drive robotic agents with flexibility and generalization.

📢 Please feel free to post all questions on the `#module-7-robot-learning` Slack channel.

## 🔗 Resources

| 📚 Topic | 🔗 Link |
|----------|--------|
| 📑 Lecture Slides – Diffusion for Robotics | [![PDF](https://img.shields.io/badge/Open-PDF-red?logo=adobeacrobatreader&logoColor=white)](./lec-24-diffusion-robotics.pdf) [![Slides](https://img.shields.io/badge/Open-Google_Slides-yellow?logo=googleslides&logoColor=white)](https://docs.google.com/presentation/d/1YjRIxj32OXhiaPgXihWKW40aPGhIcgBFr4CPkJLY19Q/edit?usp=sharing) |
| 🤖 Diffusion Policy – Columbia University | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://diffusion-policy.cs.columbia.edu/) |
| 🧠 Imitating Human Behavior with Diffusion Models (2023) | [![arXiv](https://img.shields.io/badge/arXiv-2301.10677-b31b1b?logo=arxiv)](https://arxiv.org/abs/2301.10677) |
| 📐 Geometry of Diffusion Models for Robotics – Sander Dieleman | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://sander.ai/2023/08/28/geometry.html) |
| 🧩 Ensemble of Costs for Diffusion Planning | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://ensemble-of-costs-diffusion.github.io/) |
| 💎 DIAMOND – Diffusion Models for Diverse Robot Behavior | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://diamond-wm.github.io/) |
| 🚗 Imagine2Drive – Open Vocabulary Driving Skills | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://anantagrg.github.io/Imagine-2-Drive.github.io/) |
| 🧞 GENIE (Diffusion + LLMs) – Google DeepMind | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://sites.google.com/view/genie-2024/home) |
| 🌌 DreamGen – Scene-Level Robot Imagination (NVIDIA) | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://research.nvidia.com/labs/gear/dreamgen/) |
| 🌀 Diffusion Forcing – Next-Token Prediction Meets Full-Sequence Diffusion | [![Website](https://img.shields.io/badge/Open-Project-blue?logo=googlechrome)](https://boyuan.space/diffusion-forcing/) |
| 🔄 Flow Matching – Machine Learning Group, University of Cambridge | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html) |

---
lectures/24-diffusion-robotics/lec-24-diffusion-robotics.pdf (2.82 MB): binary file not shown.
