Commit 3023482

Add Lec 22-24 - Robot Learning Module
1 parent 3398886 commit 3023482

9 files changed

+222
-8
lines changed

README.md

Lines changed: 6 additions & 6 deletions
@@ -27,7 +27,7 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
2727
[![Detailed Schedule + Topics](https://img.shields.io/badge/View%20Detailed%20Schedule%20%2B%20Topics-Google%20Sheets-34A853?logo=google-sheets&logoColor=white&style=flat-square)](https://docs.google.com/spreadsheets/d/1qjU-zWitD6S8JJlbWS90PVDoHJdfmojjqB4BuxkT4w8/edit?usp=sharing)
2828

2929

30-
| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments |
30+
| # | Date | Topic | Presenter(s) | Lecture Notes | Assignments/Demos |
3131
|----|--------------|-----------------------------------|------------------------------------|---------------|-------------|
3232
| 1 | May 17, 2025 | Introduction | Prof. Madhava Krishna | -- | -- |
3333
| 2 | May 19, 2025 | Linear Algebra & Probability | Vishal |[Linear Algebra Resources](lectures/02-linear-algebra-probability/README.md) | [Linear Algebra Problem Set](lectures/02-linear-algebra-probability/lec-02-linear-algebra-problems.pdf) |
@@ -48,11 +48,11 @@ Sessions will be conducted by **graduate students and faculty** from the **RRC L
4848
| 17 | Jun 7, 2025 | Motion Planning - I | Faizal | [Motion Planning - I Resources](lectures/17-motion-planning-1/README.md) | -- |
4949
| 18 | Jun 10, 2025 (AM) | Motion Planning - II | Ansh | [Motion Planning - II Resources](lectures/18-motion-planning-2/README.md) | -- |
5050
| 19 | Jun 10, 2025 (PM) | Motion Planning - III | Meet | [Motion Planning - III Resources](lectures/19-motion-planning-3/README.md) | [🚀 Collision Cones & Velocity Obstacles Interactive Demo](https://roboticsiiith.github.io/summer-school-2025/demos/lec-19-collision-cones-vo/)|
51-
| 20 | Jun 11, 2025 | ROS - I | Tarun, Soham | [ROS Deployment - I Resources](lectures/20-ros-deployment-1/README.md) | 🎓 Capstone 1/2 <br> Robot Tele-operation <br> [![Start Project](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/20-ros-deployment-1/README.md#-capstone-project---part-1)|
52-
| 21 | Jun 12, 2025 | ROS - II | Tarun, Soham | [ROS Deployment - II Resources](lectures/21-ros-deployment-2/README.md) | 🎓 Capstone 2/2 <br> Autonomous Navigation <br> [![Launch](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/21-ros-deployment-2/README.md#-capstone-project---part-2) |
53-
| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal | | |
54-
| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | | 🚧 WIP |
55-
| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | | |
51+
| 20 | Jun 11, 2025 | ROS Deployment - I | Tarun, Soham | [ROS Deployment - I Resources](lectures/20-ros-deployment-1/README.md) | 🎓 Capstone 1/2 <br> Robot Tele-operation <br> [![Start Project](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/20-ros-deployment-1/README.md#-capstone-project---part-1)|
52+
| 21 | Jun 12, 2025 | ROS Deployment - II | Tarun, Soham | [ROS Deployment - II Resources](lectures/21-ros-deployment-2/README.md) | 🎓 Capstone 2/2 <br> Autonomous Navigation <br> [![Launch](https://img.shields.io/badge/Start-Project-blue?logo=ros&logoColor=white)](lectures/21-ros-deployment-2/README.md#-capstone-project---part-2) |
53+
| 22 | Jun 13, 2025 | Reinforcement Learning | Vishal, Tejas | [Reinforcement Learning Resources](lectures/22-reinforcement-learning/README.md) | [🧠 Policy Gradient & Actor-Critic Colab Walkthrough](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing) |
54+
| 23 | Jun 14, 2025 | Diffusion Models - Basics | Anant | [Diffusion Models - Basics Resources](lectures/23-diffusion-basics/README.md) | [DDPM & Stable Diffusion Walkthroughs](lectures/23-diffusion-basics/README.md#-assignment) |
55+
| 24 | Jun 14, 2025 | Diffusion Models for Robotics | Jayaram | [Diffusion Models for Robotics Resources](lectures/24-diffusion-robotics/README.md) | [Diffusion Policy for Robot Manipulation Hands-On Colab](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing) <br> [🤗 HF Push Task Demo](https://huggingface.co/lerobot/diffusion_pusht)|
5656

5757
📌 **Note:**
5858
The schedule will be regularly updated with slides, reference materials, and coding assignments as sessions conclude. Stay tuned by clicking on **Watch** for this repository or subscribing to its RSS feed.

lectures/06-dynamics-control-2/README.md

Lines changed: 3 additions & 0 deletions
@@ -47,4 +47,7 @@ Please raise doubts or engage in discussion on the **`#module-2-dynamics-control
4747
|----------------------------------|----------------------------------------------------------------------------------------|
4848
| Lecture Slides (Sarthak) - Controls - Introduction | [lec-06-controls-introduction.pdf](./lec-06-controls-introduction.pdf) |
4949
| Lecture Slides (Astik) - Controls - PID, LQR | [lec-06-controls-pid-lqr.pdf](./lec-06-controls-pid-lqr.pdf) |
50+
| **Modern Robotics: Mechanics, Planning, and Control** – Kevin M. Lynch & Frank C. Park (Northwestern University) | [![Textbook](https://img.shields.io/badge/Open-Textbook-blue?logo=readthedocs)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics)<br>[![Videos](https://img.shields.io/badge/Watch-Lecture_Videos-red?logo=youtube&logoColor=white)](https://hades.mech.northwestern.edu/index.php/Modern_Robotics_Videos) |
51+
52+
5053
---

lectures/20-ros-deployment-1/README.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Successfully launch the robot, and **teleoperate it using an Xbox joystick**.
5050

5151
| Topic/Tool | Link |
5252
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
53-
| 📑 Lecture Slides – ROS Deployment | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-20-ros-deployment-1.pdf) |
53+
| 📑 Lecture Slides – ROS Deployment Part 1 | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-20-ros-deployment-1.pdf) |
5454
| 📘 Articulated Robotics Tutorials | [![Visit](https://img.shields.io/badge/Open-Tutorials-brightgreen?logo=readthedocs&logoColor=white)](https://articulatedrobotics.xyz/tutorials/) |
5555
| 📚 Nav2 Documentation | [![Docs](https://img.shields.io/badge/Open-Nav2%20Docs-blueviolet?logo=ros&logoColor=white)](https://docs.nav2.org) |
5656
| 🏗️ Gazebo Getting Started | [![Docs](https://img.shields.io/badge/Open-Gazebo-orange?logo=gazebo&logoColor=white)](https://gazebosim.org/docs/latest/getstarted/) |

lectures/21-ros-deployment-2/README.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ Successfully launch the robot, integrate the provided autonomy stack (including
6060

6161
| Topic/Tool | Link |
6262
|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
63-
| 📑 Lecture Slides – ROS Deployment 2 | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-21-ros-deployment-2.pdf) |
63+
| 📑 Lecture Slides – ROS Deployment Part 2 | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-21-ros-deployment-2.pdf) |
6464
| 📘 Articulated Robotics Tutorials | [![Visit](https://img.shields.io/badge/Open-Tutorials-brightgreen?logo=readthedocs&logoColor=white)](https://articulatedrobotics.xyz/tutorials/) |
6565
| 📚 Nav2 Documentation | [![Docs](https://img.shields.io/badge/Open-Nav2%20Docs-blueviolet?logo=ros&logoColor=white)](https://docs.nav2.org) |
6666
| 🏗️ Gazebo Getting Started | [![Docs](https://img.shields.io/badge/Open-Gazebo-orange?logo=gazebo&logoColor=white)](https://gazebosim.org/docs/latest/getstarted/) |
lectures/22-reinforcement-learning/README.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# Lecture 22: Reinforcement Learning
2+
**Instructors:** Vishal, Tejas
3+
**Date:** June 13, 2025
4+
5+
## 📖 Topics Covered:
6+
7+
- **1. Why Reinforcement Learning?**
8+
- Why is it hard to generate data for robots with frequently changing morphologies?
9+
- Why are traditional approaches (e.g., explicit physics models, controllers) inefficient for skill learning?
10+
- How does RL (and supervised learning) help bridge this gap?
11+
- What is an example where RL enabled fast adaptation (e.g., quadrupeds using rapid motor adaptation)?
12+
13+
- **2. RL Notation and Terminology**
14+
- What are stochastic processes and the Markovian property?
15+
- What is a Markov Decision Process (MDP), and how is it defined?
16+
17+
- **3. Anatomy of the Reinforcement Learning Pipeline**
18+
- How do we collect samples from the environment using the current policy?
19+
- What does model fitting or sample evaluation involve?
20+
- How is the policy improved based on evaluation?
21+
- How do modern simulators and sim-to-real transfer help overcome sample collection bottlenecks?
22+
23+
- **4. Policy Gradient Methods**
24+
25+
**4.1 Goal of RL**
26+
- What is the objective function \( J(\theta) \) in RL?
27+
- How does the formulation differ in finite vs. infinite horizon settings?
28+
- Why is the goal to maximize expected return?
29+
30+
**4.2 Policy Gradient**
31+
- How do we compute the gradient of the objective function?
32+
- What is the REINFORCE trick and algorithm?
33+
34+
- **5. Reducing Variance in REINFORCE**
35+
- Why does REINFORCE have high variance despite being unbiased?
36+
- How does the reward-to-go trick exploit causality to reduce variance?
37+
- What are baseline methods for variance reduction?
38+
- How do we choose an optimal baseline to minimize variance?
39+
- What are actor-critic methods, and how do they combine value estimation with policy updates?
40+
41+
- **6. Value-Based Methods**
42+
- What are the value function and the Q-function?
43+
- What are SARSA and Q-learning?
44+
- How does Deep Q-Learning extend traditional Q-learning?
45+
46+
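
The SARSA and Q-learning questions in topic 6 come down to which next-state value each method bootstraps from. The following is a minimal tabular sketch, not taken from the lecture material: the function names, the dictionary-based Q-table, and the state/action labels are all illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Off-policy TD update: bootstrap from the greedy action in s_next."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
actions = ["left", "right"]
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0

q_val = q_learning_update(Q, "s0", "right", r=0.0, s_next="s1",
                          actions=actions, alpha=0.5, gamma=1.0)
```

Given the same transition, Q-learning uses the maximum over next actions, while SARSA uses the value of whichever action the current policy actually chose, so the two targets differ whenever the behaviour policy explores.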
47+
## 📄 Assignment
48+
49+
- 🧠 **Policy Gradient & Actor-Critic Walkthrough:**
50+
Open the following Colab notebook to implement and experiment with Policy Gradient methods from scratch:
51+
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TWPHz3udlKqsdSyMvTiZG9Y5P7VrY3gH?usp=sharing)
52+
53+
This walkthrough is designed to help you implement a working **Policy Gradient agent** using PyTorch on environments like *CartPole*.
54+
55+
---
56+
57+
**📚 What You'll Learn**
58+
- Core ideas behind Policy Gradient algorithms
59+
- How to implement and train a neural network policy
60+
- How to collect rollouts and compute returns
61+
- Policy updates using gradient ascent
62+
- (Optional) Baseline methods & Generalized Advantage Estimation (GAE)
63+
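
As a rough illustration of "collect rollouts and compute returns", the sketch below computes discounted reward-to-go for one trajectory and then subtracts a mean baseline. It is a plain-Python sketch and not the notebook's code: the function names are assumptions, and the mean return is only one simple baseline choice, not the variance-optimal one.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of one trajectory."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def subtract_mean_baseline(returns):
    """Centre the returns around their mean (a simple, non-optimal baseline)."""
    mean = sum(returns) / len(returns)
    return [g - mean for g in returns]


returns = rewards_to_go([1.0, 1.0, 1.0], gamma=1.0)
advantages = subtract_mean_baseline(returns)
```

Because each action is credited only with rewards that follow it, early actions in this example receive larger weights than late ones, and centring the weights leaves the expected gradient unchanged.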
64+
**🛠 Prerequisites**
65+
- Python + PyTorch basics
66+
- Key RL concepts: Policy, Reward, Return, Advantage, Value Function
67+
68+
**🗂 Notebook Structure**
69+
- **Environment Setup**: Logging and configuration
70+
- **Policy Network**: Implementation and sampling
71+
- **Training Loop**: Computing returns and updating the policy
72+
- **Variance Reduction (Optional)**: Baselines, GAE for stability
73+
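
To make the "updating the policy" step concrete, here is a minimal sketch of one gradient-ascent update for a softmax policy whose parameters are the action logits themselves. This is an illustrative assumption, not the notebook's implementation (which uses a PyTorch network): the names `softmax` and `policy_gradient_step` are hypothetical, and the update relies on the identity that the gradient of log pi(a) with respect to the logits is one_hot(a) minus the action probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """One REINFORCE-style ascent step: logits += lr * advantage * grad log pi(action)."""
    probs = softmax(logits)
    return [
        theta + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (theta, p) in enumerate(zip(logits, probs))
    ]


# Two actions, uniform policy; action 0 was sampled and had a positive advantage.
new_logits = policy_gradient_step([0.0, 0.0], action=0, advantage=1.0, lr=1.0)
```

A positive advantage raises the probability of the sampled action and a negative one lowers it, which is the behaviour worth checking when debugging a training loop.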
74+
**👨‍🏫 Tips for Students**
75+
- Run cells in order — don’t skip!
76+
- Print out observations, actions, rewards to debug.
77+
- Try different hyperparameters and Gym environments.
78+
- Use TensorBoard or video logs to visualize progress.
79+
80+
> 📘 Inspired by [CS285: Deep RL (Berkeley)](https://rail.eecs.berkeley.edu/deeprlcourse/)
81+
82+
_Courtesy: Tejas_
83+
84+
📢 Do post doubts on the `#module-7-robot-learning` Slack channel!
85+
86+
## 🔗 Resources
87+
88+
| 📚 Topic | 🔗 Link |
89+
|----------|---------|
90+
| Lecture Slides -- Reinforcement Learning | See Lectures 4-7 of the RAIL course (linked in the next row) |
91+
| 🎓 Deep Reinforcement Learning – Sergey Levine (RAIL, Berkeley) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=googleclassroom)](https://rail.eecs.berkeley.edu/deeprlcourse/) |
92+
| 🧠 Policy Gradient Algorithms – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/) |
93+
| ⚙️ PPO Implementation Details – ICLR Blog Track | [![Blog](https://img.shields.io/badge/Read-PPO_Insights-orange?logo=readthedocs)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) |
94+
| 📘 Mathematical Foundations of RL – Shiyu Zhao (Westlake University) | [![GitHub](https://img.shields.io/badge/View-on_GitHub-181717?logo=github)](https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning) |
95+
| ⚡ RL Quickstart Guide – Joseph Suarez (Pufferlib Creator) | [![X/Twitter](https://img.shields.io/badge/View-Quickstart_Guide-1DA1F2?logo=x)](https://x.com/jsuarez5341/status/1854855861295849793) |
96+
| 📦 Stable Baselines3 – RL Library (DLR-RM) | [![GitHub](https://img.shields.io/badge/View-Stable--Baselines3-181717?logo=github)](https://github.com/DLR-RM/stable-baselines3) |
97+
| 🧼 CleanRL – Minimal RL Implementations | [![GitHub](https://img.shields.io/badge/View-CleanRL-181717?logo=github)](https://github.com/vwxyzjn/cleanrl) |
98+
| 🐉 Decisions & Dragons – FAQs About RL | [![Website](https://img.shields.io/badge/Explore-Decisions_&_Dragons-blueviolet?logo=readthedocs)](https://www.decisionsanddragons.com/) |
99+
100+
---
lectures/23-diffusion-basics/README.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
# Lecture 23: Diffusion Models - Basics
2+
**Instructor:** Anant Garg
3+
**Date:** June 14, 2025
4+
5+
## Topics Covered:
6+
7+
- Noise, Gaussians + Setup
8+
- Autoencoders, VAE, Reparameterization Trick
9+
- The Forward Process: Adding Noise Step-by-Step
10+
- The Reverse Process: Learning to Denoise
11+
- DDPM: Predicting Noise to Reconstruct Data
12+
- Guidance: Making Diffusion Outputs Useful
13+
- Classifier-Based
14+
- Classifier-Free
15+
- Score Matching
16+
- Latent Diffusion
17+
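
The forward process listed above has a closed form that lets you jump straight from clean data to any noise level: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below is plain Python and not the lecture's code; the function names are assumptions, and the linear schedule from 1e-4 to 0.02 over 1000 steps follows the defaults reported in the DDPM paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1 ... beta_T."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, noise=None):
    """Sample x_t given x_0 in one shot, without simulating t individual steps."""
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + sigma * e for x, e in zip(x0, noise)]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x_t = q_sample([1.0, -1.0], t=999, alpha_bars=alpha_bars)
```

Since `alpha_bars` shrinks towards zero, the signal term vanishes at the final step and `x_t` is close to pure Gaussian noise, which is what the reverse process starts from.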
18+
## 📄 Assignment
19+
20+
- 🎨 **Diffusion Models – DDPM & Stable Diffusion Walkthroughs:**
21+
Clone and run through the following two PyTorch implementations to understand the fundamentals of diffusion models:
22+
23+
- 📦 **DDPM (Denoising Diffusion Probabilistic Models):**
24+
[explainingai-code/DDPM-Pytorch](https://github.com/explainingai-code/DDPM-Pytorch)
25+
26+
This repo walks through the original DDPM algorithm in PyTorch. Run the code, visualize the forward and reverse diffusion process, and study how noise schedules influence generation.
27+
28+
🔁 **Extension Task – Implement DDIM:**
29+
Read [DDIM paper (arXiv:2010.02502)](https://arxiv.org/abs/2010.02502) and extend the code to include deterministic sampling via DDIM.
30+
Suggested steps:
31+
- Modify the sampling loop to use DDIM's non-Markovian formulation
32+
- Add support for fewer inference steps (fast sampling)
33+
- Compare image quality vs sampling speed with DDPM
34+
35+
- 🎨 **Stable Diffusion (from scratch):**
36+
[explainingai-code/StableDiffusion-PyTorch](https://github.com/explainingai-code/StableDiffusion-PyTorch)
37+
38+
This repo walks through a simplified but faithful re-implementation of Stable Diffusion.
39+
Explore how text prompts are encoded, how the UNet denoiser operates, and how the latent diffusion process differs from vanilla DDPM.
40+
41+
💡 Feel free to experiment with prompts, noise schedules, and decoder resolutions! Post your findings and doubts on the `#module-7-robot-learning` Slack channel.
42+
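
For the DDIM extension task above, the core change is the per-step update rule. The sketch below shows one deterministic step (eta = 0) on a list of floats. It assumes the model predicts the noise `eps_pred`, as in DDPM; the function name and argument names are hypothetical and do not come from the linked repository.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from noise level t to an earlier level."""
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_prev = math.sqrt(1.0 - alpha_bar_prev)
    out = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = (x - sqrt_one_minus_t * e) / sqrt_ab_t
        # Re-noise that estimate to the earlier level, reusing the same noise direction.
        out.append(sqrt_ab_prev * x0_pred + sqrt_one_minus_prev * e)
    return out


# Sanity check with a perfect noise prediction: x0 = 2.0, eps = 0.5.
x_t = [math.sqrt(0.5) * 2.0 + math.sqrt(0.5) * 0.5]
x_prev = ddim_step(x_t, eps_pred=[0.5], alpha_bar_t=0.5, alpha_bar_prev=0.9)
```

Because no fresh noise is injected, `alpha_bar_prev` can belong to a timestep many steps earlier than t, which is what allows sampling with far fewer inference steps.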
43+
## 🔗 Resources
44+
45+
| 📚 Topic | 🔗 Link |
46+
|----------|--------|
47+
| 📑 Lecture Slides – Diffusion Basics | [![PDF](https://img.shields.io/badge/Open-Slides-red?logo=adobeacrobatreader&logoColor=white)](./lec-23-diffusion-basics.pdf) |
48+
| 🧠 From Autoencoder to Beta-VAE – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2018-08-12-vae/) |
49+
| 🌫️ What Are Diffusion Models? – Lilian Weng | [![Blog](https://img.shields.io/badge/Read-Blog-orange?logo=readthedocs)](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) |
50+
| 📄 DDPM – Denoising Diffusion Probabilistic Models (Ho et al.) | [![PDF](https://img.shields.io/badge/Open-Paper-blue?logo=readthedocs)](https://hojonathanho.github.io/diffusion/) |
51+
| 🧬 Latent Diffusion Models – High-Res Image Synthesis | [![arXiv](https://img.shields.io/badge/arXiv-2112.10752-b31b1b?logo=arxiv)](https://arxiv.org/pdf/2112.10752) |
52+
| 🎥 Explaining Diffusion – YouTube Playlist | [![YouTube](https://img.shields.io/badge/Watch-Playlist-red?logo=youtube&logoColor=white)](https://www.youtube.com/playlist?list=PL8VDJoEXIjpo2S7X-1YKZnbHyLGyESDCe) |
53+
| 🧪 Stable Diffusion (from scratch) – PyTorch Codebase | [![GitHub](https://img.shields.io/badge/View-Code-181717?logo=github)](https://github.com/explainingai-code/StableDiffusion-PyTorch) |
54+
| 🌊 Introduction to Flow Matching & Diffusion Models – MIT 6.S184 (Generative AI with SDEs) | [![Website](https://img.shields.io/badge/Open-Course-blue?logo=mit&logoColor=white)](https://diffusion.csail.mit.edu/) |
55+
| 🎥 Diffusion Models – Paper Explanation & Math | [![YouTube](https://img.shields.io/badge/Watch-Video-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=HoKDTa5jHvg) |
56+
| 🎓 CS 198-126: Lecture 12 – Diffusion Models (ML@Berkeley) | [![YouTube](https://img.shields.io/badge/Watch-Lecture-red?logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=687zEGODmHA&t=23s) |
57+
58+
59+
---
3.9 MB
Binary file not shown.