Commit 5eff2da

Add unit 3 detailed notes and update unit 2 detailed notes
1 parent 5ccf6f3 commit 5eff2da

File tree

2 files changed (+1900, -21 lines)

units/002-Q-Learning/notes-detailed.md

Lines changed: 22 additions & 21 deletions
@@ -2,20 +2,21 @@
 
 ## Table of Contents
 
-1. [Introduction](#introduction)
-1. [What is RL? A Short Recap](#what-is-rl-a-short-recap)
-1. [The Two Types of Value-Based Methods](#the-two-types-of-value-based-methods)
-1. [The Bellman Equation](#the-bellman-equation)
-1. [Monte Carlo vs Temporal Difference Learning](#monte-carlo-vs-temporal-difference-learning)
-1. [Mid-way Recap](#mid-way-recap)
-1. [Introducing Q-Learning](#introducing-q-learning)
-1. [A Q-Learning Example](#a-q-learning-example)
-1. [Q-Learning Recap](#q-learning-recap)
-1. [Glossary](#glossary)
+1. [Introduction](#1-introduction)
+2. [What is RL? A Short Recap](#2-what-is-rl-a-short-recap)
+3. [The Two Types of Value-Based Methods](#3-the-two-types-of-value-based-methods)
+4. [The Bellman Equation](#4-the-bellman-equation)
+5. [Monte Carlo vs Temporal Difference Learning](#5-monte-carlo-vs-temporal-difference-learning)
+6. [Mid-way Recap](#6-mid-way-recap)
+7. [Introducing Q-Learning](#7-introducing-q-learning)
+8. [The Q-Learning Algorithm](#8-the-q-learning-algorithm)
+9. [A Q-Learning Example](#8-a-q-learning-example)
+10. [Q-Learning Recap](#9-q-learning-recap)
+11. [Glossary](#10-glossary)
 
 -----
 
-## Introduction
+## 1. Introduction
 
 In Unit 2, we dive deeper into **value-based methods** in Reinforcement Learning and study our first RL algorithm: **Q-Learning**.
 
@@ -39,7 +40,7 @@ This unit is fundamental for understanding Deep Q-Learning, which was the first
 
 -----
 
-## What is RL? A Short Recap
+## 2. What is RL? A Short Recap
 
 ### Core Concepts
 
@@ -80,7 +81,7 @@ The goal is to find the **optimal policy π*** that leads to the best expected c
 
 -----
 
-## The Two Types of Value-Based Methods
+## 3. The Two Types of Value-Based Methods
 
 ### Overview
 
@@ -154,7 +155,7 @@ Both value functions require calculating **expected returns**, which means:
 
 -----
 
-## The Bellman Equation
+## 4. The Bellman Equation
 
 ### Purpose
 
@@ -246,7 +247,7 @@ This makes computation much more efficient!
 
 -----
 
-## Monte Carlo vs Temporal Difference Learning
+## 5. Monte Carlo vs Temporal Difference Learning
 
 ### Overview
 
@@ -399,7 +400,7 @@ New V(S_0) = 0.1
 
 -----
 
-## Mid-way Recap
+## 6. Mid-way Recap
 
 ### Value-Based Methods Summary
 
@@ -441,7 +442,7 @@ Both methods aim to learn value functions, but differ in:
 
 -----
 
-## Introducing Q-Learning
+## 7. Introducing Q-Learning
 
 ### What is Q-Learning?
 
@@ -547,7 +548,7 @@ Optimal Q-function → Optimal Q-table → Optimal Policy
 
 -----
 
-## The Q-Learning Algorithm
+## 8. The Q-Learning Algorithm
 
 ### Pseudocode Overview
 
@@ -794,7 +795,7 @@ Q(S_t, A_t) = Q(S_t, A_t) + α[R_{t+1} + γ * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]
 
 -----
 
-## A Q-Learning Example
+## 9. A Q-Learning Example
 
 Let’s walk through a complete example step by step.
 
@@ -986,7 +987,7 @@ Optimal Q-table → Best action at each state → Optimal policy
 
 -----
 
-## Q-Learning Recap
+## 10. Q-Learning Recap
 
 ### What is Q-Learning?
 
@@ -1053,7 +1054,7 @@ For each episode:
 
 -----
 
-## Glossary
+## 11. Glossary
 
 ### Main Concepts
 