
## Table of Contents

- 1. [Introduction](#introduction)
- 1. [What is RL? A Short Recap](#what-is-rl-a-short-recap)
- 1. [The Two Types of Value-Based Methods](#the-two-types-of-value-based-methods)
- 1. [The Bellman Equation](#the-bellman-equation)
- 1. [Monte Carlo vs Temporal Difference Learning](#monte-carlo-vs-temporal-difference-learning)
- 1. [Mid-way Recap](#mid-way-recap)
- 1. [Introducing Q-Learning](#introducing-q-learning)
- 1. [A Q-Learning Example](#a-q-learning-example)
- 1. [Q-Learning Recap](#q-learning-recap)
- 1. [Glossary](#glossary)
+ 1. [Introduction](#1-introduction)
+ 2. [What is RL? A Short Recap](#2-what-is-rl-a-short-recap)
+ 3. [The Two Types of Value-Based Methods](#3-the-two-types-of-value-based-methods)
+ 4. [The Bellman Equation](#4-the-bellman-equation)
+ 5. [Monte Carlo vs Temporal Difference Learning](#5-monte-carlo-vs-temporal-difference-learning)
+ 6. [Mid-way Recap](#6-mid-way-recap)
+ 7. [Introducing Q-Learning](#7-introducing-q-learning)
+ 8. [The Q-Learning Algorithm](#8-the-q-learning-algorithm)
+ 9. [A Q-Learning Example](#9-a-q-learning-example)
+ 10. [Q-Learning Recap](#10-q-learning-recap)
+ 11. [Glossary](#11-glossary)

-----

- ## Introduction
+ ## 1. Introduction

In Unit 2, we dive deeper into **value-based methods** in Reinforcement Learning and study our first RL algorithm: **Q-Learning**.

@@ -39,7 +40,7 @@ This unit is fundamental for understanding Deep Q-Learning, which was the first

-----

- ## What is RL? A Short Recap
+ ## 2. What is RL? A Short Recap

### Core Concepts

@@ -80,7 +81,7 @@ The goal is to find the **optimal policy π*** that leads to the best expected c

-----

- ## The Two Types of Value-Based Methods
+ ## 3. The Two Types of Value-Based Methods

### Overview

@@ -154,7 +155,7 @@ Both value functions require calculating **expected returns**, which means:

-----

- ## The Bellman Equation
+ ## 4. The Bellman Equation

### Purpose

@@ -246,7 +247,7 @@ This makes computation much more efficient!

-----

- ## Monte Carlo vs Temporal Difference Learning
+ ## 5. Monte Carlo vs Temporal Difference Learning

### Overview

@@ -399,7 +400,7 @@ New V(S_0) = 0.1

-----

- ## Mid-way Recap
+ ## 6. Mid-way Recap

### Value-Based Methods Summary

@@ -441,7 +442,7 @@ Both methods aim to learn value functions, but differ in:

-----

- ## Introducing Q-Learning
+ ## 7. Introducing Q-Learning

### What is Q-Learning?

@@ -547,7 +548,7 @@ Optimal Q-function → Optimal Q-table → Optimal Policy

-----

- ## The Q-Learning Algorithm
+ ## 8. The Q-Learning Algorithm

### Pseudocode Overview

@@ -794,7 +795,7 @@ Q(S_t, A_t) = Q(S_t, A_t) + α[R_{t+1} + γ * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]

-----

- ## A Q-Learning Example
+ ## 9. A Q-Learning Example

Let’s walk through a complete example step by step.

@@ -986,7 +987,7 @@ Optimal Q-table → Best action at each state → Optimal policy

-----

- ## Q-Learning Recap
+ ## 10. Q-Learning Recap

### What is Q-Learning?

@@ -1053,7 +1054,7 @@ For each episode:

-----

- ## Glossary
+ ## 11. Glossary

### Main Concepts

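The tabular Q-learning update these notes build toward can be sketched in a few lines. The state, action, reward, and hyperparameter values below are illustrative assumptions, not taken from the diffed file:

```python
# Minimal sketch of the tabular Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# The transition used below (state 0, action "right", reward 1.0) is hypothetical.
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Apply one off-policy TD update to Q and return the new Q(s, a)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap target
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)          # Q-table initialized to zero for every (state, action)
actions = ["left", "right"]
new_value = q_learning_update(Q, 0, "right", 1.0, 1, actions)
print(new_value)                # -> 0.1 (alpha * reward, since the table starts at zero)
```

With an all-zero table, the bootstrap term vanishes and the first update moves Q(s, a) by alpha times the reward, which is why small learning rates produce the slow, incremental value estimates discussed in the TD section.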