Chapter 19, page 681 #240


<img width="651" height="319" alt="Image" src="https://github.com/user-attachments/assets/6de0975f-fff6-4b9e-92d9-eeae08b02162" />



# Bug Report: Incorrect return calculations for Episodes 1 and 2

Hi! First of all, thank you for this wonderful book — it's been an incredibly helpful resource for learning RL concepts. 🙏

I believe I've found calculation errors in the **Episodic versus continuing tasks** section, specifically in the return computations for **Episode 1** and **Episode 2**. Both share the same root cause (an off-by-one error in the number of transitions), and Episode 2 also contains repeated subscript typos.

## Location

Chapter 19 (Reinforcement Learning), section *"RL terminology: return, policy, and value function"*, subsection *"The return"* — computation of the return for Episodes 1 and 2.

---

## Episode 1: BBCCCCBAT → pass (final reward = +1)

### The issue

The episode has 8 non-terminal states:

$$S_0=B,\; S_1=B,\; S_2=C,\; S_3=C,\; S_4=C,\; S_5=C,\; S_6=B,\; S_7=A,\; S_8=\text{pass (terminal)}$$

So $T = 8$, with **8 transitions** and 8 rewards ($R_1$ through $R_8$), where $R_1 = \cdots = R_7 = 0$ and $R_8 = +1$.

However, the book computes $G_0$ summing only up to $\gamma^6 R_7$ (7 rewards instead of 8):

$$G_0 = R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots + \gamma^6 R_7$$

It should be:

$$G_0 = R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots + \gamma^7 R_8 = 0.9^7 \approx 0.478$$
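As a quick numerical check of the sum above, here is a minimal Python sketch. The function name `discounted_return` and the list layout (index `k` holds $R_{k+1}$) are my own choices for illustration, not code from the book; $\gamma = 0.9$ is the value used throughout this report.

```python
def discounted_return(rewards, gamma, t=0):
    """Return G_t = sum_k gamma**k * R_{t+k+1}; rewards[k] holds R_{k+1}."""
    return sum(gamma**k * r for k, r in enumerate(rewards[t:]))

# Episode 1 (BBCCCCBAT): 8 transitions, so R_1..R_7 = 0 and R_8 = +1.
episode1_rewards = [0] * 7 + [1]

g0 = discounted_return(episode1_rewards, gamma=0.9)
print(round(g0, 3))  # 0.478
```

With only 7 rewards in the list (`[0] * 6 + [1]`), the same function gives $0.9^6 \approx 0.531$, which is the value the book prints.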

### What the book says vs. corrected values

| Time step | Book writes | Book value | Should be | Corrected value |
|-----------|------------|-----------|-----------|----------------|
| $t = 0$   | $G_0 = 0 + \cdots + 0.9^6$ | $0.531$ | $G_0 = 0.9^7$ | $0.478$ |
| $t = 1$   | $G_1 = 1 \times \gamma^5$ | $0.590$ | $G_1 = 0.9^6$ | $0.531$ |
| $t = 2$   | $G_2 = 1 \times \gamma^4$ | $0.656$ | $G_2 = 0.9^5$ | $0.590$ |
| $t = 3$   | (not shown) | — | $G_3 = 0.9^4$ | $0.656$ |
| $t = 4$   | (not shown) | — | $G_4 = 0.9^3$ | $0.729$ |
| $t = 5$   | (not shown) | — | $G_5 = 0.9^2$ | $0.810$ |
| $t = 6$   | $G_6 = 1 \times \gamma$ | $0.900$ | $G_6 = 0.9^1$ | $0.900$ ✓ |
| $t = 7$   | $G_7 = 1$ | $1.000$ | $G_7 = 1$ | $1.000$ ✓ |

### Corrected calculation with backward recursion ($G_t = R_{t+1} + \gamma \, G_{t+1}$)

| Time step | Reward $R_{t+1}$ | Calculation               | Corrected value |
|-----------|-------------------|---------------------------|----------------|
| $G_7$     | $R_8 = +1$        | $1$                       | $1.000$ |
| $G_6$     | $R_7 = 0$         | $0 + 0.9 \times 1.000$   | $0.900$ |
| $G_5$     | $R_6 = 0$         | $0 + 0.9 \times 0.900$   | $0.810$ |
| $G_4$     | $R_5 = 0$         | $0 + 0.9 \times 0.810$   | $0.729$ |
| $G_3$     | $R_4 = 0$         | $0 + 0.9 \times 0.729$   | $0.656$ |
| $G_2$     | $R_3 = 0$         | $0 + 0.9 \times 0.656$   | $0.590$ |
| $G_1$     | $R_2 = 0$         | $0 + 0.9 \times 0.590$   | $0.531$ |
| $G_0$     | $R_1 = 0$         | $0 + 0.9 \times 0.531$   | $0.478$ |
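The table above can be reproduced with a short sketch of the backward recursion. The helper name `returns_backward` is hypothetical, and it assumes $G_T = 0$ (no reward is collected after the terminal state).

```python
def returns_backward(rewards, gamma):
    """Compute [G_0, ..., G_{T-1}] via G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0."""
    returns = [0.0] * len(rewards)
    g_next = 0.0  # G_T: nothing is collected after the terminal state
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * g_next  # rewards[t] holds R_{t+1}
        returns[t] = g_next
    return returns

# Episode 1: R_1..R_7 = 0, R_8 = +1
episode1_returns = returns_backward([0] * 7 + [1], gamma=0.9)
for t, g in enumerate(episode1_returns):
    print(f"G_{t} = {g:.3f}")
```

Passing `[0] * 9 + [-1]` instead should give the Episode 2 values in the same way.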

---

## Episode 2: ABBBBBBBBBT → fail (final reward = −1)

### The issue

The episode has 10 non-terminal states:

$$S_0=A,\; S_1=B,\; S_2=B,\; S_3=B,\; S_4=B,\; S_5=B,\; S_6=B,\; S_7=B,\; S_8=B,\; S_9=B,\; S_{10}=\text{fail (terminal)}$$

So $T = 10$, with **10 transitions** and 10 rewards ($R_1$ through $R_{10}$), where $R_1 = \cdots = R_9 = 0$ and $R_{10} = -1$.
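Because every reward except the last is zero, the return collapses to a single term, which makes the exponent easy to check. A short derivation using only the definitions above:

$$G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1} = \gamma^{T-t-1} R_T, \qquad 0 \le t \le T-1$$

With $T = 10$ and $R_{10} = -1$ this gives $G_0 = -\gamma^9$ and $G_9 = -\gamma^0 = -1$.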

This episode has **two types of errors**:

#### Error 1: Off-by-one (same as Episode 1)

The book computes $G_0 = -1 \times \gamma^8 = -0.430$, but with 10 transitions it should be $G_0 = -1 \times \gamma^9 = -0.387$.

#### Error 2: Subscript typo — the book writes $G_0$ on almost every line

The book labels three of the four printed lines $G_0$, regardless of the time step, and labels the last one $G_{10}$, which is very confusing. Here is exactly what the book prints:

| Time step    | Book writes (verbatim) | Book value |
|--------------|----------------------|-----------|
| $t = 0$      | $G_0 = -1 \times \gamma^8$  | $-0.430$ |
| $t = 1$      | $G_0 = -1 \times \gamma^7$  | $-0.478$ |
| $t = 2$      | (not shown, implied by "...") | — |
| $t = 3$      | (not shown, implied by "...") | — |
| $t = 4$      | (not shown, implied by "...") | — |
| $t = 5$      | (not shown, implied by "...") | — |
| $t = 6$      | (not shown, implied by "...") | — |
| $t = 7$      | (not shown, implied by "...") | — |
| $t = 8$      | $G_0 = -1 \times \gamma$    | $-0.900$ |
| $t = 9$      | $G_{10} = -1$                | $-1.000$ |

As you can see, the subscript is wrong on three of the four shown lines:
- At $t = 0$: writes $G_0$ → correct
- At $t = 1$: writes $G_0$ → **should be $G_1$**
- At $t = 8$: writes $G_0$ → **should be $G_8$**
- At $t = 9$: writes $G_{10}$ → **should be $G_9$**

### What the book should say (all steps, corrected)

| Time step | Correct subscript | Calculation | Corrected value |
|-----------|-------------------|-------------|----------------|
| $t = 0$   | $G_0 = -1 \times \gamma^9$ | $-0.9^9$ | $-0.387$ |
| $t = 1$   | $G_1 = -1 \times \gamma^8$ | $-0.9^8$ | $-0.430$ |
| $t = 2$   | $G_2 = -1 \times \gamma^7$ | $-0.9^7$ | $-0.478$ |
| $t = 3$   | $G_3 = -1 \times \gamma^6$ | $-0.9^6$ | $-0.531$ |
| $t = 4$   | $G_4 = -1 \times \gamma^5$ | $-0.9^5$ | $-0.590$ |
| $t = 5$   | $G_5 = -1 \times \gamma^4$ | $-0.9^4$ | $-0.656$ |
| $t = 6$   | $G_6 = -1 \times \gamma^3$ | $-0.9^3$ | $-0.729$ |
| $t = 7$   | $G_7 = -1 \times \gamma^2$ | $-0.9^2$ | $-0.810$ |
| $t = 8$   | $G_8 = -1 \times \gamma^1$ | $-0.9^1$ | $-0.900$ |
| $t = 9$   | $G_9 = -1$                  | $-1$      | $-1.000$ |
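The subscripts and exponents in the table above follow mechanically from the episode length. A minimal sketch, assuming the episode is given as a string whose last character marks the terminal state and that only the final reward is non-zero (the helper name `labelled_returns` is hypothetical):

```python
def labelled_returns(episode, final_reward, gamma):
    """Yield (t, exponent, G_t) for an episode string ending in a terminal marker.

    Assumes every reward is 0 except the final one, as in the two episodes here.
    """
    T = len(episode) - 1  # number of transitions = number of non-terminal states
    for t in range(T):
        exponent = T - 1 - t
        yield t, exponent, final_reward * gamma**exponent

for t, k, g in labelled_returns("ABBBBBBBBBT", final_reward=-1, gamma=0.9):
    print(f"G_{t} = -1 x gamma^{k} = {g:.3f}")
```

The subscript `t` and the exponent `k` always sum to $T - 1 = 9$, which is a quick way to spot a mislabelled line.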

### Corrected calculation with backward recursion ($G_t = R_{t+1} + \gamma \, G_{t+1}$)

| Time step | Reward $R_{t+1}$ | Calculation                  | Corrected value |
|-----------|-------------------|------------------------------|----------------|
| $G_9$     | $R_{10} = -1$     | $-1$                         | $-1.000$ |
| $G_8$     | $R_9 = 0$         | $0 + 0.9 \times (-1.000)$   | $-0.900$ |
| $G_7$     | $R_8 = 0$         | $0 + 0.9 \times (-0.900)$   | $-0.810$ |
| $G_6$     | $R_7 = 0$         | $0 + 0.9 \times (-0.810)$   | $-0.729$ |
| $G_5$     | $R_6 = 0$         | $0 + 0.9 \times (-0.729)$   | $-0.656$ |
| $G_4$     | $R_5 = 0$         | $0 + 0.9 \times (-0.656)$   | $-0.590$ |
| $G_3$     | $R_4 = 0$         | $0 + 0.9 \times (-0.590)$   | $-0.531$ |
| $G_2$     | $R_3 = 0$         | $0 + 0.9 \times (-0.531)$   | $-0.478$ |
| $G_1$     | $R_2 = 0$         | $0 + 0.9 \times (-0.478)$   | $-0.430$ |
| $G_0$     | $R_1 = 0$         | $0 + 0.9 \times (-0.430)$   | $-0.387$ |

---

## Summary

Both episodes share an **off-by-one error**: the calculations count one fewer transition than the episode actually has, so the returns at the early time steps are computed with one power of $\gamma$ too few (each is too large in magnitude by a factor of $1/\gamma$). The values printed for the last two time steps are correct.
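To illustrate the shift, the sketch below compares the book's printed values for the early time steps (copied from the tables in this report) with the corrected return divided by $\gamma$. The names are hypothetical, and the tolerance only accounts for the book rounding to three decimals.

```python
GAMMA = 0.9

# (episode, t, value printed in the book, T = number of transitions)
book_values = [
    ("Episode 1", 0, 0.531, 8),
    ("Episode 1", 1, 0.590, 8),
    ("Episode 1", 2, 0.656, 8),
    ("Episode 2", 0, -0.430, 10),
    ("Episode 2", 1, -0.478, 10),
]
final_reward = {"Episode 1": 1, "Episode 2": -1}

def shifted_by_one_power(episode, t, book, T):
    """True if the book value equals the corrected G_t divided by gamma (up to rounding)."""
    correct = final_reward[episode] * GAMMA ** (T - 1 - t)
    return abs(book - correct / GAMMA) < 1e-3

for episode, t, book, T in book_values:
    print(episode, t, shifted_by_one_power(episode, t, book, T))
```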

Episode 2 additionally has a **subscript typo**: the book writes $G_0$ at $t = 0, 1, 8$ where it should write $G_0, G_1, G_8$, and writes $G_{10}$ instead of $G_9$ at the final step.

Thank you again for your work on this book, and I hope this note is helpful!

