
Commit bbf8e8d

Merge pull request #10 from Ynjxsjmh/patch-1

Fix equation 3.18

2 parents d6635b4 + 8e01938

File tree

1 file changed (+1, −1)

source/partI/chapter3/finite_markov_decision_process.rst

Lines changed: 1 addition & 1 deletion
@@ -624,7 +624,7 @@ The MDP framework is an abstraction of the problem of goal-directed learning from interaction.
 
     \begin{align*}
     q_*(s,a) &= \mathbb{E}\left[R_{t+1}+\gamma\max_{a^\prime}q_*(S_{t+1},a^\prime)|S_t=s,A_t=a\right] \\
-    &=\sum_{s^\prime,r}p(s^\prime,r|s,a)[r+\gamma \sum_{a^\prime}q_*(s^\prime,a^\prime)]
+    &=\sum_{s^\prime,r}p(s^\prime,r|s,a)[r+\gamma \max_{a^\prime}q_*(s^\prime,a^\prime)]
 
 The backup diagrams below show graphically the span of future states and actions considered in the Bellman optimality equations for :math:`v_*` and :math:`q_*`.
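For illustration only (this sketch is not from the repo): a small numpy program that iterates the corrected backup of equation 3.18 on a made-up two-state, two-action MDP, showing why the inner operator over a' must be max rather than sum. The tensors p and r below are hypothetical values chosen just to make the iteration runnable.

import numpy as np

# Assumed toy MDP (not from the repo): p[s, a, s2] is the transition
# probability and r[s, a, s2] the expected reward for that transition.
p = np.array([[[0.8, 0.2],
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
r = np.array([[[1.0, 0.0],
               [0.0, 2.0]],
              [[0.5, 0.5],
               [0.0, 0.0]]])
gamma = 0.9

q = np.zeros((2, 2))   # q[s, a], initialised to zero
for _ in range(1000):  # the backup is a contraction for gamma < 1, so this converges
    # q_*(s,a) = sum_{s',r} p(s',r|s,a) [ r + gamma * max_{a'} q_*(s',a') ]
    # -- max over a' (the fixed form), not sum over a' (the pre-fix text).
    q = (p * (r + gamma * q.max(axis=1))).sum(axis=2)

print(q)  # optimal action values q_*(s, a) for the toy MDP

Replacing q.max(axis=1) with q.sum(axis=1) would reproduce the pre-fix formula, which over-counts the return whenever more than one action is available; that is exactly the error this commit corrects.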
