
Commit 6eb1e08

Merge pull request #1939 from agentydragon:patch-1
PiperOrigin-RevId: 403491989
2 parents: fb32a76 + 11a8488


site/en/tutorials/reinforcement_learning/actor_critic.ipynb (1 addition, 2 deletions)
@@ -444,7 +444,7 @@
 "\n",
 "The actor loss is based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and computed with single-sample (per-episode) estimates.\n",
 "\n",
-"$$L_{actor} = -\\sum^{T}_{t=1} log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
+"$$L_{actor} = -\\sum^{T}_{t=1} \\log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
 "\n",
 "where:\n",
 "- $T$: the number of timesteps per episode, which can vary per episode\n",
@@ -738,7 +738,6 @@
 "_jQ1tEQCxwRx"
 ],
 "name": "actor_critic.ipynb",
-"provenance": [],
 "toc_visible": true
 },
 "kernelspec": {
