Commit 6a2b88f

Nit: fix logarithm operation
1 parent: 234a683

File tree: 1 file changed (+1, -1 lines changed)


site/en/tutorials/reinforcement_learning/actor_critic.ipynb

Lines changed: 1 addition & 1 deletion
@@ -444,7 +444,7 @@
     "\n",
     "The actor loss is based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and computed with single-sample (per-episode) estimates.\n",
     "\n",
-    "$$L_{actor} = -\\sum^{T}_{t=1} log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
+    "$$L_{actor} = -\\sum^{T}_{t=1} \\log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
     "\n",
     "where:\n",
     "- $T$: the number of timesteps per episode, which can vary per episode\n",
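The corrected equation above can be sketched numerically. This is a minimal NumPy illustration of the actor loss, not code from the tutorial itself; the function name `actor_loss` and the toy values are hypothetical, and in the tutorial the log-probabilities, returns $G$, and values $V$ come from the policy network, discounted episode rewards, and the critic respectively.

```python
import numpy as np

def actor_loss(action_log_probs, returns, values):
    """L_actor = -sum_t log pi(a_t|s_t) * [G(s_t, a_t) - V(s_t)].

    Hypothetical helper: all three arguments are per-timestep arrays
    for a single episode (single-sample estimate).
    """
    advantage = returns - values          # critic as a state-dependent baseline
    return -np.sum(action_log_probs * advantage)

# Toy single-episode example (made-up numbers):
log_probs = np.array([-0.5, -0.7, -0.2])  # log pi(a_t | s_t)
returns = np.array([1.0, 0.8, 0.5])       # G(s_t, a_t)
values = np.array([0.9, 0.7, 0.6])        # V(s_t)
loss = actor_loss(log_probs, returns, values)
```

Because the advantage $G - V$ is treated as a constant during backpropagation, minimizing this loss increases the log-probability of actions whose return exceeded the critic's estimate.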
