
Commit 6eb1e08

Merge pull request #1939 from agentydragon:patch-1
PiperOrigin-RevId: 403491989
2 parents: fb32a76 + 11a8488


site/en/tutorials/reinforcement_learning/actor_critic.ipynb (1 addition, 2 deletions)
@@ -444,7 +444,7 @@
 "\n",
 "The actor loss is based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and computed with single-sample (per-episode) estimates.\n",
 "\n",
-"$$L_{actor} = -\\sum^{T}_{t=1} log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
+"$$L_{actor} = -\\sum^{T}_{t=1} \\log\\pi_{\\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\\pi}_{\\theta}(s_{t})]$$\n",
 "\n",
 "where:\n",
 "- $T$: the number of timesteps per episode, which can vary per episode\n",
@@ -738,7 +738,6 @@
 "_jQ1tEQCxwRx"
 ],
 "name": "actor_critic.ipynb",
-"provenance": [],
 "toc_visible": true
 },
 "kernelspec": {
