1 parent 032242b commit df1623a
README.md
@@ -17,7 +17,7 @@ The environment:
Goal is to learn how to take actions in order to maximize the reward. The objective function is as following:
-<b>Q[s, a] = Q[s, a] + λ * (r + γ * max (Q[s_, a_]) – Q[s, a]),</b>
+<b>Q_[s_, a_] = Q[s, a] + λ * (r + γ * max (Q[s_, a_]) – Q[s, a]),</b>
where,
<br/><b>s</b> – current position of the agent,
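
For reference, here is a minimal sketch of one tabular Q-learning step. It follows the form of the removed (`-`) line, where the entry `Q[s, a]` is updated in place. It assumes λ is the learning rate, γ is the discount factor, `r` is the reward, and `s_` is the next state; the hunk shown here defines only `s`. The function name `q_update`, the dict-of-lists table layout, and the sample values are hypothetical and do not come from the repository.

```python
def q_update(Q, s, a, r, s_next, lam=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to Q[s][a] and return the new value.

    Q maps each state to a list of action values (an assumed layout).
    lam is the learning rate (λ) and gamma the discount factor (γ).
    """
    # Greedy estimate of the best value reachable from the next state.
    best_next = max(Q[s_next])
    # Move Q[s][a] toward the target r + gamma * best_next by a step of size lam.
    Q[s][a] = Q[s][a] + lam * (r + gamma * best_next - Q[s][a])
    return Q[s][a]


# Hypothetical two-state, two-action table.
Q = {0: [0.0, 0.0], 1: [1.0, 2.0]}
new_value = q_update(Q, s=0, a=1, r=1.0, s_next=1, lam=0.5, gamma=0.9)
```

Only the entry for the current state and action changes; the values stored for the next state are read but not modified.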