1 parent 6824b3a commit 50708b4
README.md
@@ -21,6 +21,7 @@ Goal is to learn how to take actions in order to maximize the reward. The object
where,
<br/><b>Q_[s_, a_]</b> - value of the objective function on the next step,
+<br/><b>Q[s, a]</b> - value of the objective function on the current position,
<br/><b>max(Q_[s_, a_]) – Q[s, a]</b> - difference between the maximum value over the possible next steps and the value at the current position,
<br/><b>s</b> – current position of the agent,
<br/><b>a</b> – current action,
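
The terms listed here can be tied together with a minimal sketch of one tabular Q-learning update. This is an illustration under assumptions, not the repository's code: the function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the list-of-lists table layout are all hypothetical and do not appear in this excerpt.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q[s][a].

    Q is a list of rows, one row per state, one entry per action.
    alpha and gamma are assumed hyperparameters, not taken from the README.
    """
    best_next = max(Q[s_next])                  # max(Q_[s_, a_])
    difference = reward + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * difference               # move Q[s, a] toward the target
    return Q[s][a]


# Hypothetical table with 3 states and 2 actions, initialised to zero.
Q = [[0.0, 0.0] for _ in range(3)]
Q[1] = [0.5, 1.0]                               # pretend state 1 was already explored

new_value = q_update(Q, s=0, a=1, reward=1.0, s_next=1)
```

Only the entry for the current position `Q[s][a]` changes; the row for the next position is read but not modified.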