Update README.md

sichkar-valentyn · web-flow · commit df1623ada8e7 · 2018-06-25T18:49:57.000+03:00
diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@ The environment:
 
 Goal is to learn how to take actions in order to maximize the reward. The objective function is as following:
 
-<b>Q[s, a] = Q[s, a] + λ * (r + γ * max (Q[s_, a_]) – Q[s, a]),</b>
+<b>Q_[s_, a_] = Q[s, a] + λ * (r + γ * max (Q[s_, a_]) – Q[s, a]),</b>
 
 where,
 <br/><b>s</b> – current position of the agent,
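For reference, here is a minimal sketch of the tabular Q-learning update that this objective function describes. It assumes a plain nested-list Q-table indexed as Q[state][action]. The names `q_learning_update`, `alpha` (standing in for λ) and `gamma` (for γ) are hypothetical and chosen only for illustration. In the standard form, the result is written back to Q[s, a], the entry for the state-action pair just taken. Q[s_, a_] appears only inside the max over the next state's actions.

```python
# Hypothetical sketch of the standard tabular Q-learning update.
# Q is a list of lists: Q[state][action]. Names are illustrative assumptions.

def q_learning_update(Q, s, a, r, s_, alpha=0.1, gamma=0.9):
    """Apply Q[s, a] = Q[s, a] + alpha * (r + gamma * max(Q[s_, a_]) - Q[s, a]).

    s, a:  current state and the action taken in it
    r:     reward received after taking action a
    s_:    next state reached
    alpha: learning rate (the README's lambda)
    gamma: discount factor (the README's gamma)
    """
    # Best estimated value obtainable from the next state, over all actions a_
    target = r + gamma * max(Q[s_])
    # Move the current estimate a fraction alpha toward the target
    Q[s][a] = Q[s][a] + alpha * (target - Q[s][a])
    return Q[s][a]


# Usage: 3 states x 2 actions, with some prior estimates for state 1
Q = [[0.0, 0.0] for _ in range(3)]
Q[1] = [0.5, 1.0]
q_learning_update(Q, s=0, a=1, r=1.0, s_=1)
```

Only Q[0][1] changes in the usage example. The next state's row Q[1] is read, not modified.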