1 parent 9969539, commit e8971b3
example/tutorial_frozenlake_dqn.py
@@ -83,7 +83,7 @@ def to_one_hot(i, n_classes=None):
  ## Obtain maxQ' and set our target value for chosen action.
  maxQ1 = np.max(Q1)
  targetQ = allQ
- # targetQ[0, a[0]] = r + lambd * maxQ1
+ targetQ[0, a[0]] = r + lambd * maxQ1
  # targetQ[0, a[0]] = targetQ[0, a[0]] + alpha * (r + lambd * maxQ1 - targetQ[0, a[0]])
  ## Train network using target and predicted Q values
  _ = sess.run(train_op, {inputs : [to_one_hot(s, 16)], nextQ : targetQ})
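The line this commit re-enables sets the Q-learning target for the chosen action to `r + lambd * maxQ1` and leaves the other actions at their predicted values. The NumPy-only sketch below illustrates that step outside the TensorFlow session. The helper name `build_target_q` and the sample numbers are hypothetical and not part of the tutorial. The sketch also copies the prediction array first: if `allQ` is a NumPy array, `targetQ = allQ` binds the same array, so the in-place assignment would modify `allQ` as well.

```python
import numpy as np


def build_target_q(all_q, next_q, action, reward, lambd):
    """Return a copy of all_q with the chosen action's entry replaced by
    the one-step Q-learning target: reward + lambd * max(next_q).

    build_target_q is a hypothetical helper used only for illustration.
    """
    target_q = all_q.copy()        # copy so the predictions stay untouched
    max_q1 = np.max(next_q)        # best Q value available in the next state
    target_q[0, action] = reward + lambd * max_q1
    return target_q


# Made-up Q values for a 4-action environment such as FrozenLake.
all_q = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)       # Q(s, .)
next_q = np.array([[0.0, 0.5, 0.25, 0.125]], dtype=np.float32)   # Q(s', .)

target_q = build_target_q(all_q, next_q, action=2, reward=1.0, lambd=0.5)
print(target_q)
```

Only the entry for the chosen action changes, so the squared error between `target_q` and the prediction is zero for every other action and the gradient comes from the chosen action alone.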