Commit 5e62856

Update deep-rl-dqn.md (#464)
1 parent e9df715 commit 5e62856

File tree

1 file changed (+2, -2 lines)


deep-rl-dqn.md

Lines changed: 2 additions & 2 deletions
@@ -159,7 +159,7 @@ To help us stabilize the training, we implement three different solutions:
 2. *Fixed Q-Target* **to stabilize the training**.
 3. *Double Deep Q-Learning*, to **handle the problem of the overestimation of Q-values**.
 
-We'll see these three solutions in the pseudocode.
+<!--- We'll see these three solutions in the pseudocode. --->
 
 ### Experience Replay to make more efficient use of experiences
 
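The hunk above names experience replay as the first stabilization technique ("to make more efficient use of experiences"). As an illustration only, here is a minimal replay-buffer sketch in Python; the class name, capacity, and transition layout are assumptions, not code from the course repository.

```python
# Illustrative sketch of an experience replay buffer (not from the course repo).
# Transitions are stored as (state, action, reward, next_state, done) tuples;
# the capacity and interface are assumptions.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # A bounded deque drops the oldest experiences once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive
        # experiences and lets each transition be reused in several updates.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```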
@@ -230,7 +230,7 @@ We know that the accuracy of Q values depends on what action we tried **and**
 Consequently, we don’t have enough information about the best action to take at the beginning of the training. Therefore, taking the maximum Q value (which is noisy) as the best action to take can lead to false positives. If non-optimal actions are regularly **given a higher Q value than the optimal best action, the learning will be complicated.**
 
 The solution is: when we compute the Q target, we use two networks to decouple the action selection from the target Q value generation. We:
-<img src="assets/78_deep_rl_dqn/double-dqn-pseudocode.jpg" alt="Double DQN Pseudocode"/>
+<!---<img src="assets/78_deep_rl_dqn/double-dqn-pseudocode.jpg" alt="Double DQN Pseudocode"/>--->
 - Use our **DQN network** to select the best action to take for the next state (the action with the highest Q value).
 - Use our **Target network** to calculate the target Q value of taking that action at the next state.
 
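The two bullets in this hunk describe the Double DQN target: select the next action with the online network, then evaluate it with the target network. The sketch below is a hedged illustration of that computation, assuming PyTorch; the function, network, and tensor names are hypothetical and not taken from the course code.

```python
# Illustrative Double DQN target computation (assumes PyTorch; names are
# hypothetical, not the course's exact implementation).
import torch


def double_dqn_target(dqn_net, target_net, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    with torch.no_grad():
        # 1) Use the online DQN network to *select* the best next action.
        next_actions = dqn_net(next_states).argmax(dim=1, keepdim=True)
        # 2) Use the target network to *evaluate* that action's Q value.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions contribute no bootstrapped value.
        return rewards + gamma * next_q * (1.0 - dones.float())
```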
0 commit comments

Comments
 (0)