@@ -41,11 +41,11 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t
 | Algorithms | Observation Space | Action Space | Tutorial Env |
 | --------------- | ----------------- | ------------ | -------------- |
 | Q-learning | Discrete | Discrete | FrozenLake |
-| C51 | Discrete | Discrete | Pong, CartPole |
+| C51 | Continuous | Discrete | Pong, CartPole |
 | DQN | Discrete | Discrete | FrozenLake |
-| Variants of DQN | Discrete | Discrete | Pong, CartPole |
-| Retrace | Discrete | Discrete | Pong, CartPole |
-| PER | Discrete | Discrete | Pong, CartPole |
+| Variants of DQN | Continuous | Discrete | Pong, CartPole |
+| Retrace | Continuous | Discrete | Pong, CartPole |
+| PER | Continuous | Discrete | Pong, CartPole |
 | Actor-Critic | Continuous | Discrete | CartPole |
 | A3C | Continuous | Continuous | BipedalWalker |
 | DDPG | Continuous | Continuous | Pendulum |
@@ -106,18 +106,22 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t

 <u>Paper</u>: [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)

+[Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
+
+[Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)
+
 <u>Description</u>:

 ```
 We implement Double DQN, Dueling DQN and Noisy DQN here.

 -The max operator in standard DQN uses the same values both to select and to evaluate an action by:

-Q(s_t, a_t) = R\_{t+1\} + gamma \* max\_{a}Q\_\{target\}(s_{t+1}, a).
+Q(s_t, a_t) = R_{t+1} + gamma * max_{a}Q_{target}(s_{t+1}, a).

 -Double DQN proposes to use the following evaluation to address the overestimation problem of the max operator:

-Q(s_t, a_t) = R\_{t+1\} + gamma \* Q\_{target}(s\_\{t+1\}, max {a}Q(s_{t+1}, a)).
+Q(s_t, a_t) = R_{t+1} + gamma * Q_{target}(s_{t+1}, max_{a}Q(s_{t+1}, a)).

 -Dueling DQN uses a dueling architecture where the state value and the advantage of each action are estimated separately.

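As a reference for the description in the diff above, here is a minimal sketch of the two ideas it covers: the Double DQN target, which lets the online network select the next action while the target network evaluates it, and a dueling head that estimates the state value and per-action advantages separately. It assumes PyTorch and that `q_net` / `target_net` are `nn.Module` Q-networks returning per-action values; the names `double_dqn_target` and `DuelingHead` are illustrative and not taken from the tutorial code.

```python
import torch
import torch.nn as nn


def double_dqn_target(q_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target: r + gamma * Q_target(s', argmax_a Q(s', a))."""
    with torch.no_grad():
        # The online network selects the greedy next action ...
        next_action = q_net(next_state).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it, decoupling selection from evaluation.
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * next_q


class DuelingHead(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                 # state value V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)   # advantages A(s, a)

    def forward(self, features):
        v = self.value(features)
        a = self.advantage(features)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

For contrast, the standard DQN target would use `target_net(next_state).max(dim=1).values` for both selection and evaluation, which is the overestimation source that Double DQN removes.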