
Commit 2bdc218

Officium authored and zsdonghao committed
update readme of RL zoo and remove Retrace (#1045)
* Delete tutorial_Retrace.py
* update readme of rl zoo
1 parent 55b0ece · commit 2bdc218

File tree

2 files changed: +1 −298 lines changed


examples/reinforcement_learning/README.md

Lines changed: 1 addition & 19 deletions
@@ -56,7 +56,6 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t
 | Prioritized Experience Replay | Discrete | Pong, CartPole | [Schaul et al. Prioritized experience replay. Schaul et al. 2015.](https://arxiv.org/abs/1511.05952) |
 |Dueling DQN|Discrete | Pong, CartPole |[Dueling network architectures for deep reinforcement learning. Wang et al. 2015.](https://arxiv.org/abs/1511.06581)|
 |Double DQN| Discrete | Pong, CartPole |[Deep reinforcement learning with double q-learning. Van et al. 2016.](https://arxiv.org/abs/1509.06461)|
-|Retrace|Discrete | Pong, CartPole |[Safe and efficient off-policy reinforcement learning. Munos et al. 2016: ](https://arxiv.org/pdf/1606.02647.pdf)|
 |Noisy DQN|Discrete | Pong, CartPole |[Noisy networks for exploration. Fortunato et al. 2017.](https://arxiv.org/pdf/1706.10295.pdf)|
 | Distributed DQN (C51)| Discrete | Pong, CartPole | [A distributional perspective on reinforcement learning. Bellemare et al. 2017.](https://arxiv.org/pdf/1707.06887.pdf) |
 |**policy-based**||||
@@ -170,23 +169,6 @@ The tutorial algorithms follow the same basic structure, as shown in file: [`./t
 ```
 
 
-
-
-* **Retrace(lambda) DQN**
-
-<u>Code</u>: `./tutorial_Retrace.py`
-
-<u>Paper</u>: [Safe and Efficient Off-Policy Reinforcement Learning](https://arxiv.org/abs/1606.02647)
-
-<u>Description:</u>
-
-```
-Retrace (lambda) is an off-policy algorithm that extend the idea of eligibility trace. It apply an importance sampling ratio truncated at 1 to several behaviour policies, which suffer from the variance explosion of standard IS and lead to safe and efficient learning.
-```
-
-
-
 * **Actor-Critic (AC)**
 
 <u>Code</u>:`./tutorial_AC.py`
@@ -355,5 +337,5 @@ Our env wrapper: `./tutorial_wrappers.py`
 - @zsdonghao Hao Dong: AC, A3C, Q-Learning, DQN, PG
 - @quantumiracle Zihan Ding: SAC, TD3.
 - @Tokarev-TT-33 Tianyang Yu @initial-h Hongming Zhang : PG, DDPG, PPO, DPPO, TRPO
-- @Officium Yanhua Huang: C51, Retrace, DQN_variants, prioritized_replay, wrappers.
+- @Officium Yanhua Huang: C51, DQN_variants, prioritized_replay, wrappers.

examples/reinforcement_learning/tutorial_Retrace.py

Lines changed: 0 additions & 279 deletions
This file was deleted.
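
For reference, the deleted `tutorial_Retrace.py` implemented the Retrace(λ) update from Munos et al. 2016, summarized in the README description removed above. Below is a minimal NumPy sketch of the core target computation under those definitions; the function name, argument layout, and default hyper-parameters are illustrative assumptions, not the deleted tutorial's actual API.

```python
import numpy as np

def retrace_targets(q, next_q_pi, rewards, is_ratios, gamma=0.99, lam=1.0):
    """Sketch of Retrace(lambda) targets for one off-policy trajectory.

    Hypothetical helper, not the deleted tutorial's API.
    q          -- Q(x_s, a_s) under the current network, shape (T,)
    next_q_pi  -- E_{a~pi} Q(x_{s+1}, a), shape (T,), 0 after a terminal step
    rewards    -- r_s, shape (T,)
    is_ratios  -- pi(a_s | x_s) / mu(a_s | x_s) for behaviour policy mu
    """
    q = np.asarray(q, dtype=float)
    next_q_pi = np.asarray(next_q_pi, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    is_ratios = np.asarray(is_ratios, dtype=float)
    T = len(rewards)

    # Truncated importance weights c_s = lam * min(1, ratio): the
    # "importance sampling ratio truncated at 1" from the removed README
    # description. Truncation bounds the variance that products of plain
    # IS ratios would otherwise blow up.
    c = lam * np.minimum(1.0, is_ratios)

    # One-step TD errors under the target policy pi.
    delta = rewards + gamma * next_q_pi - q

    # Backward recursion G_s = delta_s + gamma * c_{s+1} * G_{s+1},
    # the eligibility-trace-style accumulation the description refers to.
    G = np.zeros(T)
    G[-1] = delta[-1]
    for s in range(T - 2, -1, -1):
        G[s] = delta[s] + gamma * c[s + 1] * G[s + 1]

    # Retrace target: Q_ret(x_s, a_s) = Q(x_s, a_s) + G_s.
    return q + G
```

A DQN-style trainer would then regress Q(x_s, a_s) toward these targets with a squared loss. Because each per-step weight c_s is truncated at 1, the correction stays bounded for any behaviour policy, which is what makes the method safe and efficient off-policy.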
