
Commit 73ad815: Update README.md
1 parent 421f95b

1 file changed: +3, -3 lines

README.md (3 additions & 3 deletions)
@@ -173,12 +173,12 @@ As we all know, there are various tricks in empirical RL algorithm implementations
* Normalization:
  * [Reward normalization](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/7f2bb74a51cf9cbde92a6ccfa42e97dc129dd145/sac_v2.py#L262) or [advantage normalization](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/881903e4aa22921f142daedfcf3dd266488405d8/ppo_gae_discrete.py#L79) over a batch can sometimes greatly improve performance (learning efficiency, stability), although in theory on-policy algorithms like PPO should not apply data normalization during training due to distribution shift. Looking at the problem more closely, two cases should be treated differently: (1) normalizing direct input data such as observations, actions, and rewards; (2) normalizing value estimates (state value, state-action value, advantage, etc.). For (1), a more reasonable approach is to keep a moving average of the mean and standard deviation seen so far, which approximates normalizing over the full dataset during RL agent learning (the full dataset is never available up front, since the data comes from the ongoing interaction between agent and environment). For (2), we can simply normalize the value estimates per batch (rather than keeping a historical average), since we do not want the estimated values to undergo distribution shift; we treat them as a static distribution.

- * Although I provide the multiprocessing versions of several algorithms ([SAC](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/sac_v2_multiprocess.py), [PPO](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/ppo_continuous_multiprocess2.py), etc.), for small-scale environments in Gym this is usually not necessary or even inefficient. The vectorized environment wrapper for parallel environment sampling may be a more suitable solution for learning these environments, since the bottleneck in learning efficiency mainly lies in the interaction with environments rather than in the model update (back-propagation) itself.
-
* Multiprocessing:
-   Is the multiprocessing update based on `torch.multiprocessing` the right way to parallelize the code?
+   * Is the multiprocessing update based on `torch.multiprocessing` the right/safe way to parallelize the code?
    The official instructions for using `torch.multiprocessing` (the Hogwild example) apply no explicit locks, which means the updates can potentially be unsafe when multiple processes compute gradients and write to the shared model at the same time. See further discussion [here](https://discuss.pytorch.org/t/synchronization-for-sharing-updating-shared-model-state-dict-across-multi-process/50102/2), along with some [tests](https://discuss.pytorch.org/t/model-update-with-share-memory-need-lock-protection/72857) and [answers](https://discuss.pytorch.org/t/grad-sharing-problem-in-a3c/10635). In general, the drawback of unsafe updates may be outweighed by the speed-up from multiprocessing (RL training itself also has huge variance and noise).

+ * Although I provide the multiprocessing versions of several algorithms ([SAC](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/sac_v2_multiprocess.py), [PPO](https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/ppo_continuous_multiprocess2.py), etc.), for small-scale environments in Gym this is usually not necessary or even inefficient. The vectorized environment wrapper for parallel environment sampling may be a more suitable solution for learning these environments, since the bottleneck in learning efficiency mainly lies in the interaction with environments rather than in the model update (back-propagation) itself.
+
For more discussion of **implementation tricks**, see this [chapter](https://link.springer.com/chapter/10.1007/978-981-15-4095-0_18) in our book.
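
To make the two normalization cases above concrete, here is a minimal NumPy sketch (illustrative only, not code from this repo): a running mean/std tracker for case (1), where observations or rewards are normalized against a moving average of past statistics, and a plain per-batch normalization for case (2), applied to advantage estimates. The class and function names are placeholders.

```python
import numpy as np

class RunningMeanStd:
    """Case (1): track a moving mean/std so raw inputs (observations, rewards)
    are normalized against statistics of all data seen so far."""
    def __init__(self, shape=(), eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        # Parallel (Welford-style) update of mean and variance.
        self.mean = self.mean + delta * b_count / total
        self.var = (self.var * self.count + b_var * b_count
                    + delta ** 2 * self.count * b_count / total) / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

def normalize_advantage(adv):
    """Case (2): normalize value/advantage estimates within the batch only,
    keeping no history, so they are treated as a static distribution."""
    adv = np.asarray(adv, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + 1e-8)
```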
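The Hogwild-style update discussed under "Multiprocessing" can be sketched as follows. This is a toy example under stated assumptions, not the repo's training code: the tiny `nn.Linear` model, random batch, squared-output loss, and hyperparameters are placeholders. Each worker keeps a local copy of the model, computes gradients on it, and copies them onto a model held in shared memory via `torch.multiprocessing`; the `with lock:` line marks where an explicit lock would make the shared update safe, and dropping it gives the lock-free behaviour of the official example.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(shared_model, lock, steps=100):
    # Local copy for gradient computation; parameters are pulled from the
    # shared model before every update, Hogwild-style.
    local_model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=1e-2)
    for _ in range(steps):
        local_model.load_state_dict(shared_model.state_dict())
        x = torch.randn(8, 4)                # stand-in for a sampled batch
        loss = local_model(x).pow(2).mean()  # stand-in for an RL loss
        local_model.zero_grad()
        loss.backward()
        with lock:  # remove the lock for the pure (unsafe) Hogwild variant
            # Copy local gradients onto the shared parameters, then step.
            for shared_p, local_p in zip(shared_model.parameters(),
                                         local_model.parameters()):
                shared_p.grad = local_p.grad.clone()
            optimizer.step()

if __name__ == "__main__":
    shared_model = nn.Linear(4, 2)
    shared_model.share_memory()  # place parameters in shared memory
    lock = mp.Lock()
    workers = [mp.Process(target=worker, args=(shared_model, lock))
               for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```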

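And for the vectorized-environment alternative mentioned in the last bullet, a minimal sketch using Gym's built-in vector API (assuming an older `gym` where `reset()` returns only observations and `step()` returns a 4-tuple; recent `gym`/`gymnasium` releases return `(obs, info)` and a 5-tuple instead). `CartPole-v1`, the number of environments, and the random policy are placeholders.

```python
import gym

NUM_ENVS = 8  # placeholder; roughly match the number of CPU cores

# One subprocess per environment copy; stepping them as a batch parallelizes
# the environment interaction (the real bottleneck for small tasks), while
# the model update stays in the single main process.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(NUM_ENVS)]
)

obs = envs.reset()
for _ in range(100):
    actions = envs.action_space.sample()  # batched actions; replace with the policy
    obs, rewards, dones, infos = envs.step(actions)
envs.close()
```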
## Performance:
