Reproducible results, automatic `VecEnv` wrapping, env checker and more usability improvements
Breaking Changes:
- The `seed` argument has been moved from the `learn()` method to the model constructor in order to have reproducible results
- `allow_early_resets` of the `Monitor` wrapper now defaults to `True`
- `make_atari_env` now returns a `DummyVecEnv` by default (instead of a `SubprocVecEnv`); this usually improves performance
- Fix inconsistency of sample type, so that the mode/sample function returns a tensor of `tf.int64` in `CategoricalProbabilityDistribution`/`MultiCategoricalProbabilityDistribution` (@seheevic)
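The move of `seed` from `learn()` into the constructor can be illustrated with a toy sketch. The `ReproducibleModel` class below is hypothetical, not part of Stable Baselines; it only shows the idea that seeding the RNG once at construction makes repeated training runs deterministic.

```python
import random

import numpy as np


class ReproducibleModel:
    """Toy stand-in for a model that takes ``seed`` in its constructor
    (illustrative only, not the Stable Baselines API)."""

    def __init__(self, seed=None):
        # Seed all sources of randomness once, at construction time.
        self.rng = np.random.RandomState(seed)
        if seed is not None:
            random.seed(seed)

    def learn(self, total_timesteps):
        # All randomness is drawn from the seeded RNG, so two models
        # built with the same seed produce identical trajectories.
        return [self.rng.uniform(-1.0, 1.0) for _ in range(total_timesteps)]


run_a = ReproducibleModel(seed=42).learn(5)
run_b = ReproducibleModel(seed=42).learn(5)
assert run_a == run_b  # same seed at construction, same results
```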
New Features:
- Add `n_cpu_tf_sess` to the model constructor to choose the number of threads used by TensorFlow
- Environments are automatically wrapped in a `DummyVecEnv` if needed when passing them to the model constructor
- Added `stable_baselines.common.make_vec_env` helper to simplify `VecEnv` creation
- Added `stable_baselines.common.evaluation.evaluate_policy` helper to simplify model evaluation
- `VecNormalize` changes:
  - Now supports being pickled and unpickled (@AdamGleave)
  - New methods `.normalize_obs(obs)` and `.normalize_reward(rews)` apply normalization to arbitrary observations or rewards without updating statistics (@shwang)
  - `.get_original_reward()` returns the unnormalized rewards from the most recent timestep
  - `.reset()` now collects observation statistics (it used to only apply normalization)
- Add parameter `exploration_initial_eps` to DQN (@jdossgollin)
- Add type checking and PEP 561 compliance. Note: most functions are still not annotated; this will be a gradual process.
- DDPG, TD3 and SAC accept non-symmetric action spaces (@Antymon)
- Add `check_env` util to check if a custom environment follows the Gym interface (@araffin and @justinkterry)
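A minimal sketch of what an evaluation helper like `evaluate_policy` does: run the policy for a fixed number of episodes and report the mean and standard deviation of episode rewards. The `ConstantEnv` and `ZeroPolicy` classes and the simplified single-return `predict` signature below are illustrative assumptions, not the library API.

```python
import numpy as np


def evaluate_policy_sketch(model, env, n_eval_episodes=10, deterministic=True):
    """Simplified version of the idea behind evaluate_policy:
    roll out n episodes and summarize the episode rewards."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = model.predict(obs, deterministic=deterministic)
            obs, reward, done, _info = env.step(action)
            total += reward
        episode_rewards.append(total)
    return np.mean(episode_rewards), np.std(episode_rewards)


class ConstantEnv:
    """Toy 3-step environment that always pays a reward of 1 (for testing)."""

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return 0

    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 3, {}


class ZeroPolicy:
    """Toy policy that always returns action 0 (for testing)."""

    def predict(self, obs, deterministic=True):
        return 0


mean_r, std_r = evaluate_policy_sketch(ZeroPolicy(), ConstantEnv(), n_eval_episodes=4)
assert mean_r == 3.0 and std_r == 0.0
```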
Bug Fixes:
- Fix seeding, so it is now possible to have deterministic results on CPU
- Fix a bug in DDPG where the `predict` method with `deterministic=False` would fail
- Fix a bug in TRPO: `mean_losses` was not initialized, causing the logger to crash when there were no gradients (@MarvineGothic)
- Fix a bug in `cmd_util` caused by an API change in recent Gym versions
- Fix a bug in DDPG, TD3 and SAC where warmup and random exploration actions would end up scaled in the replay buffer (@Antymon)
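The replay-buffer fix above concerns the mapping between the environment's action bounds and the policy's internal `[-1, 1]` range: actions must be stored in one consistent space. The helper functions below sketch that mapping; their names and signatures are illustrative, not the library's internals.

```python
import numpy as np


def scale_action(action, low, high):
    """Map an action from the env's [low, high] range to [-1, 1]."""
    return 2.0 * (action - low) / (high - low) - 1.0


def unscale_action(scaled, low, high):
    """Map a [-1, 1] action back to the env's [low, high] range."""
    return low + 0.5 * (scaled + 1.0) * (high - low)


# Non-symmetric action space [0, 4]: scaling then unscaling must round-trip.
low, high = np.array([0.0]), np.array([4.0])
action = np.array([3.0])
assert np.allclose(scale_action(action, low, high), 0.5)
assert np.allclose(unscale_action(scale_action(action, low, high), low, high), action)
```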
Deprecations:
- `nprocs` (ACKTR) and `num_procs` (ACER) are deprecated in favor of `n_cpu_tf_sess`, which is now common to all algorithms
- `VecNormalize`: `load_running_average` and `save_running_average` are deprecated in favour of using pickle
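The deprecation of `load_running_average`/`save_running_average` in favour of pickle can be sketched with a minimal running-statistics object. The `RunningMeanStd` class below is a simplified stand-in for the statistics `VecNormalize` tracks, not the library's implementation; the point is that a plain-attribute object round-trips through `pickle` without dedicated save/load methods.

```python
import pickle

import numpy as np


class RunningMeanStd:
    """Simplified running mean/variance tracker (illustrative stand-in
    for the statistics a normalizing wrapper keeps)."""

    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        # Parallel-variance update: merge batch statistics into the totals.
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total


stats = RunningMeanStd()
stats.update(np.array([1.0, 2.0, 3.0]))

# Instead of dedicated save_running_average()/load_running_average()
# methods, the whole object round-trips through pickle:
restored = pickle.loads(pickle.dumps(stats))
assert restored.mean == stats.mean and restored.var == stats.var
```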
Others:
- Add upper bound for TensorFlow version (<2.0.0).
- Refactored tests to remove duplicated code
- Add pull request template
- Replaced redundant code in load_results (@jbulow)
- Minor PEP8 fixes in dqn.py (@justinkterry)
- Add a message to the assert in `PPO2`
- Update replay buffer docstring
- Fix `VecEnv` docstrings
Documentation:
- Add plotting to the Monitor example (@rusu24edward)
- Add Snake Game AI project (@pedrohbtp)
- Add note on the supported TensorFlow versions.
- Remove unnecessary steps required for Windows installation.
- Remove `DummyVecEnv` creation when not needed
- Added `make_vec_env` to the examples to simplify `VecEnv` creation
- Add QuaRL project (@srivatsankrishnan)
- Add Pwnagotchi project (@evilsocket)
- Fix multiprocessing example (@rusu24edward)
- Fix `result_plotter` example
- Add JNRR19 tutorial (by @edbeeching, @hill-a and @araffin)
- Updated notebooks link
- Fix typo in algos.rst, "containes" to "contains" (@SyllogismRXS)
- Fix outdated source documentation for load_results
- Add PPO_CPP project (@Antymon)
- Add section on C++ portability of Tensorflow models (@Antymon)
- Update custom env documentation to reflect the new Gym API for the `close()` method (@justinkterry)
- Update custom env documentation to clarify what `step` and `reset` return (@justinkterry)
- Add RL tips and tricks for doing RL experiments
- Corrected lots of typos
- Add spell check to documentation if available