
v2.7.0: n-step returns for all off-policy algorithms via the `n_steps` argument

Released by @araffin on 25 Jul 09:55 (commit bf51a62)

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

New Features:

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter:
from stable_baselines3 import SAC

# SAC with n-step returns
model = SAC("MlpPolicy", "Pendulum-v1", n_steps=3, verbose=1)
model.learn(10_000)
  • Added NStepReplayBuffer, which computes n-step returns without additional memory requirements (and without for loops)
  • Added Gymnasium v1.2 support
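Conceptually, an n-step return accumulates up to n discounted rewards before bootstrapping with a value estimate, truncating early at episode boundaries. The sketch below illustrates the idea in plain Python; it is not SB3's actual (vectorized, loop-free) implementation, and the function name is hypothetical:

```python
def n_step_return(rewards, dones, bootstrap_value, gamma=0.99, n_steps=3):
    """Compute the n-step return target for the transition at index 0.

    Accumulates up to `n_steps` discounted rewards, stopping early if an
    episode ends, then bootstraps with `bootstrap_value` (e.g. the target
    network's value estimate at the n-th next state).

    Note: illustrative only -- SB3's NStepReplayBuffer computes this
    without Python loops.
    """
    target, discount = 0.0, 1.0
    for reward, done in zip(rewards[:n_steps], dones[:n_steps]):
        target += discount * reward
        discount *= gamma
        if done:
            return target  # never bootstrap past an episode boundary
    return target + discount * bootstrap_value


# With n_steps=1 this reduces to the usual TD(0) target: r_0 + gamma * V(s_1)
print(n_step_return([1.0], [False], 4.0, gamma=0.5, n_steps=1))  # 3.0
```

Larger n_steps propagates reward information faster at the cost of higher variance and more off-policy bias, which is why it is exposed as a tunable parameter.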

Bug Fixes:

  • Fixed docker GPU image (PyTorch GPU was not installed)
  • Fixed segmentation faults caused by non-portable schedules during model loading (@akanto)

SB3-Contrib

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Use the FloatSchedule and LinearSchedule classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems

RL Zoo

  • linear_schedule now returns a SimpleLinearSchedule object for better portability
  • Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters
  • Renamed CarRacing-v2 to CarRacing-v3 in hyperparameters
  • Docker GPU images are now working again
  • Use ConstantSchedule and SimpleLinearSchedule instead of constant_fn and linear_schedule
  • Fixed CarRacing-v3 hyperparameters for newer Gymnasium version

SBX (SB3 + Jax)

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Added KL Adaptive LR for PPO and LR schedule for SAC/TQC

Deprecations:

  • get_schedule_fn(), get_linear_fn(), and constant_fn() are deprecated; use FloatSchedule(), LinearSchedule(), and ConstantSchedule() instead
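The motivation for schedule classes over lambdas is portability: a lambda closure cannot be pickled, which is what made lambda-based schedules non-portable across systems. A minimal sketch of the idea, using a hypothetical stand-in class (not SB3's actual class definition or signature):

```python
import pickle


class LinearScheduleSketch:
    """Picklable linear interpolation from `start` down to `end`.

    `progress_remaining` goes from 1 (start of training) to 0 (end),
    matching the convention used by SB3 schedules.
    """

    def __init__(self, start: float, end: float = 0.0):
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)


# A module-level class instance round-trips through pickle cleanly...
schedule = LinearScheduleSketch(3e-4)
restored = pickle.loads(pickle.dumps(schedule))
print(restored(1.0))  # 0.0003 (full learning rate at the start of training)
print(restored(0.0))  # 0.0 (annealed to zero at the end)

# ...whereas the equivalent lambda cannot be pickled at all:
# pickle.dumps(lambda progress_remaining: 3e-4 * progress_remaining)
# -> raises pickle.PicklingError
```

Because the class stores its parameters as plain attributes, a saved model can reconstruct the schedule on any OS or Python build, instead of relying on serializing function bytecode.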

Documentation:

  • Clarified the evaluate_policy documentation
  • Added documentation about training exceeding the total_timesteps parameter
  • Updated LunarLander and LunarLanderContinuous environment versions to v3 (@j0m0k0)
  • Added sb3-extra-buffers to the project page (@Trenza1ore)

New Contributors

Full Changelog: v2.6.0...v2.7.0