
v2.7.0: n-step returns for all off-policy algorithms via the `n_steps` argument

Released by @araffin on 25 Jul 09:55 (commit bf51a62)

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

New Features:

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter:
from stable_baselines3 import SAC

# SAC with n-step returns
model = SAC("MlpPolicy", "Pendulum-v1", n_steps=3, verbose=1)
model.learn(10_000)
  • Added NStepReplayBuffer, which computes n-step returns without additional memory requirements (and without for loops)
  • Added Gymnasium v1.2 support
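Conceptually, an n-step return accumulates up to n discounted rewards before bootstrapping with a value estimate, truncating early at episode boundaries. The sketch below illustrates the idea in plain Python; it is not SB3's actual (vectorized, loop-free) implementation, and the function name is hypothetical:

```python
def n_step_return(rewards, dones, bootstrap_value, gamma=0.99, n_steps=3):
    """Compute the n-step return target for the transition at index 0.

    Accumulates up to `n_steps` discounted rewards, stopping early if an
    episode ends, then bootstraps with `bootstrap_value` (e.g. the target
    network's value estimate at the n-th next state).

    Note: illustrative only -- SB3's NStepReplayBuffer computes this
    without Python loops.
    """
    target, discount = 0.0, 1.0
    for reward, done in zip(rewards[:n_steps], dones[:n_steps]):
        target += discount * reward
        discount *= gamma
        if done:
            return target  # never bootstrap past an episode boundary
    return target + discount * bootstrap_value


# With n_steps=1 this reduces to the usual TD(0) target: r_0 + gamma * V(s_1)
print(n_step_return([1.0], [False], 4.0, gamma=0.5, n_steps=1))  # 3.0
```

Larger n_steps propagates reward information faster at the cost of higher variance and more off-policy bias, which is why it is exposed as a tunable parameter.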

Bug Fixes:

  • Fixed docker GPU image (PyTorch GPU was not installed)
  • Fixed segmentation faults caused by non-portable schedules during model loading (@akanto)

SB3-Contrib

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Use the FloatSchedule and LinearSchedule classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems

RL Zoo

  • linear_schedule now returns a SimpleLinearSchedule object for better portability
  • Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters
  • Renamed CarRacing-v2 to CarRacing-v3 in hyperparameters
  • Docker GPU images are now working again
  • Use ConstantSchedule and SimpleLinearSchedule instead of constant_fn and linear_schedule
  • Fixed CarRacing-v3 hyperparameters for newer Gymnasium version

SBX (SB3 + Jax)

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Added KL Adaptive LR for PPO and LR schedule for SAC/TQC

Deprecations:

  • get_schedule_fn(), get_linear_fn(), and constant_fn() are deprecated; use FloatSchedule(), LinearSchedule(), and ConstantSchedule() instead
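The motivation for schedule classes over lambdas is portability: a lambda closure cannot be pickled, which is what made lambda-based schedules non-portable across systems. A minimal sketch of the idea, using a hypothetical stand-in class (not SB3's actual class definition or signature):

```python
import pickle


class LinearScheduleSketch:
    """Picklable linear interpolation from `start` down to `end`.

    `progress_remaining` goes from 1 (start of training) to 0 (end),
    matching the convention used by SB3 schedules.
    """

    def __init__(self, start: float, end: float = 0.0):
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)


# A module-level class instance round-trips through pickle cleanly...
schedule = LinearScheduleSketch(3e-4)
restored = pickle.loads(pickle.dumps(schedule))
print(restored(1.0))  # 0.0003 (full learning rate at the start of training)
print(restored(0.0))  # 0.0 (annealed to zero at the end)

# ...whereas the equivalent lambda cannot be pickled at all:
# pickle.dumps(lambda progress_remaining: 3e-4 * progress_remaining)
# -> raises pickle.PicklingError
```

Because the class stores its parameters as plain attributes, a saved model can reconstruct the schedule on any OS or Python build, instead of relying on serializing function bytecode.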

Documentation:

  • Clarified the evaluate_policy documentation
  • Added documentation about training exceeding the total_timesteps parameter
  • Updated LunarLander and LunarLanderContinuous environment versions to v3 (@j0m0k0)
  • Added sb3-extra-buffers to the project page (@Trenza1ore)

New Contributors

Full Changelog: v2.6.0...v2.7.0