Commit c1f1c3d

Release v1.6.0 (#958)
* Release v1.6.0 + update doc + add copy button
* Update read the doc conda env
* Update year
* Fix bug in kl divergence check
* Rephrase requirement for envpool and isaac gym
1 parent ef10189 commit c1f1c3d

10 files changed: 52 additions, 9 deletions

docs/conda_env.yml

Lines changed: 3 additions & 3 deletions
@@ -6,15 +6,15 @@ dependencies:
   - cpuonly=1.0=0
   - pip=21.1
   - python=3.7
-  - pytorch=1.8.1=py3.7_cpu_0
+  - pytorch=1.11=py3.7_cpu_0
   - pip:
-    - gym>=0.17.2
+    - gym==0.21
     - cloudpickle
     - opencv-python-headless
     - pandas
     - numpy
     - matplotlib
     - sphinx_autodoc_typehints
     - sphinx>=4.2
-    # See https://github.com/readthedocs/sphinx_rtd_theme/issues/1115
     - sphinx_rtd_theme>=1.0
+    - sphinx_copybutton

docs/conf.py

Lines changed: 12 additions & 1 deletion
@@ -24,6 +24,14 @@
 except ImportError:
     enable_spell_check = False
 
+# Try to enable copy button
+try:
+    import sphinx_copybutton  # noqa: F401
+
+    enable_copy_button = True
+except ImportError:
+    enable_copy_button = False
+
 # source code directory, relative to this file, for sphinx-autobuild
 sys.path.insert(0, os.path.abspath(".."))
 
@@ -51,7 +59,7 @@ def __getattr__(cls, name):
 # -- Project information -----------------------------------------------------
 
 project = "Stable Baselines3"
-copyright = "2020, Stable Baselines3"
+copyright = "2022, Stable Baselines3"
 author = "Stable Baselines3 Contributors"
 
 # The short X.Y version
@@ -83,6 +91,9 @@ def __getattr__(cls, name):
 if enable_spell_check:
     extensions.append("sphinxcontrib.spelling")
 
+if enable_copy_button:
+    extensions.append("sphinx_copybutton")
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
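The guard added to docs/conf.py registers an extension only when its module can be imported. A minimal sketch of the same idea for several optional modules at once, assuming a hypothetical helper named `available_extensions` (it is not part of this commit or of Sphinx):

```python
import importlib
from typing import Iterable, List


def available_extensions(candidates: Iterable[str]) -> List[str]:
    """Return the candidate module names that import successfully.

    Hypothetical helper for illustration only; the commit itself uses
    explicit try/except blocks in docs/conf.py.
    """
    found = []
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # Optional dependency is missing: skip it, as conf.py does
            continue
        found.append(name)
    return found


# "json" ships with Python; the second name is assumed not to be installed
print(available_extensions(["json", "surely_not_an_installed_module_123"]))
```

In a real conf.py the returned names could be appended to `extensions`, which keeps the docs build working when an optional package is absent.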

docs/guide/examples.rst

Lines changed: 10 additions & 0 deletions
@@ -729,6 +729,16 @@ to keep track of the agent progress.
   model.learn(10_000)
 
 
+SB3 with EnvPool or Isaac Gym
+-----------------------------
+
+Just like Procgen (see above), `EnvPool <https://github.com/sail-sg/envpool>`_ and `Isaac Gym <https://github.com/NVIDIA-Omniverse/IsaacGymEnvs>`_ accelerate the environment by
+already providing a vectorized implementation.
+
+To use SB3 with those tools, you must wrap the env with tool's specific ``VecEnvWrapper`` that will pre-process the data for SB3,
+you can find links to those wrappers in `issue #772 <https://github.com/DLR-RM/stable-baselines3/issues/772#issuecomment-1048657002>`_.
+
+
 Record a Video
 --------------

docs/guide/install.rst

Lines changed: 11 additions & 0 deletions
@@ -54,6 +54,17 @@ Bleeding-edge version
     pip install git+https://github.com/DLR-RM/stable-baselines3
 
 
+.. note::
+
+  If you want to use latest gym version (0.24+), you have to use
+
+  .. code-block:: bash
+
+    pip install git+https://github.com/carlosluis/stable-baselines3/tree/fix_tests
+
+  See `PR #780 <https://github.com/DLR-RM/stable-baselines3/pull/780>`_ for more information.
+
+
 Development version
 -------------------

docs/misc/changelog.rst

Lines changed: 6 additions & 1 deletion
@@ -4,9 +4,11 @@ Changelog
 ==========
 
 
-Release 1.5.1a9 (WIP)
+Release 1.6.0 (2022-07-11)
 ---------------------------
 
+**Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3**
+
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former
@@ -34,6 +36,7 @@ Bug Fixes:
 - Fixed issues due to newer version of protobuf (tensorboard) and sphinx
 - Fix exception causes all over the codebase (@cool-RR)
 - Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination due to a bug (@MWeltevrede)
+- Fixed a bug in ``kl_divergence`` check that would fail when using numpy arrays with MultiCategorical distribution
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -51,6 +54,8 @@ Documentation:
 - Added remark about breaking Markov assumption and timeout handling
 - Added doc about MLFlow integration via custom logger (@git-thor)
 - Updated Huggingface integration doc
+- Added copy button for code snippets
+- Added doc about EnvPool and Isaac Gym support
 
 
 Release 1.5.0 (2022-03-25)

setup.py

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@
             "sphinxcontrib.spelling",
             # Type hints support
             "sphinx-autodoc-typehints",
+            # Copy button for code snippets
+            "sphinx_copybutton",
         ],
         "extra": [
             # For render

stable_baselines3/common/buffers.py

Lines changed: 2 additions & 1 deletion
@@ -193,7 +193,8 @@ def __init__(
         # see https://github.com/DLR-RM/stable-baselines3/issues/934
         if optimize_memory_usage and handle_timeout_termination:
             raise ValueError(
-                "ReplayBuffer does not support optimize_memory_usage = True and handle_timeout_termination = True simultaneously."
+                "ReplayBuffer does not support optimize_memory_usage = True "
+                "and handle_timeout_termination = True simultaneously."
             )
         self.optimize_memory_usage = optimize_memory_usage
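The change to the `ValueError` message relies on Python's implicit concatenation of adjacent string literals, so the text stays identical as long as the first fragment keeps its trailing space. A small check of that claim (the variable names are illustrative, not from the codebase):

```python
# Message as it appeared before the commit, on one long line
before = "ReplayBuffer does not support optimize_memory_usage = True and handle_timeout_termination = True simultaneously."

# Message after the commit: two adjacent literals inside parentheses
# are joined by the parser into a single string
after = (
    "ReplayBuffer does not support optimize_memory_usage = True "
    "and handle_timeout_termination = True simultaneously."
)

# The refactor only shortens the source line; the message is unchanged
print(before == after)
```

Dropping the space before the closing quote of the first fragment would produce "True" and "and" run together, which is the usual pitfall with this style.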

stable_baselines3/common/distributions.py

Lines changed: 2 additions & 1 deletion
@@ -4,6 +4,7 @@
 from typing import Any, Dict, List, Optional, Tuple, Union
 
 import gym
+import numpy as np
 import torch as th
 from gym import spaces
 from torch import nn
@@ -688,7 +689,7 @@ def kl_divergence(dist_true: Distribution, dist_pred: Distribution) -> th.Tensor
     # MultiCategoricalDistribution is not a PyTorch Distribution subclass
     # so we need to implement it ourselves!
     if isinstance(dist_pred, MultiCategoricalDistribution):
-        assert dist_pred.action_dims == dist_true.action_dims, "Error: distributions must have the same input space"
+        assert np.allclose(dist_pred.action_dims, dist_true.action_dims), "Error: distributions must have the same input space"
         return th.stack(
             [th.distributions.kl_divergence(p, q) for p, q in zip(dist_true.distribution, dist_pred.distribution)],
             dim=1,
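The reason the old assertion failed with numpy arrays can be reproduced without SB3. A minimal sketch, assuming `action_dims` is passed as a numpy array with more than one element (the variable names below are illustrative): `==` compares element-wise and returns an array, whose truth value is ambiguous, while `np.allclose` reduces the comparison to a single boolean and also accepts plain lists.

```python
import numpy as np

dims_true = np.array([3, 3])
dims_pred = np.array([3, 3])

# Element-wise comparison yields an array, not a single bool
elementwise = dims_true == dims_pred

# Using that array as an assert condition raises ValueError
try:
    assert dims_true == dims_pred
    ambiguous = False
except ValueError:
    ambiguous = True

# np.allclose collapses the comparison to one boolean
same_space = bool(np.allclose(dims_pred, dims_true))
different_space = bool(np.allclose(np.array([3, 3]), np.array([3, 4])))

# Plain Python lists keep working with the new check
lists_ok = bool(np.allclose([3, 3], [3, 3]))

print(elementwise, ambiguous, same_space, different_space, lists_ok)
```

Note that `np.allclose` still raises if the two inputs have shapes that cannot be broadcast together, so this sketch only covers inputs of matching length.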

stable_baselines3/version.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.5.1a9
+1.6.0

tests/test_distributions.py

Lines changed: 3 additions & 1 deletion
@@ -163,7 +163,9 @@ def test_categorical(dist, CAT_ACTIONS):
         BernoulliDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS)),
         CategoricalDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS)),
         DiagGaussianDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS), th.rand(N_ACTIONS)),
-        MultiCategoricalDistribution([N_ACTIONS, N_ACTIONS]).proba_distribution(th.rand(1, sum([N_ACTIONS, N_ACTIONS]))),
+        MultiCategoricalDistribution(np.array([N_ACTIONS, N_ACTIONS])).proba_distribution(
+            th.rand(1, sum([N_ACTIONS, N_ACTIONS]))
+        ),
         SquashedDiagGaussianDistribution(N_ACTIONS).proba_distribution(th.rand(N_ACTIONS), th.rand(N_ACTIONS)),
         StateDependentNoiseDistribution(N_ACTIONS).proba_distribution(
             th.rand(N_ACTIONS), th.rand([N_ACTIONS, N_ACTIONS]), th.rand([N_ACTIONS, N_ACTIONS])
