* Modified actor-critic policies & MlpExtractor class
ActorCriticPolicy:
- changed type hint of net_arch param: now it's a dict
- removed the check that forbade shared layers in the mlp_extractor when the features extractor is not shared: shared layers are no longer allowed in the mlp_extractor, regardless of the features extractor
ActorCriticCnnPolicy:
- changed type hint of net_arch param: now it's a dict
MultiInputActorCriticPolicy:
- changed type hint of net_arch param: now it's a dict
MlpExtractor:
- changed type hint of net_arch param: now it's a dict
- adapted networks creation
- adapted methods: forward, forward_actor & forward_critic
* Removed shared layers in mlp_extractor (see the sketch after this commit summary)
* Updated docs and changelog + reformat
* Updated custom policy tests
* Removed test on deprecation warning for shared layers in mlp_extractor, since shared layers are now removed entirely
* Update version
* Update RL Zoo doc
* Fix linter warnings
* Add ruff to Makefile (experimental)
* Add backward compat code and minor updates
* Update tests
* Add backward compatibility (a possible shim is sketched below)
* Fix test
* Improve compat code
Co-authored-by: Antonin RAFFIN <[email protected]>
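
To make the new behaviour concrete, the following is a minimal sketch, assuming a plain PyTorch setup, of an ``MlpExtractor``-style module that builds fully separate actor and critic networks from a ``net_arch`` dict and exposes ``forward``, ``forward_actor`` and ``forward_critic``. It also includes a small shim of the kind the backward-compatibility bullets describe. This is illustration only, not the actual Stable-Baselines3 code: ``SimpleMlpExtractor``, ``build_mlp`` and ``normalize_net_arch`` are hypothetical names.

```python
from typing import Dict, List, Tuple, Union

import torch as th
from torch import nn


def normalize_net_arch(net_arch: Union[dict, list]) -> dict:
    """Hypothetical backward-compat shim: unwrap the legacy
    ``net_arch=[dict(pi=..., vf=...)]`` list format into the new dict format."""
    if isinstance(net_arch, list) and len(net_arch) > 0 and isinstance(net_arch[0], dict):
        return net_arch[0]
    return net_arch


def build_mlp(input_dim: int, layer_sizes: List[int]) -> Tuple[nn.Sequential, int]:
    """Build a plain MLP and return it together with its output dimension."""
    layers: List[nn.Module] = []
    last_dim = input_dim
    for size in layer_sizes:
        layers += [nn.Linear(last_dim, size), nn.Tanh()]
        last_dim = size
    return nn.Sequential(*layers), last_dim


class SimpleMlpExtractor(nn.Module):
    """Sketch of the updated extractor: ``net_arch`` is a dict and the actor
    and critic networks are entirely separate (no shared layers)."""

    def __init__(self, feature_dim: int, net_arch: Dict[str, List[int]]) -> None:
        super().__init__()
        self.policy_net, self.latent_dim_pi = build_mlp(feature_dim, net_arch["pi"])
        self.value_net, self.latent_dim_vf = build_mlp(feature_dim, net_arch["vf"])

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(features)

    def forward(self, features: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        return self.forward_actor(features), self.forward_critic(features)


# The old list format is unwrapped, then two independent MLPs are built.
net_arch = normalize_net_arch([dict(pi=[64, 64], vf=[64, 64])])
extractor = SimpleMlpExtractor(feature_dim=8, net_arch=net_arch)
latent_pi, latent_vf = extractor(th.zeros(1, 8))
```

With shared layers gone, each branch owns its parameters end to end, which is why a plain dict is now a sufficient type for ``net_arch``.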
docs/guide/custom_policy.rst (15 additions, 30 deletions)
@@ -117,11 +117,6 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t
 ``policy_kwargs`` (both for on-policy and off-policy algorithms).
 
 
-.. warning::
-    If the features extractor is **non-shared**, it is **not** possible to have shared layers in the ``mlp_extractor``.
-    Please note that this option is **deprecated**, therefore in a future release the layers in the ``mlp_extractor`` will have to be non-shared.
-
-
 .. code-block:: python
 
     import torch as th
@@ -242,41 +237,31 @@ On-Policy Algorithms
 Custom Networks
 ---------------
 
-.. warning::
-    Shared layers in the the ``mlp_extractor`` are **deprecated**.
-    In a future release all layers will have to be non-shared.
-    If needed, you can implement a custom policy network (see `advanced example below <#advanced-example>`_).
-
-.. warning::
-    In the next Stable-Baselines3 release, the behavior of ``net_arch=[128, 128]`` will change
-    to match the one of off-policy algorithms: it will create **separate** networks (instead of shared currently)
-    for the actor and the critic, with the same architecture.
-
-
 If you need a network architecture that is different for the actor and the critic when using ``PPO``, ``A2C`` or ``TRPO``,
 you can pass a dictionary of the following structure: ``dict(pi=[<actor network architecture>], vf=[<critic network architecture>])``.
 
 For example, if you want a different architecture for the actor (aka ``pi``) and the critic ( value-function aka ``vf``) networks,
 then you can specify ``net_arch=dict(pi=[32, 32], vf=[64, 64])``.
 
-.. Otherwise, to have actor and critic that share the same network architecture,
-.. you only need to specify ``net_arch=[128, 128]`` (here, two hidden layers of 128 units each).
+Otherwise, to have actor and critic that share the same network architecture,
+you only need to specify ``net_arch=[128, 128]`` (here, two hidden layers of 128 units each, this is equivalent to ``net_arch=dict(pi=[128, 128], vf=[128, 128])``).
+
+If shared layers are needed, you need to implement a custom policy network (see `advanced example below <#advanced-example>`_).
 
 Examples
 ~~~~~~~~
 
-.. TODO(antonin): uncomment when shared network is removed
-.. Same architecture for actor and critic with two layers of size 128: ``net_arch=[128, 128]``
-..
-.. .. code-block:: none
-..
-..          obs
-..         /   \
-..      <128>   <128>
-..        |       |
-..      <128>   <128>
-..        |       |
-..     action    value
+Same architecture for actor and critic with two layers of size 128: ``net_arch=[128, 128]``
+
+.. code-block:: none
+
+         obs
+        /   \
+     <128>   <128>
+       |       |
+     <128>   <128>
+       |       |
+    action    value
 
 Different architectures for actor and critic: ``net_arch=dict(pi=[32, 32], vf=[64, 64])``
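
For reference, the two forms documented above map to the following usage. This is a small sketch using ``PPO`` on ``CartPole-v1``; the same ``policy_kwargs`` apply to the other on-policy algorithms:

```python
from stable_baselines3 import PPO

# Same architecture for actor and critic: two separate networks of two 128-unit layers,
# equivalent to net_arch=dict(pi=[128, 128], vf=[128, 128]).
model_same = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=dict(net_arch=[128, 128]))

# Different architectures for the actor (pi) and the critic (vf).
model_diff = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(net_arch=dict(pi=[32, 32], vf=[64, 64])),
)
```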