Commit 993de0e (1 parent: c11ac05)

Update documentation for reward scaling wrappers
File tree: 2 files changed (+4, -21 lines)


gymnasium/wrappers/stateful_reward.py (2 additions, 11 deletions)

@@ -20,22 +20,14 @@
 class NormalizeReward(
     gym.Wrapper[ObsType, ActType, ObsType, ActType], gym.utils.RecordConstructorArgs
 ):
-    r"""This wrapper will scale rewards s.t. the discounted returns have a mean of 0 and std of 1.
-
-    In a nutshell, the rewards are divided through by the standard deviation of a rolling discounted sum of the reward.
-    The exponential moving average will have variance :math:`(1 - \gamma)^2`.
+    r"""Normalizes immediate rewards such that their exponential moving average has an approximately fixed variance.
 
     The property `_update_running_mean` allows to freeze/continue the running mean calculation of the reward
     statistics. If `True` (default), the `RunningMeanStd` will get updated every time `self.normalize()` is called.
     If False, the calculated statistics are used but not updated anymore; this may be used during evaluation.
 
     A vector version of the wrapper exists :class:`gymnasium.wrappers.vector.NormalizeReward`.
 
-    Important note:
-        Contrary to what the name suggests, this wrapper does not normalize the rewards to have a mean of 0 and a standard
-        deviation of 1. Instead, it scales the rewards such that **discounted returns** have approximately unit variance.
-        See [Engstrom et al.](https://openreview.net/forum?id=r1etN1rtPB) on "reward scaling" for more information.
-
     Note:
         In v0.27, NormalizeReward was updated as the forward discounted reward estimate was incorrectly computed in Gym v0.25+.
         For more detail, read [#3154](https://github.com/openai/gym/pull/3152).

@@ -74,7 +66,6 @@ class NormalizeReward(
 ...         episode_rewards.append(reward)
 ...
 >>> env.close()
->>> # will approach 0.99 with more episodes
 >>> np.var(episode_rewards)
 np.float64(0.010162116476634746)

@@ -89,7 +80,7 @@ def __init__(
         gamma: float = 0.99,
         epsilon: float = 1e-8,
     ):
-        """This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
+        """This wrapper will normalize immediate rewards s.t. their exponential moving average has an approximately fixed variance.
 
         Args:
             env (env): The environment to apply the wrapper
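The scheme the old docstring summarizes (divide each reward by the running standard deviation of a rolling discounted return) can be sketched standalone. This is a minimal illustration, not gymnasium's actual implementation: `RunningMeanStd` and `normalize_rewards` below are simplified stand-ins for the wrapper's internals, assuming a single environment and a scalar reward stream.

```python
import numpy as np


class RunningMeanStd:
    """Simplified running mean/variance tracker (parallel-variance update)."""

    def __init__(self, epsilon: float = 1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, x: float) -> None:
        # Merge a single-sample "batch" into the running statistics.
        delta = x - self.mean
        tot = self.count + 1
        m2 = self.var * self.count + delta**2 * self.count / tot
        self.mean = self.mean + delta / tot
        self.var = m2 / tot
        self.count = tot


def normalize_rewards(rewards, gamma: float = 0.99, epsilon: float = 1e-8):
    """Scale each reward by the running std of a forward discounted return."""
    rms, disc_return, out = RunningMeanStd(), 0.0, []
    for r in rewards:
        disc_return = disc_return * gamma + r  # rolling discounted sum
        rms.update(disc_return)
        out.append(r / np.sqrt(rms.var + epsilon))  # scaled, not centered
    return out
```

Note that the reward is only divided by the estimated standard deviation, never shifted by the mean, which is exactly why the commit renames the behavior from "normalize" to "scale".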

gymnasium/wrappers/vector/stateful_reward.py (2 additions, 10 deletions)

@@ -19,20 +19,12 @@
 
 
 class NormalizeReward(VectorWrapper, gym.utils.RecordConstructorArgs):
-    r"""This wrapper will scale rewards s.t. the discounted returns have a mean of 0 and std of 1.
-
-    In a nutshell, the rewards are divided through by the standard deviation of a rolling discounted sum of the reward.
-    The exponential moving average will have variance :math:`(1 - \gamma)^2`.
+    r"""This wrapper will scale rewards s.t. their exponential moving average has an approximately fixed variance.
 
     The property `_update_running_mean` allows to freeze/continue the running mean calculation of the reward
     statistics. If `True` (default), the `RunningMeanStd` will get updated every time `self.normalize()` is called.
     If False, the calculated statistics are used but not updated anymore; this may be used during evaluation.
 
-    Important note:
-        Contrary to what the name suggests, this wrapper does not normalize the rewards to have a mean of 0 and a standard
-        deviation of 1. Instead, it scales the rewards such that **discounted returns** have approximately unit variance.
-        See [Engstrom et al.](https://openreview.net/forum?id=r1etN1rtPB) on "reward scaling" for more information.
-
     Note:
         The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly
         instantiated or the policy was changed recently.

@@ -79,7 +71,7 @@ def __init__(
         gamma: float = 0.99,
         epsilon: float = 1e-8,
     ):
-        """This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
+        """This wrapper will normalize immediate rewards s.t. their exponential moving average has an approximately fixed variance.
 
         Args:
             env (env): The environment to apply the wrapper
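The vector variant applies the same scaling across a batch of sub-environments. The sketch below illustrates one plausible batched scheme, not gymnasium's actual code: each sub-environment keeps its own discounted-return accumulator (reset on termination, an assumption of this sketch), while all of them feed a single shared variance estimate. `normalize_vector_rewards` is a hypothetical helper.

```python
import numpy as np


def normalize_vector_rewards(reward_batches, terminated_batches,
                             gamma: float = 0.99, epsilon: float = 1e-8):
    """Batched reward scaling sketch: per-env return accumulators,
    one shared running-variance estimate across the batch."""
    num_envs = len(reward_batches[0])
    returns = np.zeros(num_envs)          # per-env discounted-return accumulators
    mean, var, count = 0.0, 1.0, 1e-4     # shared running statistics
    out = []
    for rewards, terminated in zip(reward_batches, terminated_batches):
        rewards = np.asarray(rewards, dtype=np.float64)
        # Zero the accumulator of any env that just terminated, then roll forward.
        returns = returns * gamma * (1.0 - np.asarray(terminated)) + rewards
        # Fold the whole batch into the shared statistics (parallel-variance merge).
        b_mean, b_var, b_count = returns.mean(), returns.var(), num_envs
        delta = b_mean - mean
        tot = count + b_count
        m2 = var * count + b_var * b_count + delta**2 * count * b_count / tot
        mean, var, count = mean + delta * b_count / tot, m2 / tot, tot
        out.append(rewards / np.sqrt(var + epsilon))  # scale, don't center
    return out
```

Sharing one variance estimate across sub-environments is what lets the batched wrapper converge on stable statistics faster than running one scalar wrapper per environment.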
