Update documentation for reward scaling wrappers #1285
pseudo-rnd-thoughts merged 1 commit into Farama-Foundation:main from
Conversation
pseudo-rnd-thoughts
left a comment
Thanks for the PR @keraJLi. To clarify, what do you mean by their exponential moving average?
To me, it isn't clear what the expected mean is, or what exactly the rewards are normalised by.
The rewards are scaled like this: `r_t / sqrt(Var[G_t] + eps)`, where `G_t = gamma * G_{t-1} + r_t` is a discounted running sum of rewards.
This means each reward is divided by the standard deviation of a running estimate of the variance of `G_t`, which is (up to scaling) an exponential moving average of past rewards.
Sadly, you cannot even say the EMA has variance one. This is because rewards are divided by the running variance of the EMA at different time steps. To me, it seems like you can't really draw any general conclusions about the properties of this method without some assumptions (e.g. if your reward's autocorrelation decays exponentially and the episode length distribution is geometric, you get a nice upper bound on the reward variance).
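The scheme under discussion can be sketched roughly as follows. This is a minimal, hypothetical illustration (the class and method names are mine, not the wrapper's actual API): keep a discounted running return `G_t = gamma * G_{t-1} + r_t`, maintain a running estimate of its variance, and divide each reward by the corresponding standard deviation. It demonstrates why the scaled rewards do not end up with variance exactly one: the divisor changes at every step.

```python
import numpy as np

class RunningVariance:
    """Running mean/variance estimate using a parallel (Chan et al.) update."""
    def __init__(self, epsilon=1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, x):
        # Merge the statistics of the new batch into the running estimate.
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var, self.count = m2 / total, total

class RewardScaler:
    """Hypothetical sketch of the reward-scaling scheme discussed above."""
    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma, self.epsilon = gamma, epsilon
        self.return_var = RunningVariance()
        self.G = 0.0  # discounted running return G_t

    def scale(self, reward, terminated=False):
        # G_t = gamma * G_{t-1} + r_t
        self.G = self.gamma * self.G + reward
        self.return_var.update([self.G])
        if terminated:
            self.G = 0.0
        # Divide by the *current* running std, which drifts over time.
        return reward / np.sqrt(self.return_var.var + self.epsilon)
```

Feeding a constant reward stream through `RewardScaler` shows the effect: the divisor keeps changing as the variance estimate updates, so even identical input rewards come out at different scales.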
Thanks for the reply @keraJLi. I think your reword of the docstring makes sense now; I'm happy to merge if you want. Looking at the paper again and, in particular, the code implementation: https://openreview.net/pdf?id=r1etN1rtPB#page=11.12. Also looking at the baselines repo, which I suspect might be the first implementation: https://github.com/openai/baselines/blob/master/baselines/common/vec_env/vec_normalize.py
Description
Changes the documentation of reward scaling wrappers. It mainly removes incorrect or unsubstantiated information.
Affected wrappers are `wrappers/stateful_reward.py` and `wrappers/vector/stateful_reward.py`.
Fixes #1272
Type of change
Checklist:
`pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)