[Breaking Change v10] Nstep no longer returns "discounts" #8
ymd-h announced in Announcement
We have just released cpprb v10. (Binaries will be uploaded to PyPI soon.)
ReplayBuffer with Nstep no longer returns "discounts", because users can always multiply by the fixed gamma ** nstep themselves. For example, if we have a terminated trajectory, then the 3-step targets become as follows:
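To illustrate why a fixed gamma ** nstep is enough, here is a plain-Python sketch over a hypothetical terminated trajectory (the rewards are made up). The function nstep_transitions is a hypothetical helper, not cpprb's implementation. It assumes the old "discounts" value was gamma ** m, where m is the number of rewards actually summed. m only drops below nstep when the window hits the terminal step, and there done = 1, so the bootstrap weight (1 - done) * discount equals (1 - done) * gamma ** nstep either way.

```python
def nstep_transitions(rews, dones, gamma, nstep):
    """Hypothetical n-step aggregation over ONE trajectory (not cpprb's code).

    Returns a list of (nstep_rew, nstep_done, discount) per start index,
    where discount = gamma ** m and m is how many rewards were summed.
    Assumes the trajectory ends with dones[-1] == 1 (terminated), so the
    window is only ever cut short by termination.
    """
    out = []
    T = len(rews)
    for i in range(T):
        ret, done, m = 0.0, 0.0, 0
        for k in range(nstep):
            j = i + k
            if j >= T:  # never reached for a terminated trajectory
                break
            ret += (gamma ** k) * rews[j]
            m += 1
            if dones[j]:  # episode ended inside the n-step window
                done = 1.0
                break
        out.append((ret, done, gamma ** m))
    return out


gamma, nstep = 0.9, 3
rews = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical rewards
dones = [0, 0, 0, 0, 1]           # terminated at the last step

for ret, done, discount in nstep_transitions(rews, dones, gamma, nstep):
    # Per-transition discount and the fixed gamma ** nstep give the same
    # bootstrap weight once multiplied by (1 - done).
    assert abs((1 - done) * discount - (1 - done) * gamma ** nstep) < 1e-12
```

The last two transitions only sum 2 and 1 rewards, but both carry done = 1, so their bootstrap term vanishes regardless of the discount.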
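As long as the done flags are calculated correctly (a transition whose n-step window is cut short by termination has done = 1, so its bootstrap term is zeroed and its discount never matters), the target
sample["rew"] + (gamma ** nstep) * (1 - sample["done"]) * Q(sample["next_obs"]).max(axis=1)
is fine.

If you have any questions, please feel free to ask me.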
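A minimal NumPy sketch of the n-step target above. The names nstep_td_target and q_next are hypothetical: q_next stands for Q(sample["next_obs"]) evaluated by your own network. The reshape to 1-D is a defensive assumption in case "rew" and "done" come back as (batch, 1) columns. Adding a (batch, 1) array to the (batch,) result of .max(axis=1) would otherwise broadcast to (batch, batch).

```python
import numpy as np


def nstep_td_target(sample, q_next, gamma, nstep):
    """Sketch of the n-step TD target using a fixed gamma ** nstep.

    sample : dict with "rew" and "done" arrays (hypothetical batch layout;
             flattened to 1-D in case they are stored as (batch, 1)).
    q_next : array of shape (batch, n_actions), i.e. Q(sample["next_obs"]).
    """
    rew = np.asarray(sample["rew"], dtype=np.float64).reshape(-1)
    done = np.asarray(sample["done"], dtype=np.float64).reshape(-1)
    # Greedy bootstrap value per transition (max over actions).
    next_v = np.asarray(q_next, dtype=np.float64).max(axis=1)
    # done == 1 zeroes the bootstrap term, so a window shortened by
    # termination never needs its own discount.
    return rew + (gamma ** nstep) * (1.0 - done) * next_v


# Usage with hypothetical values:
sample = {"rew": np.array([[5.23], [8.5]]), "done": np.array([[0.0], [1.0]])}
q_next = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical Q(next_obs)
target = nstep_td_target(sample, q_next, gamma=0.9, nstep=3)
# target is approximately [6.688, 8.5]; the second entry is terminal.
```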
Ref: https://gitlab.com/ymd_h/cpprb/-/issues/137
Ref: https://ymd_h.gitlab.io/cpprb/features/nstep/