[WiP] Reproducible on- and off-policy sampling #2185

Open
MkuuWaUjinga wants to merge 7 commits into rlworkgroup:master from MkuuWaUjinga:fix/deterministic_sampling

@MkuuWaUjinga MkuuWaUjinga commented Nov 26, 2020

Extend the Environment API so that seeds specific to each environment library can be set.

Tasks:

  • Extend Environment interface
  • Set seeds for Gym envs
  • Ensure seeds are set when Sampler classes start working
  • Set seeds for dm_control envs
  • Set seeds for grid_world envs
  • Set seeds for point envs
  • Set seeds for metaworld envs
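
For illustration, the interface extension in the task list could look roughly like the sketch below. This is not the code in this PR: the `seed` method name, the trimmed-down `Environment` base class, and the toy `NoisyWalkEnv` are all assumptions made up for the example. The idea is that the base interface gets a no-op `seed` hook and each wrapper overrides it to seed its own library's random state.

```python
import abc
import random


class Environment(abc.ABC):
    """Hypothetical, trimmed-down interface (not garage's actual API)."""

    @abc.abstractmethod
    def reset(self):
        """Start a new episode and return the first observation."""

    @abc.abstractmethod
    def step(self, action):
        """Apply an action and return the next observation."""

    def seed(self, seed):
        """Seed library-specific randomness. Default: nothing to seed."""
        return None


class NoisyWalkEnv(Environment):
    """Toy environment whose transitions draw from a private RNG."""

    def __init__(self):
        self._rng = random.Random()
        self._pos = 0.0

    def seed(self, seed):
        # Re-create the private RNG so the global `random` state is untouched.
        self._rng = random.Random(seed)

    def reset(self):
        self._pos = 0.0
        return self._pos

    def step(self, action):
        # The action moves the agent; the RNG adds bounded noise.
        self._pos += action + self._rng.uniform(-0.1, 0.1)
        return self._pos


def rollout(env, seed, actions):
    """Seed the environment, then collect observations for fixed actions."""
    env.seed(seed)
    observations = [env.reset()]
    observations.extend(env.step(a) for a in actions)
    return observations
```

With this shape, two rollouts that use the same seed and the same actions return identical observations, and environments with no randomness simply inherit the no-op default.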

Open Questions:

  • How to ensure determinism of off-policy algorithms?

@MkuuWaUjinga MkuuWaUjinga requested a review from a team as a code owner November 26, 2020 21:16
@MkuuWaUjinga MkuuWaUjinga requested review from ahtsan and removed request for a team November 26, 2020 21:16
@mergify mergify bot requested review from a team, gitanshu and ziyiwu9494 and removed request for a team November 26, 2020 21:17
@MkuuWaUjinga
Author

Thanks for the pointers. I addressed everything in the latest commits. I assume GridWorld and PointEnv don't use any seeds at all?
Furthermore, in the current implementation every worker gets the same environment seed, so each worker always samples the same trajectory for a fixed action sequence. I think this is something we need to fix before merging?
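
A minimal sketch of one possible fix, assuming each worker has an integer id: derive the environment seed from the base seed plus that id. The helper `worker_env_seed` and the stand-in `sample_noise` are hypothetical names for this example only, and the offset scheme is just one option, not necessarily what the PR should adopt.

```python
import random


def worker_env_seed(base_seed, worker_id):
    """Derive a distinct but reproducible seed per worker (hypothetical helper)."""
    return base_seed + worker_id


def sample_noise(seed, n=3):
    """Stand-in for the randomness an environment draws during a rollout."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


base_seed = 42

# Current behaviour described above: every worker uses the same seed,
# so all workers draw the same random numbers.
shared = [sample_noise(base_seed) for _ in range(2)]

# Sketch of the fix: offset the seed by the worker id, so workers differ
# from each other but each one is still reproducible across runs.
distinct = [sample_noise(worker_env_seed(base_seed, w)) for w in range(2)]
```

Here `shared[0]` and `shared[1]` are identical, while `distinct[0]` and `distinct[1]` differ; rerunning with the same `base_seed` reproduces both lists.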

@MkuuWaUjinga MkuuWaUjinga requested review from ryanjulian and removed request for a team December 9, 2020 14:52
@mergify mergify bot requested a review from a team December 9, 2020 14:53
@MkuuWaUjinga MkuuWaUjinga changed the title from "[WiP] Deterministic on- and off-policy sampling" to "[WiP] Reproducible on- and off-policy sampling" Dec 13, 2020