Hi Onno,
I'm very interested in the idea you proposed and recently started working with the code.
I have a question about your SB3 implementation of SAC with colored noise.
In SB3, the policy is updated every N steps, so it will most likely be updated multiple times within one episode. Each update calls the sampling function of your colored Gaussian distribution and therefore draws samples from the buffer of your colored-noise process generator. Because some samples in between are consumed by the policy updates, the noise that actually reaches the environment is no longer a contiguous stretch of the colored process, which would change its temporal correlation during interaction with the environment.
Would this be a problem, or do you think it would not have a large impact anyway?
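To make the concern concrete, here is a minimal NumPy sketch of what I mean. It does not use SB3 or the code in this repository: `colored_noise`, `env_samples`, `train_freq`, and `draws_per_update` are hypothetical names I made up for illustration, the FFT-based generator is only a rough stand-in for your process generator, and the number of samples consumed per update is an assumption. It compares the lag-1 autocorrelation of the noise seen by the environment with and without extra draws from the shared buffer.

```python
import numpy as np


def colored_noise(beta, n, rng):
    """Rough FFT-based sketch of Gaussian noise with power spectrum ~ 1/f^beta."""
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid division by zero at the DC component
    scale = freqs ** (-beta / 2.0)
    spectrum = scale * (
        rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    )
    signal = np.fft.irfft(spectrum, n=n)
    return signal / signal.std()


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def env_samples(noise, n_steps, train_freq, draws_per_update):
    """Noise values that reach the environment when updates share the buffer.

    Every environment step consumes one sample. Every `train_freq` steps a
    (hypothetical) policy update consumes `draws_per_update` further samples.
    """
    out = []
    pos = 0
    for step in range(1, n_steps + 1):
        out.append(noise[pos])
        pos += 1
        if step % train_freq == 0:
            pos += draws_per_update  # samples "stolen" by the update
    return np.array(out)


def demo():
    rng = np.random.default_rng(0)
    noise = colored_noise(beta=1.0, n=2**16, rng=rng)  # pink noise
    clean = env_samples(noise, 4000, train_freq=1, draws_per_update=0)
    interrupted = env_samples(noise, 4000, train_freq=1, draws_per_update=15)
    print("lag-1 autocorrelation, no extra draws:  ", lag1_autocorr(clean))
    print("lag-1 autocorrelation, with extra draws:", lag1_autocorr(interrupted))


if __name__ == "__main__":
    demo()
```

In this toy setup the environment effectively sees a subsampled version of the colored process, so I would expect the second number to be lower than the first. How large the effect is in practice would depend on how many samples each update really draws from the buffer.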
Best,
Baohe