When batches are sampled from a filled RecurrentRolloutBuffer, only episode_starts[batch_inds] is returned with the sequence data. This "episode_starts" matters because the LSTM policy uses it to reset the hidden state during training.
However, I have a question about this behavior. Since seq_start_indices is determined jointly by episode_starts and env_change, why is only episode_starts returned?
To be more specific, why is line 240 in common.recurrent.buffers written as "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?
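To make the question concrete, here is a minimal NumPy sketch of how sequence starts could be derived from the two masks. It is an illustration under assumptions, not the library's code: the array values are made up, the batch is assumed to be already flattened, and only_env_change is a hypothetical name introduced for this example.

```python
import numpy as np

# Hypothetical flattened batch: two envs with 4 steps each, concatenated.
# True where an episode begins (made-up values).
episode_starts = np.array([1, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
# True at the first step taken from a different env (made-up values).
env_change = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=bool)

# A new sequence begins at an episode start OR at an env boundary.
seq_start = np.logical_or(episode_starts, env_change)
seq_start[0] = True  # the first step of the batch always opens a sequence
seq_start_indices = np.where(seq_start)[0]

# Steps that open a sequence but are not flagged by episode_starts alone.
only_env_change = np.where(seq_start & ~episode_starts)[0]

print("sequence starts:", seq_start_indices.tolist())
print("starts missing from episode_starts:", only_env_change.tolist())
```

In this toy batch, the step at the env boundary opens a new sequence but is not marked in episode_starts, which is the case the question is about. Note that a combined mask would need an element-wise operation such as np.logical_or, since the plain Python "or" does not work on arrays with more than one element.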
Thank you in advance for the explanation.
Checklist
I have checked that there is no similar issue in the repo