
Commit c5226f6

Ervin T authored and xiaomaogy committed
Clear cumulative_returns_since_policy_update (#2120)
Previously, the mean rewards reported in the CSV file would lag far behind those reported elsewhere in the code, since this buffer was never cleared.
1 parent 6d8c494 commit c5226f6

File tree

1 file changed: +1 addition, 0 deletions


ml-agents/mlagents/trainers/ppo/trainer.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -422,6 +422,7 @@ def update_policy(self):
             number_experiences=len(self.training_buffer.update_buffer["actions"]),
             mean_return=float(np.mean(self.cumulative_returns_since_policy_update)),
         )
+        self.cumulative_returns_since_policy_update = []
         n_sequences = max(
             int(self.trainer_parameters["batch_size"] / self.policy.sequence_length), 1
         )
```
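
To make the effect of the one-line fix concrete, here is a minimal sketch of the reporting pattern involved. It is not the actual ml-agents `PPOTrainer`: only the `cumulative_returns_since_policy_update` attribute name comes from the real code, while `MiniTrainer` and `end_episode` are hypothetical names used for illustration.

```python
import numpy as np


class MiniTrainer:
    """Toy stand-in for a trainer that reports mean return per policy update."""

    def __init__(self):
        # Returns of episodes that finished since the last policy update.
        self.cumulative_returns_since_policy_update = []

    def end_episode(self, episode_return):
        # Called whenever an episode finishes during experience collection.
        self.cumulative_returns_since_policy_update.append(episode_return)

    def update_policy(self):
        # Report the mean return over episodes collected since the last update.
        mean_return = float(np.mean(self.cumulative_returns_since_policy_update))
        print(f"mean_return since last update: {mean_return:.2f}")
        # Without this reset, old episode returns stay in the buffer forever,
        # so the reported mean lags behind the policy's current performance.
        self.cumulative_returns_since_policy_update = []


trainer = MiniTrainer()
for ret in (1.0, 2.0, 3.0):
    trainer.end_episode(ret)
trainer.update_policy()   # mean over [1, 2, 3] -> 2.00

for ret in (10.0, 12.0):
    trainer.end_episode(ret)
trainer.update_policy()   # mean over [10, 12] -> 11.00, not over all five episodes
```

With the reset in place, each reported `mean_return` reflects only the episodes gathered since the previous update, which is what the commit restores in `trainer.py`.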
