Labels: question (Further information is requested)
Description
Hi,
While running the baseline experiments, I noticed a substantial discrepancy between two of the logged performance metrics. For example, running QMIX with five random seeds on the 2s3z task gave a returned_won_episode rate of approximately 40%, whereas test_returned_won_episode reached around 80%.
I could not find clear documentation explaining the distinction between these two metrics. From what I can tell, the original paper appears to report returned_won_episode, although additional baselines have been introduced since then.
Given this, if one had to choose between these two metrics for reporting and comparison, which would be the appropriate one to use?