What performance to report? #166

@hdonancio

Hi,

While running the baseline experiments, I noticed a substantial discrepancy between two of the logged performance metrics. For example, running QMIX on the 2s3z task with five random seeds, I obtained a returned_won_episode rate of approximately 40%, whereas test_returned_won_episode reached around 80%.

I could not find documentation explaining the distinction between these two metrics. As far as I can tell, the original paper reports returned_won_episode, although additional baselines have been introduced since then.

Given this, if one had to choose between these two metrics for reporting and comparison, which would be the appropriate one to use?

Labels: question
