What performance to report?

Hi,

While running the baseline experiments, I noticed a substantial discrepancy between two of the logged performance metrics. For example, when running QMIX with five random seeds on the _2s3z_ task, I obtained a _returned_won_episode_ rate of approximately 40%, whereas _test_returned_won_episode_ reached around 80%.

I could not find clear documentation explaining the distinction between these two metrics. From what I can tell, the original paper appears to report _returned_won_episode_, although additional baselines have been introduced since then.

Given this, if one had to choose between these two metrics for reporting and comparison, which would be the appropriate one to use?

What performance to report? #166