Align AIME pass@1 with literature (#666)

lewtun · web-flow · commit bb14995c4ecc · 2025-04-08T16:36:55.000+02:00
Recent papers such as [SimpleRL-Zoo](https://arxiv.org/pdf/2503.18892) and [VAPO](https://arxiv.org/pdf/2504.05118) estimate pass@1 on AIME24 from `n=32` samples per problem. This PR raises our default from `n=16` to `n=32` in both affected task configs, so that our numbers are directly comparable with those reported in the literature.
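
For context, here is a minimal sketch of how pass@1 can be estimated from `n` samples per problem, using the standard unbiased pass@k estimator. It is not lighteval's implementation: the helper names `pass_at_k` and `mean_pass_at_1` are hypothetical, and the correct-sample counts in the usage note are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Average the per-problem pass@1 estimates over a benchmark (hypothetical helper)."""
    return sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)
```

For `k=1` the estimator reduces to `c / n`, so each per-problem estimate moves in steps of `1/n`; going from 16 to 32 samples halves that step size. For example, `mean_pass_at_1([32, 0, 16], n=32)` averages the per-problem values 1.0, 0.0 and 0.5.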
diff --git a/src/lighteval/tasks/default_tasks.py b/src/lighteval/tasks/default_tasks.py
@@ -325,7 +325,7 @@
     generation_size=32768,
     metric=[
         Metrics.expr_gold_metric,
-        Metrics.math_pass_at_1_16n,
+        Metrics.math_pass_at_1_32n,
     ],
     version=1,
 )
@@ -342,7 +342,7 @@
     generation_size=10000,
     metric=[
         Metrics.expr_gold_metric,
-        Metrics.math_pass_at_1_16n,
+        Metrics.math_pass_at_1_32n,
     ],
     version=1,
 )