[LLM] Rewrite GSM8K reward function to follow standard GRPO conventions #6397

Job Run time
5s
9m 31s
8m 31s
18m 58s
10m 5s
20m 45s
10m 20s
5m 24s
24s
22s
30s
23s
18s
21s
31s
1h 26m 28s