[LLM] Rewrite GSM8K reward function to follow standard GRPO conventions #6397
| Job | Run time |
|---|---|
| | 5s |
| | 9m 31s |
| | 8m 31s |
| | 18m 58s |
| | 10m 5s |
| | 20m 45s |
| | 10m 20s |
| | 5m 24s |
| | 24s |
| | 22s |
| | 30s |
| | 23s |
| | 18s |
| | 21s |
| | 31s |
| | 1h 26m 28s |