[LLM] Rewrite GSM8K reward function to follow standard GRPO conventions #11463

Triggered via: pull request, March 5, 2026 11:01
Status: Skipped
Total duration: 1s

test-linux-habitat.yml

on: pull_request
Matrix: tests
Waiting for pending jobs