[algo] feat: add optimal token baseline and variance proxy (#4678) #36
e2e_one_step_off_policy.yml
on: push
setup: 0s
e2e_one_step_off_policy_fsdp2: 0s
e2e_one_step_off_policy_megatron: 0s
cleanup: 5s