🐛 Describe the bug
I successfully ran coding-grpo on the 1.7B model and got a step-2500 checkpoint, but this checkpoint is still a PyTorch distributed checkpoint (__0_0.distcp), even though I set last_save_in_hf: true. Looking closer, I cannot find any code that uses the last_save_in_hf argument. Given that we already have a way to push updated weights to vLLM (which should be in HF format), I think we are just missing a conversion step at the end.
ls forge/1_7B_checkpoint/step-2500/
__0_0.distcp
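To illustrate the kind of final step that seems to be missing, here is a rough sketch, not Forge's actual code. The helper names checkpoint_format and dcp_to_single_file are hypothetical. The only library call used is dcp_to_torch_save from torch.distributed.checkpoint.format_utils, which merges DCP shards into a single torch.save file. I am assuming the step folder also contains the hidden .metadata file that DCP normally writes.

```python
from pathlib import Path


def checkpoint_format(step_dir: str) -> str:
    """Classify a checkpoint folder by the files it contains (hypothetical helper)."""
    names = [p.name for p in Path(step_dir).iterdir() if p.is_file()]
    if any(n.endswith(".distcp") for n in names):
        return "dcp"  # torch.distributed.checkpoint shards
    has_weights = any(n.endswith((".safetensors", ".bin")) for n in names)
    if "config.json" in names and has_weights:
        return "hf"  # Hugging Face style folder
    return "unknown"


def dcp_to_single_file(step_dir: str, out_file: str) -> None:
    """Merge DCP shards into one torch.save file (hypothetical helper).

    Assumption: this only consolidates the shards. The result still uses the
    trainer's parameter names, so it is not yet an HF checkpoint.
    """
    # Imported lazily so checkpoint_format works without torch installed.
    from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

    dcp_to_torch_save(step_dir, out_file)
```

Running checkpoint_format on forge/1_7B_checkpoint/step-2500/ should report "dcp" for the folder listed above. A real fix would still need to remap parameter names to the HF layout and write config.json and the tokenizer files, which this sketch does not attempt.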
Versions
I am running a branch forked from main.