I’m experimenting with the open-r1 repo and tried the following flow:
1. Perform SFT (supervised fine-tuning) on a base model such as Qwen2.5
2. Run GRPO using the fine-tuned model to further improve performance
However, when I ran GRPO, I observed no learning effect at all.
Upon investigation, I found that all model parameters had requires_grad=False after SFT:
for name, param in model.named_parameters():
    print(f"{name}: {param.requires_grad}")
I attempted to manually set requires_grad=True but it didn't solve the issue. I suspect this might be related to how the model is passed to GRPOTrainer or how it is initialized internally.
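For reference, here is a minimal sketch of the check and the manual reset described above. It uses a small stand-in `torch.nn` module instead of the actual Qwen2.5 checkpoint, and the helper names `count_trainable` and `unfreeze_all` are hypothetical, not part of open-r1 or TRL:

```python
import torch
from torch import nn


def count_trainable(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


def unfreeze_all(model: nn.Module) -> None:
    """Set requires_grad=True on every parameter, in place."""
    model.requires_grad_(True)


# Stand-in for the SFT checkpoint (assumption: the real model is also an nn.Module).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Simulate the state observed after SFT: every parameter frozen.
model.requires_grad_(False)
frozen_counts = count_trainable(model)

# Manually re-enable gradients and check again.
unfreeze_all(model)
unfrozen_counts = count_trainable(model)

print(f"frozen:   {frozen_counts[0]} / {frozen_counts[1]} trainable")
print(f"unfrozen: {unfrozen_counts[0]} / {unfrozen_counts[1]} trainable")
```

This only changes the in-memory module. It says nothing about how GRPOTrainer loads or wraps the model, which is the part I am unsure about.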
My question is:
How can I correctly configure the model so that requires_grad=True for parameters during GRPO training in open-r1?
Any advice, or pointers to a working example or the relevant part of the codebase, would be greatly appreciated!
Thanks in advance.