According to the recently published CPPO paper and its GitHub repository, CPPO introduces its own set of new parameters that are not part of standard GRPO training.
https://arxiv.org/abs/2503.22342
https://github.com/lzhxmu/CPPO
Once open-r1 adds these parameters to configs.py and grpo.py as standard options, it will be possible to verify how effective RL training is with new parameter combinations.
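As a rough illustration of what such a standardized option might look like, here is a minimal sketch. It assumes, as the CPPO paper describes, that completions with low absolute advantage are pruned from each group. The class `CPPOConfig`, the field `pruning_rate`, and the helper `prune_by_abs_advantage` are hypothetical names chosen for this example. They are not open-r1's or the CPPO repository's actual API.

```python
from dataclasses import dataclass


@dataclass
class CPPOConfig:
    """Hypothetical CPPO options; names are illustrative, not an existing API."""

    num_generations: int = 16  # completions sampled per prompt (GRPO group size)
    pruning_rate: float = 0.5  # hypothetical: fraction of each group to prune

    def __post_init__(self) -> None:
        # Validate early so a bad config fails before training starts.
        if self.num_generations < 1:
            raise ValueError("num_generations must be positive")
        if not 0.0 <= self.pruning_rate < 1.0:
            raise ValueError("pruning_rate must be in [0, 1)")

    @property
    def num_kept(self) -> int:
        # Number of completions per group that survive pruning (at least one).
        return max(1, int(self.num_generations * (1.0 - self.pruning_rate)))


def prune_by_abs_advantage(advantages: list[float], num_kept: int) -> list[int]:
    """Return indices (in original order) of the num_kept largest |advantage| values."""
    order = sorted(range(len(advantages)), key=lambda i: abs(advantages[i]), reverse=True)
    return sorted(order[:num_kept])


if __name__ == "__main__":
    cfg = CPPOConfig(num_generations=4, pruning_rate=0.5)
    kept = prune_by_abs_advantage([0.1, -1.2, 0.05, 0.9], cfg.num_kept)
    print(cfg.num_kept, kept)  # prints: 2 [1, 3]
```

Keeping the option in a config dataclass with validation in `__post_init__` would make it easy to sweep different pruning rates alongside existing GRPO settings. The actual parameter names and semantics should be taken from the CPPO repository.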