Replies: 1 comment
-
I found that the authors' custom settings are published as a modified version of Open R1.
-
According to the recently published CPPO paper and its GitHub repository:
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
https://arxiv.org/abs/2503.22342
https://github.com/lzhxmu/CPPO
it seems this could be supported by adding three parameters to GRPOConfig, following the authors' training script:
https://github.com/lzhxmu/CPPO/blob/main/scripts/CPPO.sh
metric='smallest'
pruning=0.5
allocation=True
Currently, none of these are available as options in GRPOConfig.
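For illustration only, here is a minimal sketch of what such options might look like if the keyword names from scripts/CPPO.sh were adopted as-is. CPPOConfig is hypothetical, not an existing TRL class, and the meaning of each field is my reading of the paper:

```python
from dataclasses import dataclass

from trl import GRPOConfig


@dataclass
class CPPOConfig(GRPOConfig):
    """Hypothetical extension of TRL's GRPOConfig. The three fields below
    mirror the flags in the authors' scripts/CPPO.sh and do not exist in
    current TRL releases."""

    metric: str = "smallest"  # pruning criterion (assumed: drop completions with the smallest |advantage|)
    pruning: float = 0.5      # pruning rate (assumed: fraction of completions dropped per group)
    allocation: bool = True   # dynamic allocation (assumed: refill freed slots with new prompts)


config = CPPOConfig(output_dir="cppo-run")
```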
In v0.16.0, the scale_rewards option was added in response to the Dr. GRPO paper, and I was impressed by how quickly it was supported.
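For reference, that option is set directly on GRPOConfig (the output_dir value here is just a placeholder):

```python
from trl import GRPOConfig

# scale_rewards is available in TRL >= 0.16.0; setting it to False
# disables scaling rewards by the per-group std, the Dr. GRPO setting.
config = GRPOConfig(output_dir="grpo-run", scale_rewards=False)
```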
Are there any plans to add parameters for CPPO as well?