[RFC][Feat Proposal][New Contributor] Integrating F-GRPO #1565

@Felix-Zhenghao

Disclaimer: I am not the author of the paper, but I believe this technique offers a high-value, low-cost improvement for the library.

What is F-GRPO and why use it

The F-GRPO paper proposes down-weighting the gradient signal from prompts on which the model already succeeds most of the time during RL, so that the optimizer does not over-fit to easy solutions at the expense of exploration. The idea is inspired by focal loss, which was introduced for dense object detection.

The motivation is that, at practical group sizes (16, for instance):

  • The gradient signal can be dominated by common correct trajectories, so down-weighting it may reduce the "sharpening" effect that RL has on the output distribution (which may help exploration).

  • This domination may drown out the gradient signal from correct trajectories on hard prompts, which are rare at small group sizes, so we want to "protect" that signal from the dominant signal coming from easy prompts.

The key logic behind this is gradient competition: a gradient step that raises the probability of some samples lowers the probability of others.
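To make the down-weighting idea above concrete, here is a minimal sketch in plain Python. It is not taken from the paper or from Slime: the weight `(1 - p) ** gamma` (with `p` the empirical success rate of the group) is an assumed focal-loss-style form, and the function names, the `gamma` parameter, and the binary-reward setup are all hypothetical. The exact weighting used by F-GRPO may differ.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def focal_weighted_advantages(rewards, gamma=1.0, eps=1e-6):
    """Scale a group's advantages by an assumed focal-style weight.

    Assumptions (not confirmed by the paper): rewards are binary in {0, 1},
    p is the fraction of correct rollouts in the group, and the weight is
    (1 - p) ** gamma, so easy prompts (p near 1) contribute less gradient.
    gamma = 0 recovers the unweighted group-normalized advantages.
    """
    p = mean(rewards)
    weight = (1.0 - p) ** gamma
    return [weight * a for a in grpo_advantages(rewards, eps)]


# Group size 16: an easy prompt (15/16 correct) and a hard one (1/16 correct).
easy = [1.0] * 15 + [0.0]
hard = [1.0] + [0.0] * 15
easy_adv = focal_weighted_advantages(easy)
hard_adv = focal_weighted_advantages(hard)
```

Under these assumptions, the correct rollouts of the easy prompt are scaled by 1/16 while the single correct rollout of the hard prompt keeps 15/16 of its advantage, which is the "protection" effect described in the second bullet.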

Experiment results in the paper

  • Experiments are mainly on small models. Pass@K improves at large K (indicating less sharpening).
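For readers less familiar with the metric, Pass@K is commonly computed with the standard unbiased estimator 1 - C(n - c, k) / C(n, k), where n rollouts are sampled per prompt and c of them are correct. The sketch below is a generic illustration of that estimator; it is an assumption that the paper evaluates Pass@K this way, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased Pass@K estimate from n samples with c correct (k <= n)."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# A hard prompt with 1 correct rollout out of 16:
p1 = pass_at_k(16, 1, 1)   # chance that a single sample is correct
p8 = pass_at_k(16, 1, 8)   # chance that at least one of 8 samples is correct
```

A sharpened policy can keep Pass@1 high while losing rare correct solutions, which shows up as a lower Pass@K at large K.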

Need discussions

  • I want to run more experiments to verify whether this technique is worth adopting:

    • On larger models such as Qwen30B-A3B
    • With which training recipe? A simpler but verified recipe such as JustRL?
    • On more tasks/data? If so, which tasks/data?

However, I am a new contributor to this repo, so I would like to know: to verify whether this feature should be introduced in Slime, which experiments should be run? Any other comments are welcome.

My Contributions

  • Run more experiments.
  • Implement the feature if the maintainers agree on integrating F-GRPO.

Thanks!
