Description
Disclaimer: I am not the author of the paper, but I believe this technique offers a high-value, low-cost improvement for the library.
What is F-GRPO and why use it
The F-GRPO paper proposes down-weighting the gradient signal from prompts the model already solves reliably during RL, preventing the optimizer from "over-fitting" to easy solutions at the expense of exploration. The idea is inspired by focal loss in image classification.
The motivation is that for a practical group size (e.g., 16):
- The gradient signal can be dominated by the common correct trajectories, so down-weighting that signal may reduce the "sharpening" effect RL has on the output distribution (likely good for exploration).
- This gradient domination can also drown out the gradient signal from correct trajectories on hard prompts (which are rare at small group sizes), so we want to protect that signal from the dominating gradient of easy tasks.
The key intuition is gradient competition: accepting gradient signal from some samples reduces the probability assigned to other samples. A rough sketch of the reweighting is shown below.
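To make the mechanism concrete, here is a minimal PyTorch sketch of what a focal-style down-weighting of GRPO advantages might look like. This is my illustration, not the paper's exact formulation and not an existing slime API: the binary-reward assumption, the `(1 - p)^gamma` weight, and all function names are hypothetical.

```python
import torch


def focal_grpo_weights(rewards: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Per-prompt focal-style weights (sketch only, not the paper's exact form).

    rewards: [num_prompts, group_size] binary correctness (1 = correct, 0 = wrong).
    Returns one weight per prompt that shrinks toward 0 as the group success
    rate approaches 1, so already-solved prompts contribute less gradient.
    """
    success_rate = rewards.float().mean(dim=-1)   # p_i in [0, 1]
    return (1.0 - success_rate).pow(gamma)        # (1 - p_i)^gamma, focal-loss style


def reweighted_advantages(rewards: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Standard GRPO group-normalized advantages, scaled by the focal weight."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True).clamp_min(1e-6)
    advantages = (rewards - mean) / std                       # usual GRPO normalization
    weights = focal_grpo_weights(rewards, gamma).unsqueeze(-1)
    return advantages * weights                               # easy prompts down-weighted


# Example: a prompt solved 15/16 times gets far less weight than one solved 4/16 times.
rewards = torch.tensor([[1.0] * 15 + [0.0], [1.0] * 4 + [0.0] * 12])
print(focal_grpo_weights(rewards))  # ~tensor([0.0039, 0.5625]) with gamma=2
```

The only point of the example is the relative scaling: nearly-solved prompts contribute almost no gradient while hard prompts keep theirs. The actual weighting scheme should of course follow the paper if this gets implemented.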
Experiment results in the paper
- Mainly on small models. Better Pass@K at large K (indicating less of a sharpening effect).
Discussion needed
I want to run more experiments to verify whether this is a technique worth adopting:
- On larger models like Qwen30B-A3B
- On what training recipe? Use a simpler but verified recipe such as JustRL?
- On more tasks/data? If so, what tasks/data?
However, I am a new contributor to this repo, so I would like to ask: to verify whether this feature should be introduced into Slime, what experiments should be run? Any other comments?
My Contributions
- Run more experiments.
- Implement the feature if the maintainers agree on integrating F-GRPO.
Thanks!