Sumdigits example with GRPO #183

Ritesh1905 · 2025-09-19T01:21:49Z

The mean reward converges in fewer than 15 training steps, and the run takes about 5 minutes.

wandb run: https://meta.wandb.io/torchforge/sumdigits-training/runs/uxzowpkp?nw=nwuserrithesh

Makes some corrections to the GRPO loss function.
Updates the sumdigits example to use the GRPO loss.
Introduces a small curriculum setup.
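
For readers unfamiliar with GRPO, here is a minimal pure-Python sketch of the commonly described formulation. Each sampled completion's reward is normalized against the mean and standard deviation of its own group. The loss then combines a PPO-style clipped surrogate with a per-token KL penalty toward a reference policy. This is not the code in src/forge/losses/grpo_loss.py. The following are all illustrative assumptions:

- the clip range `clip_eps`
- the KL coefficient `beta`
- population-std normalization
- the k3 KL estimator
- the hypothetical `sumdigits_reward` helper, which assumes the task asks for the digit sum of a number and gives 1.0 for an exact match

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_sequence_loss(logprobs, old_logprobs, ref_logprobs, advantage,
                       clip_eps=0.2, beta=0.1):
    """Clipped surrogate minus a KL penalty, negated and averaged over one sequence's tokens."""
    token_losses = []
    for lp, old_lp, ref_lp in zip(logprobs, old_logprobs, ref_logprobs):
        ratio = math.exp(lp - old_lp)  # pi_theta / pi_old for this token
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        surrogate = min(ratio * advantage, clipped * advantage)
        # k3 estimator of KL(pi_theta || pi_ref); always >= 0
        log_diff = ref_lp - lp
        kl = math.exp(log_diff) - log_diff - 1
        token_losses.append(-(surrogate - beta * kl))
    return sum(token_losses) / len(token_losses)


def grpo_group_loss(group, rewards, **kwargs):
    """Average the per-sequence losses over a group of completions for one prompt."""
    advantages = group_advantages(rewards)
    losses = [grpo_sequence_loss(*seq, adv, **kwargs)
              for seq, adv in zip(group, advantages)]
    return sum(losses) / len(losses)


def sumdigits_reward(number, completion):
    """Hypothetical reward: 1.0 if the completion is exactly the digit sum of `number`."""
    target = sum(int(d) for d in str(number))
    return 1.0 if completion.strip() == str(target) else 0.0


if __name__ == "__main__":
    completions = ["15", "14", "15", "9"]
    rewards = [sumdigits_reward(4821, c) for c in completions]
    # Toy per-token log-probs as (policy, old policy, reference) for each completion
    group = [([-0.5, -0.7], [-0.5, -0.7], [-0.6, -0.7]) for _ in completions]
    print(rewards, grpo_group_loss(group, rewards))
```

Because advantages are computed relative to the group, a prompt where every completion earns the same reward contributes no policy-gradient signal. Only the KL term remains in that case.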

TODO:

There is a lot of code redundancy, which will be fixed once the abstractions land.

src/forge/losses/grpo_loss.py

src/forge/util/model_utils.py

src/forge/losses/grpo_loss.py

GPRO works!

a451b61

meta-cla bot added the CLA Signed label Sep 19, 2025

fixing lint and failing tests

7cb1e76

Ritesh1905 changed the title ~~Sumdigits exmaple with GPRO~~ Sumdigits exmaple with GRPO Sep 19, 2025

Ritesh1905 marked this pull request as ready for review September 19, 2025 01:50

Ritesh1905 requested review from allenwang28 and joecummings September 19, 2025 01:54

JenniferWang mentioned this pull request Sep 19, 2025

Update logic to only track policy version in the main controller #181

Merged

Ritesh1905 commented Sep 19, 2025

src/forge/losses/grpo_loss.py (outdated, resolved)

joecummings reviewed Sep 19, 2025

src/forge/util/model_utils.py (resolved)

src/forge/util/model_utils.py (outdated, resolved)

src/forge/losses/grpo_loss.py (outdated, resolved)

rithesh added 3 commits September 19, 2025 11:30

addressing code review comments

0fc6fb2

some minor changes

62db950

lints

7a9f8ce

joecummings approved these changes Sep 19, 2025

Ritesh1905 merged commit ce155a3 into main Sep 19, 2025
5 checks passed

Ritesh1905 deleted the rithesh/grpo branch October 7, 2025 17:30

2 participants
