
Conversation

@Ritesh1905 (Contributor) commented Sep 17, 2025

Adds basic unit tests for the GRPO loss.

(forge) [[email protected] /data/users/rithesh/forge/tests (rithesh/grpo_tests)]$ pytest unit_tests/losses/test_grpo_loss.py -v
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.10.18, pytest-7.3.2, pluggy-1.6.0 -- /home/rithesh/.conda/envs/forge/bin/python3.10
cachedir: .pytest_cache
rootdir: /data/users/rithesh/forge
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.10.0
collected 12 items                                                                                                                                                         

unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_forward_basic PASSED                                                                                   [  8%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_output_shape PASSED                                                                                    [ 16%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_gradient_flow PASSED                                                                                   [ 25%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_no_gradient_to_ref_logprobs PASSED                                                                     [ 33%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_padding_mask_effect PASSED                                                                             [ 41%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_beta_parameter_effect PASSED                                                                           [ 50%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_zero_advantages PASSED                                                                                 [ 58%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_identical_policies PASSED                                                                              [ 66%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_extreme_values PASSED                                                                                  [ 75%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_numerical_stability PASSED                                                                             [ 83%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_all_masked_sequence PASSED                                                                             [ 91%]
unit_tests/losses/test_grpo_loss.py::TestSimpleGRPOLoss::test_mathematical_correctness PASSED                                                                        [100%]
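
As a rough illustration of what a case like test_identical_policies checks (a sketch; the actual test file may differ): when the policy and reference produce identical logprobs, the KL term in the forward snippet below is exactly zero.

import torch

def test_identical_policies_sketch():
    # With identical policies, r = ref_logprobs - logprobs = 0,
    # so the estimator exp(r) - r - 1 vanishes token-wise.
    logprobs = torch.randn(2, 5)
    ref_logprobs = logprobs.clone()
    r = ref_logprobs - logprobs
    kl = torch.exp(r) - r - 1
    assert torch.allclose(kl, torch.zeros_like(kl))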

@meta-cla bot added the CLA Signed label Sep 17, 2025
@Ritesh1905 changed the title from "GRPO basic unit tests" to "GRPO Loss basic unit tests" Sep 17, 2025
@Ritesh1905 marked this pull request as ready for review September 17, 2025 01:52

def forward(self, logprobs, ref_logprobs, advantages, padding_mask):
    # Low-variance k3 estimator of KL(policy || ref): exp(r) - r - 1, with r = ref_logprobs - logprobs
    kl = torch.exp(ref_logprobs - logprobs) - (ref_logprobs - logprobs) - 1
    # Importance-sampling ratio; evaluates to 1 here since both terms use the current logprobs
    per_token_policy_loss = torch.exp(logprobs - logprobs.detach()) * advantages
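
For context, a hedged sketch of how this per-token loss is typically reduced in the TRL-style formulation, with a beta-weighted KL penalty and a padding-masked mean; the class name is illustrative and the exact forge reduction may differ:

import torch
import torch.nn as nn

class SimpleGRPOLossSketch(nn.Module):
    # Sketch only: `beta` weights the KL penalty against the policy term.
    def __init__(self, beta: float = 0.1):
        super().__init__()
        self.beta = beta

    def forward(self, logprobs, ref_logprobs, advantages, padding_mask):
        kl = torch.exp(ref_logprobs - logprobs) - (ref_logprobs - logprobs) - 1
        per_token_policy_loss = torch.exp(logprobs - logprobs.detach()) * advantages
        per_token_loss = -(per_token_policy_loss - self.beta * kl)
        # Mean over valid tokens per sequence, then over the batch;
        # the clamp avoids division by zero for fully masked rows.
        seq_loss = (per_token_loss * padding_mask).sum(dim=1) / padding_mask.sum(dim=1).clamp(min=1.0)
        return seq_loss.mean()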

@Ritesh1905 (Contributor, Author) commented:

@joecummings

I noticed that logprobs - logprobs.detach() always evaluates to zero, since logprobs.detach() is just logprobs with gradient tracking removed. That means torch.exp(0) is always 1, so in the forward pass this term evaluates to just advantages (though the detach keeps the gradient path through logprobs).

Is there a specific reason for writing it this way? Or is it a leftover from a more general case (like multi-step or importance sampling)? Just wanted to check in case I’m missing some context!
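
For reference, a minimal check of the value/gradient behavior in question (tensor values are illustrative):

import torch

logprobs = torch.tensor([-1.2, -0.7], requires_grad=True)
advantages = torch.tensor([0.5, -0.3])

# The forward value is exactly 1 per token...
ratio = torch.exp(logprobs - logprobs.detach())

# ...but the expression still differentiates through logprobs:
(ratio * advantages).sum().backward()
print(logprobs.grad)  # tensor([ 0.5000, -0.3000]), i.e. equal to advantages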

@joecummings (Member) commented Sep 17, 2025:

Yep, this is just a direct translation of the code from TRL for ease of correctness testing: https://github.com/huggingface/trl/blob/417915a3e4d3e3bc8d7b196594308b8eabf928be/trl/trainer/grpo_trainer.py#L1664

They keep this term in for importance sampling (swapping out the second term for old logprobs).
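
Roughly, the general form looks like the sketch below, where old_per_token_logps is a hypothetical stand-in for the logprobs recorded when the rollout was generated:

import torch

def policy_term(logprobs, old_per_token_logps, advantages):
    # PPO-style importance-sampling ratio between the current policy
    # and the policy that generated the rollout.
    ratio = torch.exp(logprobs - old_per_token_logps)
    return ratio * advantages

# With one optimization step per rollout, old_per_token_logps equals
# logprobs.detach(), so the ratio is 1 and the term reduces to advantages.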

I defer to you on whether or not to keep this expression for now :)

@allenwang28 (Contributor) left a comment:

awesome @Ritesh1905, thank you!

@allenwang28 merged commit 636e758 into main Sep 17, 2025
5 checks passed
@Ritesh1905 deleted the rithesh/grpo_tests branch October 7, 2025 17:30
