Conversation

krammnic (Contributor) commented Oct 7, 2025

Resolves #298 (Log KL Divergence in GRPO Loss function)

meta-cla bot added the CLA Signed label on Oct 7, 2025
def forward_backward(
    self, inputs: dict[str, Tensor], targets: dict[str, Tensor]
-) -> Tensor:
+) -> Tensor | LossMetrics:
Contributor

Shouldn't this be tuple[..,..]?

Contributor Author

It should!
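
For reference, a minimal sketch of the corrected annotation (assuming the intent is to return the loss tensor together with the metrics; Tensor and LossMetrics are the types used elsewhere in this diff):

def forward_backward(
    self, inputs: dict[str, Tensor], targets: dict[str, Tensor]
) -> tuple[Tensor, LossMetrics]:
    ...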

krammnic (Contributor Author) commented Oct 7, 2025

@casteryh let's merge

self.beta = beta

def forward(self, logprobs, ref_logprobs, advantages, padding_mask):
    kl = torch.exp(ref_logprobs - logprobs) - (ref_logprobs - logprobs) - 1
Member

Can we log the KL divergence excluding padding tokens? We may have to move that op up in the loss function.

Contributor Author

Yep, good idea.

Contributor Author

Addressed.
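
As a rough sketch of the masking discussed above (assuming padding_mask is 1 for real tokens and 0 for padding; the helper name is illustrative, not the exact PR code):

import torch

def masked_kl_mean(logprobs, ref_logprobs, padding_mask):
    # Per-token k3 estimator of KL(pi || pi_ref), matching the
    # expression in the diff above.
    log_ratio = ref_logprobs - logprobs
    kl = torch.exp(log_ratio) - log_ratio - 1
    # Zero out padding positions before averaging so padding
    # tokens do not dilute the logged KL value.
    masked_kl = kl * padding_mask
    return masked_kl.sum() / padding_mask.sum().clamp(min=1)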

import torch
from torch import nn

from forge.data_models.loss_metrics import LossMetrics
Member

I'm not sure this is a fully fleshed-out data model we want to use.

For now, could we just define a loose type in this file and shove the metrics in that?

Contributor Author

I've done it with the data model because we might want to log other things from different losses in the future (margins from the DPO loss, for instance).
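
To illustrate the shape of that data model, a hedged sketch (the actual fields of forge.data_models.loss_metrics may differ):

from dataclasses import dataclass, field

import torch

@dataclass
class LossMetrics:
    # Scalar tensors logged alongside the loss; more fields can be
    # added later for other losses (e.g. DPO margins).
    kl_divergence: torch.Tensor | None = None
    extras: dict[str, torch.Tensor] = field(default_factory=dict)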
