Dpo#2462

Draft
ahmeda14960 wants to merge 10 commits into main from dpo

@ahmeda14960
Contributor

Include a summary of the changes and the related issue if any.

A good description is a paragraph or so describing the changes you made and the
motivation. You may follow this with a few bullets for specific changes, but
try to keep it concise.

e.g.

Title: [RL] Fix loss: use global token normalization instead of per-example

"""
This fixes a regression in the DAPO loss computation by switching
from per-example normalization (/ n_i) back to global token
normalization (/ N). Per-example normalization gives shorter responses
disproportionately more gradient weight, which hurts math reasoning
tasks where correct answers often require detailed, longer derivations.
Global normalization weights every token equally, regardless of which
response it belongs to.
"""

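To make the weighting difference described in the example above concrete, here is a minimal sketch in plain Python. The function names and the toy per-token losses are illustrative assumptions, not code from this PR or from the repository's loss implementation.

```python
def per_example_loss(batch):
    """Average each response's token losses (/ n_i), then average over responses."""
    per_response = [sum(losses) / len(losses) for losses in batch]
    return sum(per_response) / len(per_response)


def global_token_loss(batch):
    """Sum every token loss in the batch and divide by the total token count (/ N)."""
    total_tokens = sum(len(losses) for losses in batch)
    return sum(sum(losses) for losses in batch) / total_tokens


def token_weights(batch, mode):
    """Return the weight a single token's loss receives, one entry per response."""
    lengths = [len(losses) for losses in batch]
    if mode == "per_example":
        # Each response shares 1 / num_responses evenly among its own tokens.
        return [1.0 / (len(lengths) * n) for n in lengths]
    if mode == "global":
        # Every token in the batch gets the same weight.
        return [1.0 / sum(lengths)] * len(lengths)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical batch: a 2-token response and an 8-token response.
batch = [[1.0, 1.0], [2.0] * 8]
print(per_example_loss(batch), global_token_loss(batch))
print(token_weights(batch, "per_example"), token_weights(batch, "global"))
```

In this toy batch, a token in the 2-token response gets four times the weight of a token in the 8-token response under per-example normalization, while under global normalization every token gets the same weight.
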
Fixes #

dlwh added a commit that referenced this pull request Jan 28, 2026
together with #2463 should avoid a lot of the noisy changes in
#2460/#2462
dlwh added a commit that referenced this pull request Jan 28, 2026
should obviate some of the noise in the #2462 PR
@github-actions
Contributor

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

github-actions bot added the stale label Feb 17, 2026