
The plan #2518

Draft
dlwh wants to merge 12 commits into main from the_plan

@dlwh (Member) commented Jan 28, 2026

Include a summary of the changes and the related issue, if any.

A good description is a paragraph or so describing the changes you made and the
motivation. You may follow this with a few bullets for specific changes, but
try to keep it concise.

e.g.

Title: [RL] Fix loss: use global token normalization instead of per-example

"""
This fixes a regression in the DAPO loss computation by switching
from per-example normalization (dividing each response's summed token
loss by its own length n_i) back to global token normalization
(dividing by N, the total number of tokens in the batch). Per-example
normalization gives each token of a shorter response disproportionately
more gradient weight, which hurts math reasoning tasks where correct
answers often require detailed, longer derivations. Global normalization
weights every token equally, regardless of the length of the response it
belongs to.
"""

Fixes #

@github-actions (Contributor)

This pull request has been inactive for 23 days and is marked as stale.
If there is no further activity within 7 days, it will be automatically closed.
If you believe this PR should remain open, please add a comment or update the PR.

github-actions bot added the stale label Feb 21, 2026