dlwh added commits that referenced this pull request on Jan 28, 2026.
should obviate some of the noise in the #2462 PR
This pull request has been inactive for 23 days and is marked as stale.
Include a summary of the changes and the related issue if any.
A good description is a paragraph or so describing the changes you made and the
motivation. You may follow this with a few bullets for specific changes, but
try to keep it concise.
e.g.
Title: [RL] Fix loss: use global token normalization instead of per-example
"""
This fixes a regression in the DAPO loss computation by switching
from per-example normalization (/ n_i) back to global token
normalization (/ N). Per-example normalization gives shorter responses
disproportionately more gradient weight, which hurts math reasoning
tasks where correct answers often require detailed, longer derivations.
Global normalization weights every token equally, so longer responses are
not down-weighted relative to shorter ones.
"""
Fixes #
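For concreteness, here is a minimal sketch of the two normalizations described in the example above, assuming per-token losses in a `[batch, seq]` array with a mask over real response tokens. The array names and functions are illustrative only, not Levanter's actual loss code.

```python
# Illustrative only: contrasts per-example vs. global token normalization
# for a masked batch of per-token losses.
import jax.numpy as jnp

def per_example_normalized_loss(token_loss, mask):
    # Divide each example by its own token count n_i, then average over the
    # batch: short responses get disproportionately large per-token weight.
    tokens_per_example = jnp.maximum(mask.sum(axis=-1), 1)
    per_example = (token_loss * mask).sum(axis=-1) / tokens_per_example
    return per_example.mean()

def global_token_normalized_loss(token_loss, mask):
    # Sum all masked tokens and divide by the global token count N: every
    # token carries the same weight regardless of which response it is in.
    return (token_loss * mask).sum() / jnp.maximum(mask.sum(), 1)
```

With the global form, an example's contribution scales with its length, which is the behavior the example description argues for on long math derivations.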