refactor: refactor loss function by yuki-97 · Pull Request #1920 · NVIDIA-NeMo/RL

yuki-97 · 2026-02-10T09:28:38Z

Move the parallelism-related logic out of the loss functions.
Add LossInputType (logit, logprob, distillation) and prepare_loss_input, which converts logits into the input each loss function expects and handles the parallelism logic.
Update the loss file structure:

├── loss
│   ├── __init__.py
│   ├── interfaces.py
│   ├── loss_functions.py
│   ├── utils.py
│   └── wrapper.py
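A minimal sketch of the idea behind LossInputType and prepare_loss_input, under stated assumptions: the three enum values mirror the input kinds named above, but the function body, its argument names, and the choice to return full-vocabulary log-probabilities for distillation are hypothetical, and plain Python lists stand in for tensors. The real helper is described as also handling the parallelism logic (presumably things like sharded vocabularies), which this sketch ignores entirely.

```python
import math
from enum import Enum


class LossInputType(Enum):
    """Hypothetical mirror of the three input kinds named in the PR description."""
    LOGIT = "logit"
    LOGPROB = "logprob"
    DISTILLATION = "distillation"


def log_softmax(row):
    """Numerically stable log-softmax over one row of vocabulary logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def prepare_loss_input(logits, token_ids, input_type):
    """Hypothetical helper: turn per-position logits into what a loss consumes.

    logits:    per-position vocabulary logits, shape [seq_len][vocab]
    token_ids: target token id per position, shape [seq_len]
    """
    if input_type is LossInputType.LOGIT:
        # The loss consumes raw logits unchanged.
        return logits
    if input_type is LossInputType.LOGPROB:
        # Gather the log-probability of each target token.
        return [log_softmax(row)[tok] for row, tok in zip(logits, token_ids)]
    if input_type is LossInputType.DISTILLATION:
        # Assumption: distillation wants the full per-position distribution.
        return [log_softmax(row) for row in logits]
    raise ValueError(f"unsupported input type: {input_type}")
```

The intended benefit, as the description frames it, is that a loss only declares which input type it needs, so the conversion (and any parallelism handling) lives in one place instead of being repeated inside every loss function.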

Test Result
https://wandb.ai/nvidia/refactor-loss-yukih?nw=0k8r2x613fml

[Figure: W&B result panels for GRPO, SFT, Distillation, DPO, and RM]

All nightly tests passed except those that already fail on main (see #2041).

Summary by CodeRabbit

Documentation
- Updated module import paths in guides and documentation to reflect reorganized loss function architecture.
Refactor
- Restructured loss function modules into a new hierarchical package with improved interfaces and abstractions.
- Introduced a LossInputType enumeration for standardized loss function input specifications.
- Updated loss function signatures to use pre-computed log probabilities instead of raw logits.
- Reorganized loss utilities, including input preparation and sequence packing wrappers.
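The summary mentions loss signatures that take pre-computed log probabilities and a sequence-packing wrapper. The sketch below illustrates that split under our own assumptions: `nll_loss`, `packed_loss`, and the `cu_seqlens` boundary convention are hypothetical names, not the API in wrapper.py, and plain Python lists stand in for tensors.

```python
from typing import Callable, Sequence

# A loss that consumes pre-computed per-token log-probs plus a 0/1 token mask.
LossFn = Callable[[Sequence[float], Sequence[int]], float]


def nll_loss(token_logprobs: Sequence[float], mask: Sequence[int]) -> float:
    """Masked mean negative log-likelihood over pre-computed token log-probs."""
    total = sum(-lp * m for lp, m in zip(token_logprobs, mask))
    count = sum(mask)
    return total / count if count else 0.0


def packed_loss(
    loss_fn: LossFn,
    packed_logprobs: Sequence[float],
    packed_mask: Sequence[int],
    cu_seqlens: Sequence[int],
) -> float:
    """Hypothetical sequence-packing wrapper around a per-sequence loss.

    cu_seqlens holds cumulative boundaries: [0, 3, 5] means positions 0..2
    belong to the first sequence and positions 3..4 to the second.
    """
    if len(cu_seqlens) < 2:
        raise ValueError("cu_seqlens must contain at least one sequence")
    losses = [
        loss_fn(packed_logprobs[start:end], packed_mask[start:end])
        for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])
    ]
    # Simple per-sequence average; a real wrapper may normalize differently.
    return sum(losses) / len(losses)
```

For example, `packed_loss(nll_loss, [-1.0, -2.0, -3.0, -4.0, -6.0], [1, 1, 1, 1, 1], [0, 3, 5])` averages the two per-sequence losses 2.0 and 5.0 to 3.5. The design point is that `nll_loss` never needs to know whether its input was packed.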

github-actions · 2026-02-10T09:29:13Z

⚠️ File Consistency Check

Check based on commit: 696f9ad (PR #1920 from yukih/refactor-loss)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

- Please review whether the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py.
- Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency.
- If the files are intentionally different, please add a comment in the PR explaining why.

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
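The bot's rule can be paraphrased as a small function. This is a hypothetical re-creation, not the actual GitHub Action: the thread only shows the "v2 modified, v1 not" warning and the "both modified" notice, so treating the rule as symmetric and posting nothing when neither file changes are assumptions.

```python
from typing import Iterable, Optional

V1 = "nemo_rl/models/policy/workers/dtensor_policy_worker.py"
V2 = "nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py"


def check_dtensor_sync(modified_files: Iterable[str]) -> Optional[str]:
    """Hypothetical paraphrase of the File Consistency Check rule."""
    changed = set(modified_files)
    v1_changed = V1 in changed
    v2_changed = V2 in changed
    if v1_changed and v2_changed:
        # Both files touched: post the informational "keep them consistent" note.
        return "info"
    if v1_changed or v2_changed:
        # Only one side touched (assumed symmetric): post the warning.
        return "warning"
    # Neither file touched: assume no comment is posted.
    return None
```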

terrykong

@hemildesai can you review the automodel changes?
@yaoyu-33 @cuichenx to comment on the change from the Megatron side (although this PR hasn't implemented that part yet)
@zpqiu to comment on the distillation changes

nemo_rl/distributed/model_utils.py

nemo_rl/models/automodel/train.py

nemo_rl/algorithms/loss_functions.py

github-actions · 2026-02-26T10:29:01Z

ℹ️ File Consistency Check

Check based on commit: 90693e1 (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

github-actions · 2026-02-26T15:59:09Z

ℹ️ File Consistency Check

Check based on commit: bdc4277 (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

github-actions · 2026-02-27T04:32:21Z

ℹ️ File Consistency Check

Check based on commit: a81e0cc (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

github-actions · 2026-02-27T09:41:05Z

ℹ️ File Consistency Check

Check based on commit: d641ee9 (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

github-actions · 2026-02-27T14:35:21Z

ℹ️ File Consistency Check

Check based on commit: 1b752f6 (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

terrykong

Generally LGTM. I think this is a great change; it makes our loss abstractions easier to write and grok. Thanks @yuki-97!

Since this change touches every algorithm, can you run the nightlies?

nemo_rl/algorithms/loss/wrapper.py

nemo_rl/algorithms/loss/interfaces.py

github-actions · 2026-03-02T02:49:12Z

ℹ️ File Consistency Check

Check based on commit: e363ebc (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Signed-off-by: Yuki Huang <yukih@nvidia.com>

github-actions · 2026-03-02T02:49:56Z

ℹ️ File Consistency Check

Check based on commit: 443d7ad (PR #1920 from yukih/refactor-loss)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

yuki-97 · 2026-03-02T03:46:24Z

> Since this change touches every algorithm, can you run the nightlies?

Yes, I've run the nightly tests. All tests passed except those that already fail on main, so this PR should be fine.

I filed an issue with the error logs: #2041. Maybe assign someone to fix it? @terrykong

yuki-97 force-pushed the yukih/refactor-loss branch from 696f9ad to 54e1283 February 10, 2026 12:39

terrykong reviewed Feb 12, 2026

terrykong requested review from cuichenx, hemildesai, yaoyu-33 and zpqiu February 12, 2026 08:07

yuki-97 force-pushed the yukih/refactor-loss branch from 54e1283 to f203c94 February 26, 2026 05:35

github-actions bot added the documentation label (Improvements or additions to documentation) Feb 26, 2026

yuki-97 added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 26, 2026

yuki-97 temporarily deployed to nemo-ci February 26, 2026 16:05 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 26, 2026 18:22 with GitHub Actions (inactive)

yuki-97 added and removed the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 27, 2026

yuki-97 temporarily deployed to nemo-ci February 27, 2026 04:33 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 27, 2026 07:03 with GitHub Actions (inactive)

yuki-97 added and removed the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 27, 2026

yuki-97 temporarily deployed to nemo-ci February 27, 2026 09:41 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 27, 2026 09:45 with GitHub Actions (inactive)

yuki-97 mentioned this pull request Feb 27, 2026 in Cleanup parallel model utils #2033 (open)

yuki-97 marked this pull request as ready for review February 27, 2026 15:22

yuki-97 requested review from a team as code owners February 27, 2026 15:22

yuki-97 temporarily deployed to nemo-ci February 28, 2026 05:17 with GitHub Actions (inactive)

terrykong reviewed Feb 28, 2026

nemo_rl/algorithms/loss/wrapper.py (resolved)

nemo_rl/algorithms/loss/interfaces.py (outdated, resolved)

yuki-97 added 15 commits March 2, 2026 10:49

update ClippedPGLossFn, NLLLoss, DPOLossFn

df8c208

Signed-off-by: Yuki Huang <yukih@nvidia.com>

fix seq packing

30b91f6

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update unit test for sft/rl/dpo and add value check for distillation

c7de9e1

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update PreferenceLoss and DistillationLossFn

cfe8e50

Signed-off-by: Yuki Huang <yukih@nvidia.com>

add LossInputType

621bc09

Signed-off-by: Yuki Huang <yukih@nvidia.com>

args -> kwargs

cde9c3e

Signed-off-by: Yuki Huang <yukih@nvidia.com>

typo

2fb40ed

Signed-off-by: Yuki Huang <yukih@nvidia.com>

refactor file path

11f573b

Signed-off-by: Yuki Huang <yukih@nvidia.com>

fix test_loss_functions

68af7fe

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update megatron

c450c51

Signed-off-by: Yuki Huang <yukih@nvidia.com>

update dtensor v1

0e48e74

Signed-off-by: Yuki Huang <yukih@nvidia.com>

fix test

000231e

Signed-off-by: Yuki Huang <yukih@nvidia.com>

fix PreferenceLossFn and unit test

aa9b1f4

Signed-off-by: Yuki Huang <yukih@nvidia.com>

fix unit test

0eddfa0

Signed-off-by: Yuki Huang <yukih@nvidia.com>

address comments

443d7ad

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 force-pushed the yukih/refactor-loss branch from e363ebc to 443d7ad March 2, 2026 02:49

yuki-97 added and removed the CI:L1 label (Run doctests, unit tests, and functional tests) Mar 2, 2026

yuki-97 temporarily deployed to nemo-ci March 2, 2026 02:50 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci March 2, 2026 05:10 with GitHub Actions (inactive)

terrykong approved these changes Mar 2, 2026

yuki-97 temporarily deployed to nemo-ci March 2, 2026 07:27 with GitHub Actions (inactive)

yuki-97 enabled auto-merge (squash) March 2, 2026 07:43

yuki-97 merged commit dc9dce4 into main Mar 2, 2026
59 of 64 checks passed

yuki-97 deleted the yukih/refactor-loss branch March 2, 2026 10:51
