feat: ProRLv2 - add seq-mask-tis truncated importance sampling type by hijkzzz · Pull Request #1899 · NVIDIA-NeMo/RL

hijkzzz · 2026-02-09T06:04:02Z

Add a new IS filtering mechanism "seq-mask-tis" that masks entire sequences based on the geometric mean of per-token IS ratios, while keeping non-truncated token-level IS weights for gradient correction. Also adds shared is_filter_drop_frac metric for both icepop and seq-mask-tis modes, and documents the new option in prorlv2.md.
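
The mechanism described above can be sketched in plain Python. This is an illustrative reading of the description, not the code in `nemo_rl/algorithms/loss_functions.py`: the function name `seq_mask_tis`, the list-of-lists layout, and the returned drop fraction are assumptions, and the repository operates on tensors rather than lists.

```python
import math


def seq_mask_tis(prev_logprobs, gen_logprobs, token_mask, ratio_min, ratio_max):
    """Hypothetical sketch: zero out whole sequences by geometric-mean IS ratio.

    Each of the first three arguments is a list of per-sequence lists.
    Returns (per-token IS weights, fraction of sequences dropped).
    """
    weights, dropped = [], 0
    for prev, gen, mask in zip(prev_logprobs, gen_logprobs, token_mask):
        log_ratios = [p - g for p, g in zip(prev, gen)]
        n_valid = sum(mask)
        # Geometric mean of the ratios = exp(mean of log-ratios over unmasked tokens).
        mean_log = sum(lr * m for lr, m in zip(log_ratios, mask)) / max(n_valid, 1)
        keep = ratio_min <= math.exp(mean_log) <= ratio_max
        dropped += 0 if keep else 1
        # Kept sequences retain raw (non-truncated) token-level ratios.
        weights.append(
            [math.exp(lr) * m if keep else 0.0 for lr, m in zip(log_ratios, mask)]
        )
    return weights, dropped / len(weights)


prev = [[-1.0, -2.0], [-1.0, -1.0]]
gen = [[-1.0, -2.0], [-2.0, -2.0]]
mask = [[1, 1], [1, 1]]
weights, drop_frac = seq_mask_tis(prev, gen, mask, ratio_min=0.999, ratio_max=1.002)
```

In this toy input the first sequence matches the generation log-probabilities exactly and is kept, while the second has a geometric-mean ratio of about 2.72 and is masked in full.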

Summary by CodeRabbit

New Features
- Introduced seq-mask-tis, an alternative to ICE-POP that filters importance-sampling ratios per sequence rather than per token.
- Added the is_filter_drop_frac metric for monitoring filtering behavior.

Documentation
- Added a guide to seq-mask-tis covering its rationale, configuration, and a comparison with ICE-POP.
- Updated terminology and expanded examples with new feature references.

Add a new IS filtering mechanism "seq-mask-tis" that masks entire sequences based on the geometric mean of per-token IS ratios, while keeping non-truncated token-level IS weights for gradient correction. Also adds shared `is_filter_drop_frac` metric for both icepop and seq-mask-tis modes, and documents the new option in prorlv2.md.

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-02-09T06:08:37Z

📝 Walkthrough

The PR introduces seq-mask-tis, a sequence-level importance sampling approach as an alternative to ICE-POP. Documentation is updated with terminology clarification and detailed comparison between methods. Implementation adds seq-mask-tis support to the loss function with validation rules and metric tracking.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates `docs/guides/prorlv2.md` | Updated terminology (CE-POP to ICE-POP), introduced seq-mask-tis feature with detailed comparison table, configuration examples, and rationale documentation. |
| Loss Function Implementation `nemo_rl/algorithms/loss_functions.py` | Extended truncated importance sampling to support three types (tis, icepop, seq-mask-tis). Added sequence-level masking logic using geometric mean computation, validation checks preventing seq-mask-tis with per-sequence IS, and filter-drop metric tracking for icepop and seq-mask-tis paths. |
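
The validation rules summarized in the table could look roughly like the sketch below. Everything here is hypothetical: the function `validate_tis_config`, its parameter names, and the use of `ValueError` are assumptions made for illustration (the review comments in this PR indicate the real code uses assertions in the loss function's initializer). The check that a lower bound is set for "icepop" and "seq-mask-tis" is a guard requested in the review, included here as an assumption about the final behavior.

```python
ALLOWED_TYPES = ("tis", "icepop", "seq-mask-tis")


def validate_tis_config(sampling_type, ratio_min=None, sequence_level_is=False):
    """Hypothetical validator mirroring the rules described in this PR."""
    if sampling_type not in ALLOWED_TYPES:
        raise ValueError(
            f"truncated_importance_sampling_type must be one of {ALLOWED_TYPES}, "
            f"got {sampling_type!r}"
        )
    # Both filtering modes compare ratios against a lower bound, so it must be set.
    if sampling_type in ("icepop", "seq-mask-tis") and ratio_min is None:
        raise ValueError(
            "truncated_importance_sampling_ratio_min is required when "
            "truncated_importance_sampling_type is 'icepop' or 'seq-mask-tis'"
        )
    # seq-mask-tis keeps token-level weights, so it is rejected with per-sequence IS.
    if sampling_type == "seq-mask-tis" and sequence_level_is:
        raise ValueError("seq-mask-tis cannot be combined with per-sequence IS")


def is_valid(**kwargs):
    """Small helper for trying configurations without a try/except at the call site."""
    try:
        validate_tis_config(**kwargs)
        return True
    except ValueError:
        return False
```

Failing at construction time gives a readable message instead of a later `TypeError` from comparing a tensor with `None`.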

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: support truncated importance sampling #1348: Both PRs modify nemo_rl/algorithms/loss_functions.py to add and validate truncated importance-sampling behavior with token/sequence-level clipping logic.

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

terrykong

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Results For Major Changes | ⚠️ Warning | PR description lacks test results documentation despite major feature affecting numerics; critical review comments highlight unresolved validation and metric aggregation issues. | Add test results to PR description showing seq-mask-tis produces correct loss values and metrics without convergence regressions; address unresolved review comments on validation and metric aggregation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a new seq-mask-tis truncated importance sampling type to ProRLv2, which aligns with the primary objective of the PR. |

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@nemo_rl/algorithms/loss_functions.py`:
- Around line 171-183: In the __init__ (or initializer) validation block where
truncated_importance_sampling_type is checked (the assert on
self.truncated_importance_sampling_type and the subsequent check for
"seq-mask-tis"), add a guard that when self.truncated_importance_sampling_type
is "icepop" or "seq-mask-tis", then self.truncated_importance_sampling_ratio_min
is not None; raise an assertion with a clear message referencing
truncated_importance_sampling_ratio_min and the allowed sampling types to
prevent later TypeError when comparing tensors to None in methods that use this
attribute.
- Line 579: The metric is_filter_drop_frac is a fraction and must not be summed
across packed sequences; instead implement the same special aggregation used for
min/max metrics in SequencePackingLossWrapper: when encountering key
"is_filter_drop_frac" in metrics_accum, accumulate a weighted sum (e.g.,
metrics_accum_sum["is_filter_drop_frac"] += val * weight) and a corresponding
total weight (metrics_accum_weight["is_filter_drop_frac"] += weight) where
weight is the number of examples/sequence length for that packed segment, then
compute the final fraction as metrics_accum_sum / metrics_accum_weight before
reporting; update the code paths that currently do metrics_accum[k] += val to
detect "is_filter_drop_frac" and use this weighted accumulation and final
division (follow the same pattern used for the existing min/max handling in
SequencePackingLossWrapper and use the same helper variables/keys to keep
aggregation consistent).

🧹 Nitpick comments (3)

docs/guides/prorlv2.md (2)

158-184: Documentation for seq-mask-tis looks good overall.

The new section clearly explains the mechanism, provides a comparison table with ICE-POP, and includes configuration snippets.

One observation: the reference bounds in the table (min=0.002, max=0.003) represent a very narrow band for the geometric-mean IS ratio. Consider adding a brief note explaining why these bounds are so far from 1.0 (and so tight), or pointing users to the referenced blog for tuning guidance. Users unfamiliar with the method may assume bounds closer to 1.0 are expected.
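
A quick numeric check makes the reviewer's point concrete. The log-ratio values below are made up for illustration and are not measurements from any run; the helper `geometric_mean_ratio` is likewise hypothetical.

```python
import math


def geometric_mean_ratio(log_ratios):
    """Geometric mean of IS ratios, computed as exp of the mean log-ratio."""
    return math.exp(sum(log_ratios) / len(log_ratios))


# Illustrative per-token log-ratios when the two policies nearly agree.
near_on_policy = [0.001, -0.002, 0.0005, 0.0]
g = geometric_mean_ratio(near_on_policy)

in_documented_band = 0.002 <= g <= 0.003  # bounds as written in the reviewed docs
in_band_near_one = 0.999 <= g <= 1.002    # bounds after the later docs fix in this PR
```

With small per-token mismatches the geometric mean sits just below 1.0, so a band of [0.002, 0.003] would reject this sequence, while a band around 1.0 keeps it.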

182-182: Clarify metric semantics difference.

Line 182 notes that is_filter_drop_frac represents "fraction of tokens (ICE-POP) or sequences (seq-mask-tis)" filtered out. Since the same metric name measures different granularities depending on the mode, this could cause confusion when comparing runs. Consider noting this caveat more prominently or suggesting users check which mode is active when interpreting this metric.

nemo_rl/algorithms/loss_functions.py (1)

418-424: Replace EN DASH (–) with HYPHEN-MINUS (-) in comments.

Ruff (RUF003) flags ambiguous Unicode EN DASH characters in these comment lines. While visually similar, they can cause issues with some tools and are not idiomatic in source code.

Proposed fix
-        # "tis"          – clamp IS weights to [0, max]
-        # "icepop"       – zero out tokens whose IS weight ∉ [min, max]   (ref bounds: 0.5–5)
-        # "seq-mask-tis" – zero out entire sequences whose geometric-mean
-        #                  IS ratio ∉ [min, max]; retained sequences keep
-        #                  raw (non-truncated) token-level IS weights      (ref bounds: 0.002–0.003)
+        # "tis"          - clamp IS weights to [0, max]
+        # "icepop"       - zero out tokens whose IS weight not in [min, max]   (ref bounds: 0.5-5)
+        # "seq-mask-tis" - zero out entire sequences whose geometric-mean
+        #                  IS ratio not in [min, max]; retained sequences keep
+        #                  raw (non-truncated) token-level IS weights      (ref bounds: 0.002-0.003)

The same applies to the comment block at lines 48-51:

-    #   "tis"          – clamp IS weights to max
-    #   "icepop"       – zero out tokens with IS weight outside [min, max]
-    #   "seq-mask-tis" – zero out sequences by geometric-mean IS ratio, non-truncated token IS correction
+    #   "tis"          - clamp IS weights to max
+    #   "icepop"       - zero out tokens with IS weight outside [min, max]
+    #   "seq-mask-tis" - zero out sequences by geometric-mean IS ratio, non-truncated token IS correction
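
The two token-level modes named in these comments can be contrasted with a short sketch. The function names and list-based inputs are assumptions for illustration; the repository applies the same ideas to tensors.

```python
def tis_weights(ratios, ratio_max):
    """'tis': clamp each IS weight to [0, ratio_max] (ratios are non-negative)."""
    return [min(r, ratio_max) for r in ratios]


def icepop_weights(ratios, ratio_min, ratio_max):
    """'icepop': zero out tokens whose IS weight falls outside [ratio_min, ratio_max]."""
    return [r if ratio_min <= r <= ratio_max else 0.0 for r in ratios]


ratios = [0.2, 1.0, 7.0]
clamped = tis_weights(ratios, ratio_max=5.0)
filtered = icepop_weights(ratios, ratio_min=0.5, ratio_max=5.0)
```

Clamping keeps every token in the gradient with a bounded weight, whereas ICE-POP removes out-of-range tokens entirely; seq-mask-tis applies the same remove-or-keep decision once per sequence.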

hijkzzz · 2026-02-10T02:00:29Z

@terrykong @yfw the l1/l0 tests passed.

terrykong

Thanks for contributing, @hijkzzz. To help others who come across this PR, are there any experimental results you can share showing how this helps stability in your experiments?

In the Demystifying blog they show "Token-Level MIS < Sequence-Level MIS" for stability. Any reason why you implemented the token-level one instead of the sequence-level one first?

Also, I think it would be good to have unit tests for all these importance sampling techniques for correctness.

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: hijkzzz <janhu9527@gmail.com>

- Apply nan_to_num to `prev_logprobs - generation_logprobs` before masked_mean in seq-mask-tis, preventing inf/NaN from corrupting the geometric-mean IS ratio computation.
- Rename icepop metric key to is_oob_ratio for consistency with seq-mask-tis.
- Fix seq-mask-tis reference bounds in docs (0.999–1.002, not 0.002–0.003) and correct swapped yaml config values.
- Add unit tests for icepop and seq-mask-tis code paths in ClippedPGLossFn, including nan_to_num coverage.

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
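
Why sanitizing before the masked mean matters can be shown with a minimal sketch. It assumes non-finite differences are replaced with 0.0; the replacement values the PR actually passes to nan_to_num are not shown on this page, and the helper `masked_mean_log_ratio` is hypothetical.

```python
import math


def masked_mean_log_ratio(prev, gen, mask, sanitize=True):
    """Mean of (prev - gen) over unmasked tokens, optionally replacing NaN/inf with 0.0."""
    total, count = 0.0, 0
    for p, g, m in zip(prev, gen, mask):
        diff = p - g
        if sanitize and not math.isfinite(diff):
            diff = 0.0  # assumed replacement value
        total += diff * m
        count += m
    return total / max(count, 1)


neg_inf = float("-inf")
prev = [-1.0, neg_inf, -0.5]
gen = [-1.0, neg_inf, -0.5]
mask = [1, 0, 1]  # the -inf position is masked out

raw = masked_mean_log_ratio(prev, gen, mask, sanitize=False)
clean = masked_mean_log_ratio(prev, gen, mask, sanitize=True)
```

The masked position still poisons the unsanitized result because `-inf - (-inf)` is NaN and `NaN * 0` is NaN, so multiplying by the mask alone does not protect the mean.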

hijkzzz · 2026-02-12T02:16:37Z

> Thanks for contributing, @hijkzzz. To help others who come across this PR, are there any experimental results you can share showing how this helps stability in your experiments?
>
> In the Demystifying blog they show "Token-Level MIS < Sequence-Level MIS" for stability. Any reason why you implemented the token-level one instead of the sequence-level one first?
>
> Also, I think it would be good to have unit tests for all these importance sampling techniques for correctness.

We found seq-based filtering to be more stable for MoE models.

hijkzzz · 2026-02-12T10:46:47Z

@terrykong All tests passed; please merge it.

hijkzzz · 2026-02-14T01:59:49Z

@terrykong Fixed; please merge it.

Calculate the out-of-bounds ratio for the "tis" type so users can monitor how often IS weights exceed the truncation threshold, consistent with the existing metrics for "icepop" and "seq-mask-tis".

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
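
One plausible reading of this metric for the "tis" type is the fraction of unmasked tokens whose IS weight exceeds the truncation threshold. The sketch below follows that reading; the helper name `tis_oob_ratio` and the list inputs are assumptions, and the exact definition in the merged code may differ.

```python
def tis_oob_ratio(ratios, mask, ratio_max):
    """Fraction of unmasked tokens whose IS weight is above ratio_max."""
    valid = sum(mask)
    exceeded = sum(m for r, m in zip(ratios, mask) if r > ratio_max)
    return exceeded / valid if valid else 0.0


ratios = [0.5, 6.0, 1.2, 9.0]
mask = [1, 1, 1, 0]  # the last token is padding and is ignored
oob = tis_oob_ratio(ratios, mask, ratio_max=5.0)
```

Only one of the three unmasked tokens exceeds the threshold here, so the metric is one third; the masked token with weight 9.0 does not count.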

terrykong

lgtm. @hijkzzz thanks for making nemo-rl better!

…VIDIA-NeMo#1899)

Signed-off-by: jianh <jianh@nvidia.com>
Signed-off-by: hijkzzz <janhu9527@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

hijkzzz requested review from a team as code owners February 9, 2026 06:04

github-actions bot added the documentation Improvements or additions to documentation label Feb 9, 2026

hijkzzz requested review from terrykong and yfw February 9, 2026 06:04

hijkzzz added the CI:L1 Run doctests, unit tests, and functional tests label Feb 9, 2026

coderabbitai bot reviewed Feb 9, 2026

hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 9, 2026

Merge branch 'main' into jianh/prorlv2-mis

6105da6

hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 9, 2026

hijkzzz self-assigned this Feb 10, 2026

terrykong reviewed Feb 11, 2026

hijkzzz and others added 3 commits February 12, 2026 09:40

Update docs/guides/prorlv2.md

5763a6c

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: hijkzzz <janhu9527@gmail.com>

Update nemo_rl/algorithms/loss_functions.py

b00a392

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: hijkzzz <janhu9527@gmail.com>

hijkzzz requested a review from a team as a code owner February 12, 2026 02:15

Merge branch 'main' into jianh/prorlv2-mis

9a0cb51

hijkzzz removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 12, 2026

hijkzzz added the CI:L1 Run doctests, unit tests, and functional tests label Feb 12, 2026

terrykong reviewed Feb 13, 2026

Merge branch 'main' into jianh/prorlv2-mis

c0109f7

hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 14, 2026

hijkzzz had a problem deploying to nemo-ci February 14, 2026 01:59 — with GitHub Actions Error

hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 14, 2026

terrykong approved these changes Feb 14, 2026

hijkzzz removed the request for review from yfw February 14, 2026 16:29

terrykong merged commit 2841fef into NVIDIA-NeMo:main Feb 16, 2026
42 of 43 checks passed
