feat: ProRLv2 - add seq-mask-tis truncated importance sampling type #1899

Merged
terrykong merged 8 commits into NVIDIA-NeMo:main from hijkzzz:jianh/prorlv2-mis on Feb 16, 2026

@hijkzzz (Contributor) commented Feb 9, 2026

Add a new IS filtering mechanism "seq-mask-tis" that masks entire sequences based on the geometric mean of per-token IS ratios, while keeping non-truncated token-level IS weights for gradient correction. Also adds shared is_filter_drop_frac metric for both icepop and seq-mask-tis modes, and documents the new option in prorlv2.md.
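The mechanism described above can be sketched in plain Python. This is a minimal illustration, not the code from `loss_functions.py`: the function name `seq_mask_tis_weights` and its list-based inputs are hypothetical, and it assumes the per-token log IS ratio is `prev_logprobs - generation_logprobs`, so the geometric mean of the ratios is the exponential of the masked mean of the log ratios.

```python
import math


def seq_mask_tis_weights(prev_logprobs, generation_logprobs, token_mask,
                         ratio_min, ratio_max):
    """Hypothetical helper: token IS weights with sequence-level masking.

    Each of the first three arguments is a list of per-sequence lists.
    Returns (weights, fraction_of_sequences_dropped).
    """
    weights, dropped = [], 0
    for prev, gen, mask in zip(prev_logprobs, generation_logprobs, token_mask):
        log_ratios = [p - g for p, g in zip(prev, gen)]
        n_valid = sum(mask)
        # Geometric mean of ratios == exp(mean of log ratios over valid tokens).
        mean_log = sum(lr * m for lr, m in zip(log_ratios, mask)) / max(n_valid, 1)
        geo_mean = math.exp(mean_log)
        keep = ratio_min <= geo_mean <= ratio_max
        dropped += 0 if keep else 1
        # Kept sequences retain raw (non-truncated) token weights;
        # dropped sequences contribute zero everywhere.
        weights.append(
            [math.exp(lr) * m if keep else 0.0 for lr, m in zip(log_ratios, mask)]
        )
    return weights, dropped / max(len(weights), 1)


# Two sequences: the first is on-policy, the second drifts by a factor of e.
w, drop_frac = seq_mask_tis_weights(
    prev_logprobs=[[-1.0, -2.0], [-1.0, -1.0]],
    generation_logprobs=[[-1.0, -2.0], [-2.0, -2.0]],
    token_mask=[[1, 1], [1, 1]],
    ratio_min=0.999,
    ratio_max=1.002,
)
print(w, drop_frac)  # first sequence kept, second zeroed out
```

The design point is that the keep/drop decision is made once per sequence, while the gradient correction for kept sequences still uses unclamped per-token weights.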

Summary by CodeRabbit

  • New Features

    • Introduced seq-mask-tis, an alternative to ICE-POP for importance sampling that filters entire sequences rather than individual tokens.
    • Added is_filter_drop_frac metric for monitoring filtering behavior.
  • Documentation

    • Comprehensive guide covering seq-mask-tis with rationale, configuration, and comparison with ICE-POP.
    • Updated terminology and expanded examples with new feature references.

Add a new IS filtering mechanism "seq-mask-tis" that masks entire
sequences based on the geometric mean of per-token IS ratios, while
keeping non-truncated token-level IS weights for gradient correction.
Also adds shared `is_filter_drop_frac` metric for both icepop and
seq-mask-tis modes, and documents the new option in prorlv2.md.

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@hijkzzz hijkzzz requested review from a team as code owners February 9, 2026 06:04
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 9, 2026
@hijkzzz hijkzzz requested review from terrykong and yfw February 9, 2026 06:04
@hijkzzz hijkzzz added the CI:L1 Run doctests, unit tests, and functional tests label Feb 9, 2026
@coderabbitai bot (Contributor) commented Feb 9, 2026

📝 Walkthrough

The PR introduces seq-mask-tis, a sequence-level importance sampling approach as an alternative to ICE-POP. Documentation is updated with terminology clarification and detailed comparison between methods. Implementation adds seq-mask-tis support to the loss function with validation rules and metric tracking.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: `docs/guides/prorlv2.md` | Updated terminology (CE-POP to ICE-POP), introduced seq-mask-tis feature with detailed comparison table, configuration examples, and rationale documentation. |
| Loss Function Implementation: `nemo_rl/algorithms/loss_functions.py` | Extended truncated importance sampling to support three types (tis, icepop, seq-mask-tis). Added sequence-level masking logic using geometric mean computation, validation checks preventing seq-mask-tis with per-sequence IS, and filter-drop metric tracking for icepop and seq-mask-tis paths. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Results For Major Changes | ⚠️ Warning | PR description lacks test results documentation despite major feature affecting numerics; critical review comments highlight unresolved validation and metric aggregation issues. | Add test results to PR description showing seq-mask-tis produces correct loss values and metrics without convergence regressions; address unresolved review comments on validation and metric aggregation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a new seq-mask-tis truncated importance sampling type to ProRLv2, which aligns with the primary objective of the PR. |

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@nemo_rl/algorithms/loss_functions.py`:
- Around line 171-183: In the __init__ (or initializer) validation block where
truncated_importance_sampling_type is checked (the assert on
self.truncated_importance_sampling_type and the subsequent check for
"seq-mask-tis"), add a guard that when self.truncated_importance_sampling_type
is "icepop" or "seq-mask-tis", then self.truncated_importance_sampling_ratio_min
is not None; raise an assertion with a clear message referencing
truncated_importance_sampling_ratio_min and the allowed sampling types to
prevent later TypeError when comparing tensors to None in methods that use this
attribute.
- Line 579: The metric is_filter_drop_frac is a fraction and must not be summed
across packed sequences; instead implement the same special aggregation used for
min/max metrics in SequencePackingLossWrapper: when encountering key
"is_filter_drop_frac" in metrics_accum, accumulate a weighted sum (e.g.,
metrics_accum_sum["is_filter_drop_frac"] += val * weight) and a corresponding
total weight (metrics_accum_weight["is_filter_drop_frac"] += weight) where
weight is the number of examples/sequence length for that packed segment, then
compute the final fraction as metrics_accum_sum / metrics_accum_weight before
reporting; update the code paths that currently do metrics_accum[k] += val to
detect "is_filter_drop_frac" and use this weighted accumulation and final
division (follow the same pattern used for the existing min/max handling in
SequencePackingLossWrapper and use the same helper variables/keys to keep
aggregation consistent).
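The aggregation concern in the second item can be illustrated with a small sketch. This is not `SequencePackingLossWrapper` code; `aggregate_fraction` is a hypothetical helper, and the choice of weight (number of tokens or sequences per packed segment) is an assumption that depends on what the fraction is measured over.

```python
def aggregate_fraction(segments):
    """Combine per-segment fractions into one fraction.

    segments: iterable of (fraction, weight) pairs, one per packed segment,
    where weight is the count the fraction was computed over.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for frac, weight in segments:
        weighted_sum += frac * weight
        total_weight += weight
    # Guard against an empty batch rather than dividing by zero.
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# Segment A dropped half of 2 items; segment B dropped none of 6.
segments = [(0.5, 2), (0.0, 6)]
naive_sum = sum(frac for frac, _ in segments)  # 0.5, not a valid fraction of the batch
combined = aggregate_fraction(segments)        # 1 dropped item out of 8
print(naive_sum, combined)
```

Summing fractions overstates the drop rate whenever more than one segment is packed together, which is why the weighted form is suggested.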
🧹 Nitpick comments (3)
docs/guides/prorlv2.md (2)

158-184: Documentation for seq-mask-tis looks good overall.

The new section clearly explains the mechanism, provides a comparison table with ICE-POP, and includes configuration snippets.

One observation: the reference bounds in the table (min=0.002, max=0.003) represent a very narrow band for the geometric-mean IS ratio. Consider adding a brief note explaining why these bounds are so far from 1.0 (and so tight), or pointing users to the referenced blog for tuning guidance. Users unfamiliar with the method may assume bounds closer to 1.0 are expected.


182-182: Clarify metric semantics difference.

Line 182 notes that is_filter_drop_frac represents "fraction of tokens (ICE-POP) or sequences (seq-mask-tis)" filtered out. Since the same metric name measures different granularities depending on the mode, this could cause confusion when comparing runs. Consider noting this caveat more prominently or suggesting users check which mode is active when interpreting this metric.

nemo_rl/algorithms/loss_functions.py (1)

418-424: Replace EN DASH (–) with HYPHEN-MINUS (-) in comments.

Ruff (RUF003) flags ambiguous Unicode EN DASH characters in these comment lines. While visually similar, they can cause issues with some tools and are not idiomatic in source code.

Proposed fix
-        # "tis"          – clamp IS weights to [0, max]
-        # "icepop"       – zero out tokens whose IS weight ∉ [min, max]   (ref bounds: 0.5–5)
-        # "seq-mask-tis" – zero out entire sequences whose geometric-mean
-        #                  IS ratio ∉ [min, max]; retained sequences keep
-        #                  raw (non-truncated) token-level IS weights      (ref bounds: 0.002–0.003)
+        # "tis"          - clamp IS weights to [0, max]
+        # "icepop"       - zero out tokens whose IS weight not in [min, max]   (ref bounds: 0.5-5)
+        # "seq-mask-tis" - zero out entire sequences whose geometric-mean
+        #                  IS ratio not in [min, max]; retained sequences keep
+        #                  raw (non-truncated) token-level IS weights      (ref bounds: 0.002-0.003)

The same applies to the comment block at lines 48-51:

-    #   "tis"          – clamp IS weights to max
-    #   "icepop"       – zero out tokens with IS weight outside [min, max]
-    #   "seq-mask-tis" – zero out sequences by geometric-mean IS ratio, non-truncated token IS correction
+    #   "tis"          - clamp IS weights to max
+    #   "icepop"       - zero out tokens with IS weight outside [min, max]
+    #   "seq-mask-tis" - zero out sequences by geometric-mean IS ratio, non-truncated token IS correction
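For readers comparing the modes listed in these comments, the "icepop" behaviour (zero out individual tokens whose IS weight falls outside [min, max]) can be sketched as follows. The function `icepop_weights` is hypothetical, written in plain Python rather than with tensors, and the `None` check mirrors the validation guard suggested earlier in this review rather than any confirmed code in the repository.

```python
import math


def icepop_weights(prev_logprobs, generation_logprobs, token_mask,
                   ratio_min, ratio_max):
    """Hypothetical helper: token-level IS weights with out-of-range tokens zeroed.

    Returns (weights, fraction_of_valid_tokens_dropped).
    """
    if ratio_min is None:
        # Comparing against None would fail later, so reject it up front.
        raise ValueError("icepop requires a lower bound (ratio_min)")
    weights, dropped, total = [], 0, 0
    for prev, gen, mask in zip(prev_logprobs, generation_logprobs, token_mask):
        row = []
        for p, g, m in zip(prev, gen, mask):
            w = math.exp(p - g)  # per-token IS ratio
            in_bounds = ratio_min <= w <= ratio_max
            if m:
                total += 1
                dropped += 0 if in_bounds else 1
            row.append(w if (m and in_bounds) else 0.0)
        weights.append(row)
    return weights, dropped / max(total, 1)


# Reference bounds from the comment above: 0.5 to 5.
w, drop_frac = icepop_weights(
    prev_logprobs=[[0.0, 0.0, 0.0]],
    generation_logprobs=[[0.0, -2.0, 3.0]],
    token_mask=[[1, 1, 1]],
    ratio_min=0.5,
    ratio_max=5.0,
)
print(w, drop_frac)  # only the first token survives
```

Unlike seq-mask-tis, the decision here is made per token, so a single sequence can be partially masked.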

@hijkzzz hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 9, 2026
@hijkzzz (Contributor, Author) commented Feb 10, 2026

@terrykong @yfw The L1/L0 tests passed.

@hijkzzz hijkzzz self-assigned this Feb 10, 2026
@terrykong (Contributor) left a comment

Thanks for contributing, @hijkzzz. To help others who come across this PR, are there any experimental results you can share showing how this helps stability in your experiments?

In the Demystifying blog they show "Token-Level MIS < Sequence-Level MIS" for stability. Any reason why you implemented the token-level one instead of the sequence-level one first?

Also, I think it would be good to have unit tests covering the correctness of all these importance sampling techniques.

hijkzzz and others added 3 commits February 12, 2026 09:40
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: hijkzzz <janhu9527@gmail.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: hijkzzz <janhu9527@gmail.com>
- Apply nan_to_num to prev_logprobs - generation_logprobs before
  masked_mean in seq-mask-tis, preventing inf/NaN from corrupting
  the geometric-mean IS ratio computation.
- Rename icepop metric key to is_oob_ratio for consistency with
  seq-mask-tis.
- Fix seq-mask-tis reference bounds in docs (0.999–1.002, not
  0.002–0.003) and correct swapped yaml config values.
- Add unit tests for icepop and seq-mask-tis code paths in
  ClippedPGLossFn, including nan_to_num coverage.

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
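The first fix in the commit message above (sanitizing `prev_logprobs - generation_logprobs` before the masked mean) addresses a subtle failure: a NaN at a masked position still poisons the mean, because NaN times zero is NaN. The sketch below shows this in plain Python. The `sanitize` helper is a hypothetical stand-in for a `nan_to_num`-style call, and its finite `cap` is an assumption for illustration (`torch.nan_to_num` by default maps infinities to the largest finite values of the dtype instead).

```python
import math


def sanitize(x, cap=1e4):
    """Replace NaN with 0.0 and clamp infinite or huge values to +/-cap."""
    if math.isnan(x):
        return 0.0
    return max(-cap, min(cap, x))


def masked_mean(values, mask):
    """Mean of values over positions where mask is 1."""
    return sum(v * m for v, m in zip(values, mask)) / max(sum(mask), 1)


# The last position is padding where both log-probs are -inf.
prev = [-0.5, -1.0, float("-inf")]
gen = [-0.5, -1.0, float("-inf")]
mask = [1, 1, 0]

raw = [p - g for p, g in zip(prev, gen)]  # -inf - (-inf) is NaN
unsafe = masked_mean(raw, mask)           # NaN leaks through the mask
safe = masked_mean([sanitize(d) for d in raw], mask)

print(unsafe, safe, math.exp(safe))
```

With the sanitized differences, the geometric-mean ratio for this on-policy sequence comes out as 1.0, so it can be compared against the bounds as intended.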
@hijkzzz hijkzzz requested a review from a team as a code owner February 12, 2026 02:15
@hijkzzz hijkzzz removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 12, 2026
@hijkzzz hijkzzz added the CI:L1 Run doctests, unit tests, and functional tests label Feb 12, 2026
@hijkzzz (Contributor, Author) commented Feb 12, 2026

> thanks for contributing @hijkzzz. to help others who come across this PR, is there any experimental results you can share showing how this helps stability in your experiments?
>
> In the Demystifying blog they show "Token-Level MIS < Sequence-Level MIS" for stability. Any reason why you implemented the token level one instead of the sequence level one first?
>
> Also, i think it would be good to have unit tests for all these importance sampling techniques for correctness

We found seq-based filtering to be more stable for MoE models.

@hijkzzz (Contributor, Author) commented Feb 12, 2026

@terrykong All tests passed; please merge.

@hijkzzz hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 14, 2026
@hijkzzz (Contributor, Author) commented Feb 14, 2026

@terrykong Fixed; please merge.

Calculate the out-of-bounds ratio for the "tis" type so users can
monitor how often IS weights exceed the truncation threshold, consistent
with the existing metrics for "icepop" and "seq-mask-tis".

Signed-off-by: jianh <jianh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
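The out-of-bounds metric described in the commit message above can be sketched for the "tis" mode, which clamps weights to an upper bound instead of zeroing them. The function `tis_weights` is a hypothetical plain-Python illustration, and it assumes the ratio counts only unmasked tokens.

```python
import math


def tis_weights(prev_logprobs, generation_logprobs, token_mask, ratio_max):
    """Hypothetical helper: clamp token IS weights to [0, ratio_max].

    Returns (weights, fraction_of_valid_tokens_that_were_clamped).
    """
    weights, oob, total = [], 0, 0
    for prev, gen, mask in zip(prev_logprobs, generation_logprobs, token_mask):
        row = []
        for p, g, m in zip(prev, gen, mask):
            w = math.exp(p - g)  # per-token IS ratio, always non-negative
            if m:
                total += 1
                oob += 1 if w > ratio_max else 0
            # Truncation keeps the token but caps its weight.
            row.append(min(w, ratio_max) if m else 0.0)
        weights.append(row)
    return weights, oob / max(total, 1)


w, oob_ratio = tis_weights(
    prev_logprobs=[[0.0, 0.0]],
    generation_logprobs=[[0.0, -2.0]],
    token_mask=[[1, 1]],
    ratio_max=2.0,
)
print(w, oob_ratio)  # second token is clamped to 2.0
```

Tracking this ratio gives the same kind of signal as the drop fraction in the other two modes: how often the bound is actually active.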
@hijkzzz hijkzzz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 14, 2026
@terrykong (Contributor) left a comment

LGTM. @hijkzzz, thanks for making nemo-rl better!

@hijkzzz hijkzzz removed the request for review from yfw February 14, 2026 16:29
@terrykong terrykong merged commit 2841fef into NVIDIA-NeMo:main Feb 16, 2026
42 of 43 checks passed
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…VIDIA-NeMo#1899)

Signed-off-by: jianh <jianh@nvidia.com>
Signed-off-by: hijkzzz <janhu9527@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Labels

  • CI:L1: Run doctests, unit tests, and functional tests
  • documentation: Improvements or additions to documentation
