Skip to content

feat: support top-p top-k in grpo#2053

Open
yuki-97 wants to merge 20 commits intomainfrom
yukih/topp-topk
Open

feat: support top-p top-k in grpo#2053
yuki-97 wants to merge 20 commits intomainfrom
yukih/topp-topk

Conversation

@yuki-97
Copy link
Contributor

@yuki-97 yuki-97 commented Mar 3, 2026

Thanks a lot for @zhandaz for the original implementation #1578 and re-implement #1938. Really awesome work!
This PR re-implement this based on the refactored loss structure.

What does this PR do ?

For the logprobs:

  1. generation_logprobs, curr_logprobs and prev_logprobs are computed with filtering.
  2. reference_policy_logprobs is computed without filtering.

This PR:

  1. Apply top-p top-k and apply mask -inf on prev_logprobs at get_logprobs, since we need to call compute_and_apply_seq_logprob_error_masking before train, which needs the updated prev_logprobs to compare with generation_logprobs.
  2. Apply top-p top-k and apply mask -inf on curr_logprobs at prepare_loss_input, since we handle distributed logits to logprobs here.
  3. Save and restore sampling params at use_reference_model to make reference_policy_logprobs computed without filtering.
  4. For TP, we convert from vocab-parallel to batch-sequence-parallel layout via all-to-all communication, apply filtering on the full vocabulary, then convert back. This avoids materializing the full vocabulary on any single rank.

More details can be seen at #1578 and #1938.

Test Result

convergence time cost
dtensorv1 image image
dtensorv2 image image
mcore image image

Summary by CodeRabbit

  • New Features

    • Added configurable sampling parameters (temperature, top-k, top-p) for training with new YAML configuration examples.
    • Introduced memory-efficient top-k and top-p filtering for logit sampling.
  • Bug Fixes

    • Improved token-level masking for logprob calculations in training pipelines.
  • Tests

    • Added comprehensive test coverage for sampling parameter validation and distributed logprob computations.
    • Expanded nightly test suite with new sampling configuration scenarios.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: d2c89a8 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: 75ab1d4 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: 7931cb0 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: 76064bd (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Mar 3, 2026
@yuki-97 yuki-97 removed the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Mar 3, 2026
@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: a0423a5 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Mar 3, 2026
@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: 3d5ff4f (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels Mar 3, 2026
@github-actions
Copy link

github-actions bot commented Mar 3, 2026

ℹ️ File Consistency Check

Check based on commit: 69966dc (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels Mar 3, 2026
@github-actions
Copy link

github-actions bot commented Mar 4, 2026

ℹ️ File Consistency Check

Check based on commit: 6ccc659 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 removed the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Mar 4, 2026
@github-actions
Copy link

github-actions bot commented Mar 4, 2026

ℹ️ File Consistency Check

Check based on commit: 95ec444 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Mar 4, 2026
@yuki-97 yuki-97 removed the CI:L1 Run doctests, unit tests, and functional tests label Mar 4, 2026
@github-actions
Copy link

github-actions bot commented Mar 9, 2026

ℹ️ File Consistency Check

Check based on commit: cc84132 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Mar 9, 2026
@github-actions
Copy link

github-actions bot commented Mar 9, 2026

ℹ️ File Consistency Check

Check based on commit: a0474b3 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions
Copy link

ℹ️ File Consistency Check

Check based on commit: b769172 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

zhandaz and others added 20 commits March 9, 2026 20:58
(not cp loss_function.py common.py megatron_policy_worker.py)

Signed-off-by: Zhanda <zhandazhu@gmail.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Zhanda <zhandazhu@gmail.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
…_masking

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@github-actions
Copy link

ℹ️ File Consistency Check

Check based on commit: 0e6e0f7 (PR #2053 from yukih/topp-topk)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@yuki-97
Copy link
Contributor Author

yuki-97 commented Mar 10, 2026

/ok to test 0e6e0f7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants