
Conversation

@terrykong (Contributor) commented on Jan 6, 2026

What does this PR do?

Adds/restores the mcore_generation_config block in the GRPO math example configs and replaces a hardcoded max_tokens (16384) with a dynamic reference to policy.max_total_sequence_length, fixing the nightly test configuration.

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Generation configurations now include new parameters for memory buffer management, CUDA graph optimization, and prefill strategies to improve inference performance and resource efficiency.
    • Added support for unified memory configuration and expanded resource tuning options.
  • Chores

    • Updated example generation configurations to dynamically reference sequence length settings instead of hardcoded token limits, improving configuration consistency and deployment flexibility.


@terrykong terrykong requested a review from yuki-97 January 6, 2026 08:14
@terrykong terrykong requested a review from a team as a code owner January 6, 2026 08:14
@terrykong terrykong added the CI:L0 (Run doctests and unit tests) and r0.5.0 labels on Jan 6, 2026
@coderabbitai bot (Contributor) commented on Jan 6, 2026

📝 Walkthrough

Configuration files for the GRPO math models were updated with mcore generation parameters. A new mcore_generation_config block was added to one file, containing memory and CUDA-graph tuning parameters. In the other file, a hardcoded max_tokens value was replaced with a dynamic reference to policy.max_total_sequence_length.
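
For concreteness, here is a sketch of what such a block plausibly looks like. The parameter names are taken from this walkthrough; the values and the exact nesting under generation are illustrative assumptions, not copied from the diff:

    generation:
      mcore_generation_config:
        # Memory buffer management (placeholder values)
        buffer_size_gb: 20
        buffer_guaranteed_fraction: 0.2
        unified_memory_level: 0    # unified-memory configuration
        # CUDA-graph optimization
        num_cuda_graphs: 16
        use_cuda_graphs_for_non_decode_steps: false
        # Prefill strategy
        block_size_tokens: 256
        enable_chunked_prefill: true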

Changes

Cohort / File(s): GRPO Math Configuration Updates — examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml

Summary: Added/modified the mcore_generation_config block under the generation section, with memory and CUDA-graph tuning parameters (buffer_size_gb, buffer_guaranteed_fraction, num_cuda_graphs, block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level). Replaced the hardcoded max_tokens (16384) with a dynamic reference to policy.max_total_sequence_length.
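
The second change, sketched assuming the OmegaConf-style ${...} interpolation these configs are described as using elsewhere; the old value comes from the summary above:

    generation:
      # before: max_tokens: 16384  (hardcoded)
      max_tokens: ${policy.max_total_sequence_length}   # after: single source of truth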

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • yuki-97
  • joyang-nv

Pre-merge checks

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning. The PR modifies generation configuration and CUDA tuning parameters that affect inference numerics and performance, but no test results or performance metrics are documented. Resolution: add test results, performance metrics, and regression analysis to the PR description demonstrating that the configuration changes achieve their intended purpose without introducing regressions.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title references "mcore generation config" and "nightly test", which directly correspond to the YAML configuration changes that add/restore mcore_generation_config blocks.
  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1720466 and 664f85b.

📒 Files selected for processing (2)
  • examples/configs/grpo_math_1B.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
🧰 Additional context used
📓 Path-based instructions (1)
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
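
As a sketch, the kind of header that guideline calls for; the exact wording and license text are assumptions, not quoted from CODING_GUIDELINES.md:

    # Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # (Exact wording per CODING_GUIDELINES.md; note the current year above.)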
🧠 Learnings (2)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-09-19T03:08:11.537Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml:85-101
Timestamp: 2025-09-19T03:08:11.537Z
Learning: In math reasoning distillation tasks, max_new_tokens should be set to the full context window because prompts are typically much shorter than outputs, which require detailed step-by-step reasoning chains. Reducing max_new_tokens could prevent models from outputting complete answers, negatively impacting validation accuracy calculations.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
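
A minimal sketch of the pattern these learnings describe, assuming the conventional top-level policy block in these configs; the sequence length is illustrative:

    policy:
      max_total_sequence_length: 4096   # illustrative value
      generation:
        # Intentionally the full context window. Per the learnings above,
        # overflow when prompt + generation tokens exceed max_model_len is
        # handled by safeguards in vllm_worker.py.
        max_new_tokens: ${policy.max_total_sequence_length}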
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: build-container / main
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (2)
examples/configs/grpo_math_1B_megatron.yaml (1)

153-153: LGTM! Dynamic reference improves maintainability.

Replacing the hardcoded value with a reference to policy.max_total_sequence_length creates a single source of truth and aligns with the max_new_tokens configuration at line 141. This makes the configuration more maintainable and consistent. Based on learnings, setting max_tokens to the full context window is intentional for GRPO configurations.

examples/configs/grpo_math_1B.yaml (1)

219-227: Remove mcore_generation_config or change generation backend to "megatron".

The mcore_generation_config block (lines 219-227) is only consumed by Megatron-based generation workers, but this config uses generation.backend: "vllm" which does not read these parameters. With megatron_cfg.enabled: false, the vLLM worker will be instantiated instead, causing the mcore_generation_config to be silently ignored. Either remove this block if using vLLM generation, or set generation.backend: "megatron" if these parameters are needed.

Likely an incorrect or invalid review comment.
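
To make the comment concrete (whether or not it turns out to apply), here are the two self-consistent states it describes, sketched with illustrative values:

    # Option A: keep vLLM generation and drop the unused block
    generation:
      backend: "vllm"
      # (no mcore_generation_config; it would be silently ignored here)

    # Option B: switch backends so the block is actually consumed
    generation:
      backend: "megatron"
      mcore_generation_config:
        buffer_size_gb: 20   # illustrative; remaining parameters as sketched earlier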



yuki-97 previously approved these changes Jan 6, 2026
@terrykong terrykong mentioned this pull request Jan 6, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 6, 2026 09:28
@yuki-97 yuki-97 added and then removed the CI:L0 (Run doctests and unit tests) label on Jan 6, 2026
@terrykong terrykong requested review from a team as code owners January 6, 2026 20:12
@yuki-97 yuki-97 removed the CI:L0 (Run doctests and unit tests) label on Jan 7, 2026
@terrykong (Contributor, Author) commented:

Moving to draft. @shanmugamr1992 is checking to see if perf can be improved to ~10 sec.

@terrykong terrykong marked this pull request as draft January 7, 2026 06:46
auto-merge was automatically disabled January 7, 2026 06:46

Pull request was converted to draft

Signed-off-by: Terry Kong <[email protected]>
@terrykong terrykong marked this pull request as ready for review January 8, 2026 02:00
@terrykong terrykong requested a review from a team as a code owner January 8, 2026 02:00
@terrykong (Contributor, Author) commented:

@yuki-97 okay, this one is good to go now. @shanmugamr1992's changes will come later since he's still investigating.

@terrykong terrykong added the CI:L1 (Run doctests, unit tests, and functional tests) label on Jan 8, 2026
@terrykong terrykong enabled auto-merge (squash) January 8, 2026 02:00
Signed-off-by: Yuki Huang <[email protected]>
@yuki-97 yuki-97 added and then removed the CI:L1 (Run doctests, unit tests, and functional tests) label on Jan 8, 2026
@terrykong terrykong merged commit 75e916f into main Jan 8, 2026
40 of 41 checks passed
@terrykong terrykong deleted the test-fixes-1 branch January 8, 2026 09:12
chtruong814 pushed a commit that referenced this pull request Jan 8, 2026

Labels

CI:L1 (Run doctests, unit tests, and functional tests), r0.5.0


4 participants