
Conversation

@terrykong (Contributor) commented on Jan 6, 2026

What does this PR do?

Adds/restores the mcore_generation_config block in the GRPO math example configs and replaces a hardcoded max_tokens (16384) with a dynamic reference to policy.max_total_sequence_length, fixing the nightly test configuration.

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Generation configurations now include new parameters for memory buffer management, CUDA graph optimization, and prefill strategies to improve inference performance and resource efficiency.
    • Added support for unified memory configuration and expanded resource tuning options.
  • Chores

    • Updated example generation configurations to dynamically reference sequence length settings instead of hardcoded token limits, improving configuration consistency and deployment flexibility.


@terrykong terrykong requested a review from yuki-97 January 6, 2026 08:14
@terrykong terrykong requested a review from a team as a code owner January 6, 2026 08:14
@terrykong terrykong added the CI:L0 (Run doctests and unit tests) and r0.5.0 labels on Jan 6, 2026
@coderabbitai bot (Contributor) commented on Jan 6, 2026

📝 Walkthrough

Configuration files for the GRPO math models were updated with mcore generation parameters. A new mcore_generation_config block was added to one file, containing memory and CUDA-graph tuning parameters. In the other file, a hardcoded max_tokens value was replaced with a dynamic reference to policy.max_total_sequence_length.
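
For concreteness, here is a sketch of what such a block plausibly looks like. The parameter names are taken from this walkthrough; the values and the exact nesting under generation are illustrative assumptions, not copied from the diff:

    generation:
      mcore_generation_config:
        # Memory buffer management (placeholder values)
        buffer_size_gb: 20
        buffer_guaranteed_fraction: 0.2
        unified_memory_level: 0    # unified-memory configuration
        # CUDA-graph optimization
        num_cuda_graphs: 16
        use_cuda_graphs_for_non_decode_steps: false
        # Prefill strategy
        block_size_tokens: 256
        enable_chunked_prefill: true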

Changes

Cohort / File(s): GRPO Math Configuration Updates — examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml

Summary: Added/modified the mcore_generation_config block under the generation section, with memory and CUDA-graph tuning parameters (buffer_size_gb, buffer_guaranteed_fraction, num_cuda_graphs, block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level). Replaced the hardcoded max_tokens (16384) with a dynamic reference to policy.max_total_sequence_length.
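
The second change, sketched assuming the OmegaConf-style ${...} interpolation these configs are described as using elsewhere; the old value comes from the summary above:

    generation:
      # before: max_tokens: 16384  (hardcoded)
      max_tokens: ${policy.max_total_sequence_length}   # after: single source of truth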

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • yuki-97
  • joyang-nv

Pre-merge checks

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning. The PR modifies generation configuration and CUDA tuning parameters that affect inference numerics and performance, but no test results or performance metrics are documented. Resolution: add test results, performance metrics, and regression analysis to the PR description demonstrating that the configuration changes achieve their intended purpose without introducing regressions.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title references "mcore generation config" and "nightly test", which directly correspond to the YAML configuration changes that add/restore mcore_generation_config blocks.
  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1720466 and 664f85b.

📒 Files selected for processing (2)
  • examples/configs/grpo_math_1B.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
🧰 Additional context used
📓 Path-based instructions (1)
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
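
As a sketch, the kind of header that guideline calls for; the exact wording and license text are assumptions, not quoted from CODING_GUIDELINES.md:

    # Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # (Exact wording per CODING_GUIDELINES.md; note the current year above.)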
🧠 Learnings (2)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-09-19T03:08:11.537Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml:85-101
Timestamp: 2025-09-19T03:08:11.537Z
Learning: In math reasoning distillation tasks, max_new_tokens should be set to the full context window because prompts are typically much shorter than outputs, which require detailed step-by-step reasoning chains. Reducing max_new_tokens could prevent models from outputting complete answers, negatively impacting validation accuracy calculations.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
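
A minimal sketch of the pattern these learnings describe, assuming the conventional top-level policy block in these configs; the sequence length is illustrative:

    policy:
      max_total_sequence_length: 4096   # illustrative value
      generation:
        # Intentionally the full context window. Per the learnings above,
        # overflow when prompt + generation tokens exceed max_model_len is
        # handled by safeguards in vllm_worker.py.
        max_new_tokens: ${policy.max_total_sequence_length}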
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: build-container / main
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (2)
examples/configs/grpo_math_1B_megatron.yaml (1)

153-153: LGTM! Dynamic reference improves maintainability.

Replacing the hardcoded value with a reference to policy.max_total_sequence_length creates a single source of truth and aligns with the max_new_tokens configuration at line 141. This makes the configuration more maintainable and consistent. Based on learnings, setting max_tokens to the full context window is intentional for GRPO configurations.

examples/configs/grpo_math_1B.yaml (1)

219-227: Remove mcore_generation_config or change generation backend to "megatron".

The mcore_generation_config block (lines 219-227) is only consumed by Megatron-based generation workers, but this config uses generation.backend: "vllm" which does not read these parameters. With megatron_cfg.enabled: false, the vLLM worker will be instantiated instead, causing the mcore_generation_config to be silently ignored. Either remove this block if using vLLM generation, or set generation.backend: "megatron" if these parameters are needed.

Likely an incorrect or invalid review comment.
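
To make the comment concrete (whether or not it turns out to apply), here are the two self-consistent states it describes, sketched with illustrative values:

    # Option A: keep vLLM generation and drop the unused block
    generation:
      backend: "vllm"
      # (no mcore_generation_config; it would be silently ignored here)

    # Option B: switch backends so the block is actually consumed
    generation:
      backend: "megatron"
      mcore_generation_config:
        buffer_size_gb: 20   # illustrative; remaining parameters as sketched earlier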



yuki-97 previously approved these changes Jan 6, 2026
@terrykong terrykong mentioned this pull request Jan 6, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 6, 2026 09:28
@yuki-97 yuki-97 added and then removed the CI:L0 (Run doctests and unit tests) label on Jan 6, 2026
@terrykong terrykong requested review from a team as code owners January 6, 2026 20:12
@yuki-97 yuki-97 removed the CI:L0 (Run doctests and unit tests) label on Jan 7, 2026
@terrykong (Contributor, Author) commented:

Moving to draft. @shanmugamr1992 is checking to see if perf can be improved to ~10 sec.

@terrykong terrykong marked this pull request as draft January 7, 2026 06:46
auto-merge was automatically disabled January 7, 2026 06:46

Pull request was converted to draft

Signed-off-by: Terry Kong <[email protected]>
@terrykong terrykong marked this pull request as ready for review January 8, 2026 02:00
@terrykong terrykong requested a review from a team as a code owner January 8, 2026 02:00
@terrykong (Contributor, Author) commented:

@yuki-97 okay, this one is good to go now. @shanmugamr1992's changes will come later since he's still investigating.

@terrykong terrykong added the CI:L1 (Run doctests, unit tests, and functional tests) label on Jan 8, 2026
@terrykong terrykong enabled auto-merge (squash) January 8, 2026 02:00
Signed-off-by: Yuki Huang <[email protected]>
@yuki-97 yuki-97 added and then removed the CI:L1 (Run doctests, unit tests, and functional tests) label on Jan 8, 2026
@terrykong terrykong merged commit 75e916f into main Jan 8, 2026
40 of 41 checks passed
@terrykong terrykong deleted the test-fixes-1 branch January 8, 2026 09:12
chtruong814 pushed a commit that referenced this pull request Jan 8, 2026

Labels

CI:L1 (Run doctests, unit tests, and functional tests), r0.5.0


4 participants