[https://nvbugs/5916092][fix] Fix MTP+PP hang by preserving speculative layer weights on last PP rank #12555
xxi-nv wants to merge 1 commit into NVIDIA:main
Conversation
…ve layer weights on last PP rank

`DecoderModel.__pp_init__` iterates all layers, including the MTP speculative layers appended via `layers.extend(mtp_layers)`. Since MTP layer indices exceed `num_hidden_layers`, they are not in `pp_layer_list` and get `skip_forward()` on ALL ranks, which replaces `forward` with a no-op AND removes the weights. On non-last PP ranks this is correct (the MTP layers are unused). But on the last PP rank, the MTP draft worker needs the layer weights; removing them causes a hang during generation.

Fix: for layers beyond `num_hidden_layers`, always replace `forward` with `skip_forward` (so they are no-ops in the main decoder loop on all ranks), but only remove weights on non-last PP ranks. The last PP rank preserves the weights so the MTP speculative decoding worker can use them.

Affects all models using `layers.extend(mtp_layers)`: DeepSeekV3, NemotronH, ExaoneMoE, and GLM.

Signed-off-by: xxi <xxi@nvidia.com>
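The rank-dependent handling described above can be sketched as a minimal, self-contained toy. Names such as `skip_forward`, `remove_weights`, `pp_layer_list`, and the `pp_init` helper are modeled on the PR description, not the actual TensorRT-LLM implementation:

```python
# Toy sketch of the fixed __pp_init__ behavior described in this PR.
# All names here are illustrative stand-ins, not the real TensorRT-LLM code.

class Layer:
    def __init__(self, idx):
        self.idx = idx
        self.weights = {"w": f"weights-{idx}"}

    def forward(self, x):
        return x + 1  # stand-in for real layer computation

    def skip_forward(self, x):
        return x  # no-op passthrough


def remove_weights(layer):
    layer.weights = {}  # free memory for layers this rank never executes


def pp_init(layers, num_hidden_layers, pp_layer_list, is_last_pp_rank):
    for idx, layer in enumerate(layers):
        if idx >= num_hidden_layers:
            # Extra (MTP speculative) layer: no-op in the main decoder
            # loop on every rank, but only drop weights on non-last ranks
            # so the draft worker on the last rank can still use them.
            layer.forward = layer.skip_forward
            if not is_last_pp_rank:
                remove_weights(layer)
            continue
        if idx not in pp_layer_list:
            # Base-model layer owned by another rank: skip and free.
            layer.forward = layer.skip_forward
            remove_weights(layer)


# Last PP rank owning layers [2, 3] of a 4-layer model, plus one MTP layer.
layers = [Layer(i) for i in range(5)]
pp_init(layers, num_hidden_layers=4, pp_layer_list=[2, 3], is_last_pp_rank=True)
assert layers[4].weights            # MTP weights preserved for the draft worker
assert layers[4].forward(7) == 7    # yet a no-op in the main decoder loop
assert not layers[0].weights        # non-owned base layers still freed
```

Running the same `pp_init` with `is_last_pp_rank=False` would strip the MTP layer's weights as well, which matches the pre-fix behavior that caused the hang only on the last rank.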
/bot run
📝 Walkthrough

Modified pipeline-parallel layer initialization in
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/modeling_utils.py (1)
307-316: Good fix for the MTP+PP hang issue. The logic correctly preserves weights on the last PP rank while making the extra layers no-ops in the main decoder loop. The separation of concerns (skip forward vs. remove weights) is well reasoned.
Consider adding a warning when `skip_forward` is unavailable. If a layer doesn't have `skip_forward`, its `forward` remains unchanged. On the last PP rank, this means the layer would execute normally in the main decoder loop instead of being a no-op. While unlikely with MTP layers (`DecoderLayer` instances have `skip_forward`), a warning would be consistent with the existing `skip_forward` function at lines 163-165 and help debug unexpected behavior.

Suggested defensive warning:
```diff
 if layer_idx >= num_hidden_layers:
     # Extra layers (e.g., MTP speculative layers) appended beyond
     # the base model. Skip their forward on all ranks so they are
     # no-ops in the main decoder loop, but preserve weights on the
     # last PP rank where the MTP draft worker needs them.
     if hasattr(layer, 'skip_forward'):
         layer.forward = layer.skip_forward
+    else:
+        logger.warning(
+            f"Layer {layer_idx} ({layer.__class__.__name__}) does not have "
+            f"`skip_forward`; it will not be a no-op in the main decoder loop.")
     if not mapping.is_last_pp_rank():
         remove_weights(layer)
     continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_utils.py` around lines 307-316: when handling extra layers (layer_idx >= num_hidden_layers), add a defensive warning if a layer does not have skip_forward. If hasattr(layer, 'skip_forward') is false, emit a warning (e.g., logging.warning or warnings.warn) that the extra layer lacks skip_forward so its forward will remain active on the last PP rank; keep the existing behavior of calling layer.skip_forward when present and remove_weights when not mapping.is_last_pp_rank(), but log this unexpected condition referencing layer_idx, the layer object, and mapping.is_last_pp_rank() to aid debugging (mirror the existing skip_forward handling used elsewhere).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 34356177-7950-4e51-9c55-a9da2695250f
📒 Files selected for processing (2)
- tensorrt_llm/_torch/models/modeling_utils.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
PR_Github #40435 [ run ] triggered by Bot. Commit:
PR_Github #40435 [ run ] completed with state
Summary
- Fixes a hang in `DecoderModel.__pp_init__()` when both pipeline parallelism (PP) and MTP speculative decoding are enabled
- MTP layers beyond `num_hidden_layers` were incorrectly getting `skip_forward()` + `remove_weights()` on all PP ranks, including the last rank where the MTP draft worker needs them
- Fix for layers beyond `num_hidden_layers`: always make them no-ops in the main decoder loop (all ranks), but only remove weights on non-last PP ranks
- Affects: DeepSeekV3, NemotronH, ExaoneMoE, GLM (all use `self.model.layers.extend(self.draft_model.mtp_layers)`)

Root cause

`__pp_init__` iterates ALL `self.layers`, including MTP layers at indices >= `num_hidden_layers`. Since these indices are not in `pp_layer_list`, the existing code calls `skip_forward()`, which:

- replaces `layer.forward` with a no-op
- calls `remove_weights()` to free GPU memory

This happens on ALL ranks, including the last PP rank. The MTP draft worker (running on rank N-1) then finds its speculative layers have no weights → hang.
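The root cause can be illustrated with a toy version of the pre-fix condition. All names here (`pp_layer_list`, `num_hidden_layers`) are assumptions modeled on the description above, not the real TensorRT-LLM code:

```python
# Toy illustration of the pre-fix bug. Because MTP layer indices sit beyond
# num_hidden_layers, they can never appear in pp_layer_list, so the old
# "not in pp_layer_list -> skip + strip" rule removed their weights on
# every rank, including the last one where the draft worker needs them.
num_hidden_layers = 4
pp_layer_list = [2, 3]          # base-model layers owned by the last PP rank
mtp_layer_idx = 4               # appended via layers.extend(mtp_layers)

stripped_on_last_rank = mtp_layer_idx not in pp_layer_list  # old condition
print(stripped_on_last_rank)    # True: draft worker finds no weights -> hang
```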
Fix
Test plan
- `TestDeepSeekV3Lite::test_nvfp4_4gpus[moe_backend=CUTLASS-mtp_nextn=2-pp4-...]` should pass (was hanging)
- Corresponding entry removed from `waives.txt`

Summary by CodeRabbit
Bug Fixes
Tests