
[https://nvbugs/5879577][fix] Fix KeyError in DeepSeekV3Lite FP8 MTP weight loading#12530

Open
sunnyqgg wants to merge 2 commits into NVIDIA:main from sunnyqgg:bug5879577

Conversation


@sunnyqgg sunnyqgg commented Mar 25, 2026

Summary

  • Fix KeyError: 'model.layers.30.self_attn.kv_a_proj_with_mqa.weight' when loading DeepSeekV3Lite FP8 with vanilla MTP (num_nextn_predict_layers=2)
  • Root cause: ConsumableWeightsDict.mark_consumed() deletes checkpoint weights after the first MTP layer loads them, but when model_nextn > ckpt_nextn, subsequent MTP layers share the same checkpoint weights via modulo remapping and fail with KeyError
  • Fix: Skip mark_consumed for MTP layers when checkpoint weights are shared across multiple model MTP layers
  • Remove corresponding test waiver from waives.txt
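
The root cause and fix above can be sketched in a few lines of Python. The `ConsumableWeightsDict` below is a simplified stand-in for the real class in tensorrt_llm, and `load_mtp_layers` is a hypothetical helper invented for illustration; names and behavior are assumptions, not the actual implementation.

```python
class ConsumableWeightsDict(dict):
    """Toy stand-in: frees a checkpoint weight once a module has consumed it."""

    def mark_consumed(self, key):
        del self[key]


def load_mtp_layers(weights, num_hidden_layers, ckpt_nextn, model_nextn):
    """Hypothetical loader sketching the modulo remapping and the fix."""
    loaded = []
    # The fix: when the model requests more MTP layers than the checkpoint
    # provides, the checkpoint weights are shared, so skip mark_consumed.
    shared = model_nextn > ckpt_nextn
    for layer_idx in range(num_hidden_layers, num_hidden_layers + model_nextn):
        # Modulo remapping: model MTP layers 30 and 31 both map to
        # checkpoint layer 30 when ckpt_nextn == 1.
        ckpt_layer = num_hidden_layers + (layer_idx - num_hidden_layers) % ckpt_nextn
        key = f"model.layers.{ckpt_layer}.self_attn.kv_a_proj_with_mqa.weight"
        loaded.append(weights[key])  # raised KeyError here before the fix
        if not shared:
            weights.mark_consumed(key)
    return loaded
```

With `num_hidden_layers=30`, `ckpt_nextn=1`, `model_nextn=2`, both model MTP layers load the same `model.layers.30` weight and neither deletes it; with matching nextn counts, consumption proceeds as before.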

Test plan

  • Run accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales[mtp=vanilla-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False] on H100
  • Verify no regression on other DeepSeekV3Lite FP8 tests (eagle MTP, non-MTP)
  • /bot run --extra-stage "DGX_H100_PCIe-PyTorch-Post-Merge-2"

Summary by CodeRabbit

  • New Features

    • Added MTP checkpoint-weight sharing detection for DeepSeek V3 models, so weight loading succeeds when multiple model MTP layers map to the same checkpoint weights.
  • Tests

    • Removed skipped test entry from DeepSeek V3 fp8 precision test suite, allowing previously excluded test variations to execute.

…weight loading

When model_nextn > ckpt_nextn (e.g., model requests 2 MTP layers but
checkpoint provides 1), multiple model MTP layers map to the same
checkpoint layer via modulo remapping. ConsumableWeightsDict.mark_consumed()
deletes checkpoint weights after the first MTP layer loads them, causing
KeyError when subsequent MTP layers try to load the same weights.

Skip mark_consumed for MTP layers with shared checkpoint weights to
prevent premature deletion of weights needed by later MTP layers.
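
The failure mode this commit fixes can be reproduced with a toy dict subclass. This is an illustrative stand-in for tensorrt_llm's ConsumableWeightsDict; everything beyond "mark_consumed deletes the entry" is an assumption made for the sketch.

```python
class ConsumableWeightsDict(dict):
    """Toy stand-in for the real class: consuming a weight deletes it."""

    def mark_consumed(self, key):
        del self[key]  # frees the checkpoint weight after it is loaded


key = "model.layers.30.self_attn.kv_a_proj_with_mqa.weight"
weights = ConsumableWeightsDict({key: "tensor"})

# First model MTP layer (layer 30) loads and consumes the shared weight.
_ = weights[key]
weights.mark_consumed(key)

# Second model MTP layer (layer 31) is remapped to checkpoint layer 30 via
# modulo, requests the same key, and fails because it was already deleted.
try:
    _ = weights[key]
    failed = False
except KeyError:
    failed = True
print("KeyError reproduced:", failed)  # prints: KeyError reproduced: True
```

Skipping the `mark_consumed` call when weights are shared keeps the entry alive for the second lookup, which is exactly what the commit does.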

Signed-off-by: qgai <qgai@nvidia.com>
…P8 MTP

Remove the test waiver now that the underlying KeyError is fixed.

Signed-off-by: qgai <qgai@nvidia.com>
@sunnyqgg sunnyqgg requested review from a team as code owners March 25, 2026 07:43
@sunnyqgg sunnyqgg requested review from Wanli-Jiang and hlu1 March 25, 2026 07:43

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough


Modified DeepseekV3 weight loading to detect and handle MTP checkpoint-weight sharing by comparing num_nextn_predict_layers between checkpoint and spec config. When shared MTP weights are detected, module indices are remapped via modulo and weight consumption is skipped for affected layers. A test waive entry is removed.

Changes

  • DeepseekV3 MTP Weight Sharing (tensorrt_llm/_torch/models/modeling_deepseekv3.py): Added checkpoint-weight sharing detection by comparing ckpt_nextn and model_nextn when MTP is enabled. Layer indices are remapped via modulo for shared MTP layers, and mark_consumed() calls across kv_b_proj, kv_a_proj_with_mqa, MoE weights, and related parameters now conditionally skip marking when layers are shared, preventing premature deletion of weights still needed by later MTP layers.
  • Test Waive Removal (tests/integration/test_lists/waives.txt): Removed the skip entry for the DeepSeekV3Lite FP8 test variant (mtp=vanilla-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: The title accurately describes the main change, fixing a KeyError in DeepSeekV3Lite FP8 MTP weight loading, with proper ticket format and type prefix.
  • Description check: The PR description provides a clear summary of the issue, root cause, and solution, plus a comprehensive test plan with evidence of verification.


@sunnyqgg
Collaborator Author

/bot run --extra-stage "DGX_H100_PCIe-PyTorch-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #40291 [ run ] triggered by Bot. Commit: 3bdff72 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40291 [ run ] completed with state FAILURE. Commit: 3bdff72
/LLM/main/L0_MergeRequest_PR pipeline #31403 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


