
[https://nvbugs/5879577][fix] Fix KeyError in DeepSeekV3Lite FP8 MTP weight loading#12530

Open
sunnyqgg wants to merge 2 commits into NVIDIA:main from sunnyqgg:bug5879577

Conversation


@sunnyqgg sunnyqgg commented Mar 25, 2026

Summary

  • Fix KeyError: 'model.layers.30.self_attn.kv_a_proj_with_mqa.weight' when loading DeepSeekV3Lite FP8 with vanilla MTP (num_nextn_predict_layers=2)
  • Root cause: ConsumableWeightsDict.mark_consumed() deletes checkpoint weights after the first MTP layer loads them, but when model_nextn > ckpt_nextn, subsequent MTP layers share the same checkpoint weights via modulo remapping and fail with KeyError
  • Fix: Skip mark_consumed for MTP layers when checkpoint weights are shared across multiple model MTP layers
  • Remove corresponding test waiver from waives.txt
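
The root cause and fix above can be sketched in a few lines of Python. The `ConsumableWeightsDict` below is a simplified stand-in for the real class in tensorrt_llm, and `load_mtp_layers` is a hypothetical helper invented for illustration; names and behavior are assumptions, not the actual implementation.

```python
class ConsumableWeightsDict(dict):
    """Toy stand-in: frees a checkpoint weight once a module has consumed it."""

    def mark_consumed(self, key):
        del self[key]


def load_mtp_layers(weights, num_hidden_layers, ckpt_nextn, model_nextn):
    """Hypothetical loader sketching the modulo remapping and the fix."""
    loaded = []
    # The fix: when the model requests more MTP layers than the checkpoint
    # provides, the checkpoint weights are shared, so skip mark_consumed.
    shared = model_nextn > ckpt_nextn
    for layer_idx in range(num_hidden_layers, num_hidden_layers + model_nextn):
        # Modulo remapping: model MTP layers 30 and 31 both map to
        # checkpoint layer 30 when ckpt_nextn == 1.
        ckpt_layer = num_hidden_layers + (layer_idx - num_hidden_layers) % ckpt_nextn
        key = f"model.layers.{ckpt_layer}.self_attn.kv_a_proj_with_mqa.weight"
        loaded.append(weights[key])  # raised KeyError here before the fix
        if not shared:
            weights.mark_consumed(key)
    return loaded
```

With `num_hidden_layers=30`, `ckpt_nextn=1`, `model_nextn=2`, both model MTP layers load the same `model.layers.30` weight and neither deletes it; with matching nextn counts, consumption proceeds as before.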

Test plan

  • Run accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales[mtp=vanilla-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False] on H100
  • Verify no regression on other DeepSeekV3Lite FP8 tests (eagle MTP, non-MTP)
  • /bot run --extra-stage "DGX_H100_PCIe-PyTorch-Post-Merge-2"

Summary by CodeRabbit

  • New Features

    • Added MTP checkpoint-weight sharing detection for DeepSeek V3 models, so weight loading succeeds when multiple model MTP layers map to the same checkpoint weights.
  • Tests

    • Removed skipped test entry from DeepSeek V3 fp8 precision test suite, allowing previously excluded test variations to execute.

…weight loading

When model_nextn > ckpt_nextn (e.g., model requests 2 MTP layers but
checkpoint provides 1), multiple model MTP layers map to the same
checkpoint layer via modulo remapping. ConsumableWeightsDict.mark_consumed()
deletes checkpoint weights after the first MTP layer loads them, causing
KeyError when subsequent MTP layers try to load the same weights.

Skip mark_consumed for MTP layers with shared checkpoint weights to
prevent premature deletion of weights needed by later MTP layers.
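
The failure mode this commit fixes can be reproduced with a toy dict subclass. This is an illustrative stand-in for tensorrt_llm's ConsumableWeightsDict; everything beyond "mark_consumed deletes the entry" is an assumption made for the sketch.

```python
class ConsumableWeightsDict(dict):
    """Toy stand-in for the real class: consuming a weight deletes it."""

    def mark_consumed(self, key):
        del self[key]  # frees the checkpoint weight after it is loaded


key = "model.layers.30.self_attn.kv_a_proj_with_mqa.weight"
weights = ConsumableWeightsDict({key: "tensor"})

# First model MTP layer (layer 30) loads and consumes the shared weight.
_ = weights[key]
weights.mark_consumed(key)

# Second model MTP layer (layer 31) is remapped to checkpoint layer 30 via
# modulo, requests the same key, and fails because it was already deleted.
try:
    _ = weights[key]
    failed = False
except KeyError:
    failed = True
print("KeyError reproduced:", failed)  # prints: KeyError reproduced: True
```

Skipping the `mark_consumed` call when weights are shared keeps the entry alive for the second lookup, which is exactly what the commit does.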

Signed-off-by: qgai <qgai@nvidia.com>
…P8 MTP

Remove the test waiver now that the underlying KeyError is fixed.

Signed-off-by: qgai <qgai@nvidia.com>
@sunnyqgg sunnyqgg requested review from a team as code owners March 25, 2026 07:43
@sunnyqgg sunnyqgg requested review from Wanli-Jiang and hlu1 March 25, 2026 07:43

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough


Modified DeepseekV3 weight loading to detect and handle MTP checkpoint-weight sharing by comparing num_nextn_predict_layers between checkpoint and spec config. When shared MTP weights are detected, module indices are remapped via modulo and weight consumption is skipped for affected layers. A test waive entry is removed.

Changes

  • DeepseekV3 MTP Weight Sharing (tensorrt_llm/_torch/models/modeling_deepseekv3.py): Added checkpoint-weight sharing detection by comparing ckpt_nextn and model_nextn when MTP is enabled. Layer indices are remapped via modulo for shared MTP layers, and mark_consumed() calls across kv_b_proj, kv_a_proj_with_mqa, MoE weights, and related parameters now conditionally skip marking when layers are shared, preventing premature deletion of weights still needed by later MTP layers.
  • Test Waive Removal (tests/integration/test_lists/waives.txt): Removed the skip entry for the DeepSeekV3Lite FP8 test variant (mtp=vanilla-fp8kv=False-attention_dp=False-cuda_graph=False-overlap_scheduler=False-torch_compile=False).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: The title accurately describes the main change, fixing a KeyError in DeepSeekV3Lite FP8 MTP weight loading, with proper ticket format and type prefix.
  • Description check: The PR description provides a clear summary of the issue, root cause, and solution, plus a comprehensive test plan with evidence of verification.


@sunnyqgg
Collaborator Author

/bot run --extra-stage "DGX_H100_PCIe-PyTorch-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #40291 [ run ] triggered by Bot. Commit: 3bdff72 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40291 [ run ] completed with state FAILURE. Commit: 3bdff72
/LLM/main/L0_MergeRequest_PR pipeline #31403 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


