
Bug in quantizing Deepseek-R1 #281

@AbhinavDutta

Description

I'm following the steps at https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/deepseek

The first step works fine:

```bash
python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 8
```

However, when I run the second step, i.e.

```bash
torchrun --nproc-per-node 8 --master_port=12346 ptq.py --model_path $DS_CKPT --config DeepSeek-V3/inference/configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
```

I get the following error:

```
[rank6]: Invalid key(s) in state_dict: "layers.10.ffn.gate.bias", "layers.11.ffn.gate.bias", "layers.12.ffn.gate.bias", "layers.13.ffn.gate.bias", "layers.14.ffn.gate.bias", "layers.15.ffn.gate.bias", "layers.16.ffn.gate.bias", "layers.17.ffn.gate.bias", "layers.18.ffn.gate.bias", "layers.19.ffn.gate.bias", "layers.20.ffn.gate.bias", "layers.21.ffn.gate.bias", "layers.22.ffn.gate.bias", "layers.23.ffn.gate.bias", "layers.24.ffn.gate.bias", "layers.25.ffn.gate.bias", "layers.26.ffn.gate.bias", "layers.27.ffn.gate.bias", "layers.28.ffn.gate.bias", "layers.29.ffn.gate.bias", "layers.3.ffn.gate.bias", "layers.30.ffn.gate.bias", "layers.31.ffn.gate.bias", "layers.32.ffn.gate.bias", "layers.33.ffn.gate.bias", "layers.34.ffn.gate.bias", "layers.35.ffn.gate.bias", "layers.36.ffn.gate.bias", "layers.37.ffn.gate.bias", "layers.38.ffn.gate.bias", "layers.39.ffn.gate.bias", "layers.4.ffn.gate.bias", "layers.40.ffn.gate.bias", "layers.41.ffn.gate.bias", "layers.42.ffn.gate.bias", "layers.43.ffn.gate.bias", "layers.44.ffn.gate.bias", "layers.45.ffn.gate.bias", "layers.46.ffn.gate.bias", "layers.47.ffn.gate.bias", "layers.48.ffn.gate.bias", "layers.49.ffn.gate.bias", "layers.5.ffn.gate.bias", "layers.50.ffn.gate.bias", "layers.51.ffn.gate.bias", "layers.52.ffn.gate.bias", "layers.53.ffn.gate.bias", "layers.54.ffn.gate.bias", "layers.55.ffn.gate.bias", "layers.56.ffn.gate.bias", "layers.57.ffn.gate.bias", "layers.58.ffn.gate.bias", "layers.59.ffn.gate.bias", "layers.6.ffn.gate.bias", "layers.60.ffn.gate.bias", "layers.7.ffn.gate.bias", "layers.8.ffn.gate.bias", "layers.9.ffn.gate.bias", mismatched dtypes or shape.
```
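
The extra keys all end in ffn.gate.bias and cover layers 3-60 only (layers 0-2 are absent from the list), so the converted checkpoint carries a gate bias that the model ptq.py builds apparently does not declare for those layers.

In case it helps triage, below is a minimal diagnostic sketch (my own, not from the repo) that lists these tensors in the converted checkpoint with their dtypes and shapes. It assumes convert.py wrote safetensors shards under $DS_CKPT; the *.safetensors glob is a guess about the file naming.

```python
# Minimal diagnostic sketch (not from the ModelOpt repo): list every
# "ffn.gate.bias" tensor in the converted checkpoint, with dtype and shape,
# to compare against what the model built by ptq.py expects.
# Assumption: the shards under $DS_CKPT are safetensors files.
import glob
import os

from safetensors import safe_open

ckpt_dir = os.environ.get("DS_CKPT", ".")
for shard in sorted(glob.glob(os.path.join(ckpt_dir, "*.safetensors"))):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for key in f.keys():
            if key.endswith("ffn.gate.bias"):
                t = f.get_tensor(key)
                print(f"{os.path.basename(shard)}: {key} {t.dtype} {tuple(t.shape)}")
```

If the tensors are present with sane dtypes and shapes, the "Invalid key(s)" error presumably means the model definition does not register a gate.bias parameter for these layers, rather than the checkpoint being corrupt.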

======================================================================

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04.5 LTS
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA B200
  • GPU memory size: 179.1 GB
  • Number of GPUs: 8
  • Library versions (if applicable):
    • Python: 3.10.12
    • ModelOpt version or commit hash: 0.33.1
    • CUDA: 12.8
    • PyTorch: 2.7.1+cu128
    • Transformers: 4.55.0
    • TensorRT-LLM: 1.1.0rc2
    • ONNXRuntime: ?
    • TensorRT: 10.11.0.33

  Startup log excerpt (note the transformers/modelopt compatibility warning):

      [2025-09-04 19:17:47] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
      /root/venv310-2/lib/python3.10/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.55.0 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with pip install nvidia-modelopt[hf] if working with HF models.
        warnings.warn(
      2025-09-04 19:17:49,276 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
      /root/venv310-2/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
        warnings.warn(
      [TensorRT-LLM] TensorRT-LLM version: 1.1.0rc2
  • Any other details that may help: ?

======================================================================
