
Bug in quantizing Deepseek-R1 #281

@AbhinavDutta

Description

I'm following the steps at https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/deepseek

The first step works fine:

```bash
python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 8
```

However, when I run the second step, i.e.

```bash
torchrun --nproc-per-node 8 --master_port=12346 ptq.py --model_path $DS_CKPT --config DeepSeek-V3/inference/configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
```

I get the following error:

```
[rank6]: Invalid key(s) in state_dict: "layers.10.ffn.gate.bias", "layers.11.ffn.gate.bias", "layers.12.ffn.gate.bias", "layers.13.ffn.gate.bias", "layers.14.ffn.gate.bias", "layers.15.ffn.gate.bias", "layers.16.ffn.gate.bias", "layers.17.ffn.gate.bias", "layers.18.ffn.gate.bias", "layers.19.ffn.gate.bias", "layers.20.ffn.gate.bias", "layers.21.ffn.gate.bias", "layers.22.ffn.gate.bias", "layers.23.ffn.gate.bias", "layers.24.ffn.gate.bias", "layers.25.ffn.gate.bias", "layers.26.ffn.gate.bias", "layers.27.ffn.gate.bias", "layers.28.ffn.gate.bias", "layers.29.ffn.gate.bias", "layers.3.ffn.gate.bias", "layers.30.ffn.gate.bias", "layers.31.ffn.gate.bias", "layers.32.ffn.gate.bias", "layers.33.ffn.gate.bias", "layers.34.ffn.gate.bias", "layers.35.ffn.gate.bias", "layers.36.ffn.gate.bias", "layers.37.ffn.gate.bias", "layers.38.ffn.gate.bias", "layers.39.ffn.gate.bias", "layers.4.ffn.gate.bias", "layers.40.ffn.gate.bias", "layers.41.ffn.gate.bias", "layers.42.ffn.gate.bias", "layers.43.ffn.gate.bias", "layers.44.ffn.gate.bias", "layers.45.ffn.gate.bias", "layers.46.ffn.gate.bias", "layers.47.ffn.gate.bias", "layers.48.ffn.gate.bias", "layers.49.ffn.gate.bias", "layers.5.ffn.gate.bias", "layers.50.ffn.gate.bias", "layers.51.ffn.gate.bias", "layers.52.ffn.gate.bias", "layers.53.ffn.gate.bias", "layers.54.ffn.gate.bias", "layers.55.ffn.gate.bias", "layers.56.ffn.gate.bias", "layers.57.ffn.gate.bias", "layers.58.ffn.gate.bias", "layers.59.ffn.gate.bias", "layers.6.ffn.gate.bias", "layers.60.ffn.gate.bias", "layers.7.ffn.gate.bias", "layers.8.ffn.gate.bias", "layers.9.ffn.gate.bias", mismatched dtypes or shape.
```
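
The extra keys all end in ffn.gate.bias and cover layers 3-60 only (layers 0-2 are absent from the list), so the converted checkpoint carries a gate bias that the model ptq.py builds apparently does not declare for those layers.

In case it helps triage, below is a minimal diagnostic sketch (my own, not from the repo) that lists these tensors in the converted checkpoint with their dtypes and shapes. It assumes convert.py wrote safetensors shards under $DS_CKPT; the *.safetensors glob is a guess about the file naming.

```python
# Minimal diagnostic sketch (not from the ModelOpt repo): list every
# "ffn.gate.bias" tensor in the converted checkpoint, with dtype and shape,
# to compare against what the model built by ptq.py expects.
# Assumption: the shards under $DS_CKPT are safetensors files.
import glob
import os

from safetensors import safe_open

ckpt_dir = os.environ.get("DS_CKPT", ".")
for shard in sorted(glob.glob(os.path.join(ckpt_dir, "*.safetensors"))):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for key in f.keys():
            if key.endswith("ffn.gate.bias"):
                t = f.get_tensor(key)
                print(f"{os.path.basename(shard)}: {key} {t.dtype} {tuple(t.shape)}")
```

If the tensors are present with sane dtypes and shapes, the "Invalid key(s)" error presumably means the model definition does not register a gate.bias parameter for these layers, rather than the checkpoint being corrupt.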

======================================================================

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04.5 LTS
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA B200
  • GPU memory size: 179.1 GB
  • Number of GPUs: 8
  • Library versions (if applicable):
    • Python: 3.10.12
    • ModelOpt version or commit hash: 0.33.1
    • CUDA: 12.8
    • PyTorch: 2.7.1+cu128
    • Transformers: 4.55.0
    • TensorRT-LLM: 1.1.0rc2
    • ONNXRuntime: ?
    • TensorRT: 10.11.0.33

  Startup log excerpt (note the transformers/modelopt compatibility warning):

      [2025-09-04 19:17:47] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
      /root/venv310-2/lib/python3.10/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.55.0 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with pip install nvidia-modelopt[hf] if working with HF models.
        warnings.warn(
      2025-09-04 19:17:49,276 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
      /root/venv310-2/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
        warnings.warn(
      [TensorRT-LLM] TensorRT-LLM version: 1.1.0rc2
  • Any other details that may help: ?

======================================================================
