[Bug]: Qwen3.5-397B-A17B int4 vllm: KeyError: 'layers.0.mlp.experts.w2_weight.0.qweight' #1464

@XuehaoSun

Description

Reproduction Steps

Environment:

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
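Because the nightly index changes from day to day, the exact build installed by this command is worth recording alongside the report. A minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and the package list is an assumption about what is relevant here:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed to be the packages of interest for this report.
    for pkg in ("vllm", "torch"):
        print(pkg, installed_version(pkg))
```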

Command:

vllm serve INC4AI/Qwen3.5-397B-A17B-int4-mixed-AutoRound --port 7777 --host localhost --trust-remote-code --dtype bfloat16 --tensor_parallel_size 4 --max-model-len 4096 --max-num-seqs 64 --gpu-memory-utilization 0.8 --reasoning-parser qwen3 --enable-prefix-caching --language-model-only

Model used: https://huggingface.co/INC4AI/Qwen3.5-397B-A17B-int4-mixed-AutoRound

Problem Description

[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] Traceback (most recent call last):
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 754, in worker_main
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     worker = WorkerProc(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 580, in __init__
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     self.worker.load_model()
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 324, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4197, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     self.model = model_loader.load_model(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                  ^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     self.load_weights(model, model_config)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 290, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 747, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return original_load_weights(self, weights, *args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 344, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     autoloaded_weights = set(self._load_module("", self.module, weights))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 292, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     yield from self._load_module(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 265, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     loaded_params = module_load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 604, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return loader.load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     return original_load_weights(self, weights, *args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 344, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     autoloaded_weights = set(self._load_module("", self.module, weights))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 292, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     yield from self._load_module(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 265, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     loaded_params = module_load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 465, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     success = self.load_fused_expert_weights(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 348, in load_fused_expert_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]     param = params_dict[name]
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783]             ~~~~~~~~~~~^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] KeyError: 'layers.0.mlp.experts.w2_weight.0.qweight'
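
The traceback ends in `params_dict[name]` inside `load_fused_expert_weights`, with a name of the form `layers.0.mlp.experts.w2_weight.0.qweight`. One way to narrow this down is to summarize the expert tensor names the checkpoint actually ships and compare them with the name the loader tried to look up. The sketch below assumes the checkpoint is sharded and carries the usual `model.safetensors.index.json` file with a `weight_map` entry. `load_weight_names` and `summarize_expert_keys` are hypothetical helper names, not part of vLLM, and the sample names in the demo are made up for illustration rather than taken from the model.

```python
import json
import re
from collections import Counter

# Assumption: layer and expert indices appear as plain integers between dots.
_INDEX = re.compile(r"(?<=\.)\d+(?=\.|$)")


def load_weight_names(index_path):
    """Read tensor names from a sharded safetensors index file."""
    with open(index_path, encoding="utf-8") as f:
        return list(json.load(f)["weight_map"])


def summarize_expert_keys(names):
    """Collapse numeric indices to '*' and count expert tensors per pattern."""
    patterns = Counter()
    for name in names:
        if ".experts." in name:
            patterns[_INDEX.sub("*", name)] += 1
    return patterns


if __name__ == "__main__":
    # Hypothetical names for illustration; replace with
    # load_weight_names("<path to>/model.safetensors.index.json").
    sample = [
        "model.layers.0.mlp.experts.0.down_proj.qweight",
        "model.layers.0.mlp.experts.1.down_proj.qweight",
        "model.layers.0.self_attn.q_proj.qweight",
    ]
    for pattern, count in summarize_expert_keys(sample).items():
        print(count, pattern)
```

If the patterns printed for the real checkpoint are per-expert names (an index directly after `experts.`), while the failing key places the index after a fused name such as `w2_weight`, that would point at the name remapping step rather than at missing tensors. This is a diagnostic aid under the stated assumptions, not a confirmed root cause.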
