
Conversation

xuebwang-amd (Contributor) commented Feb 12, 2026

Purpose

This PR fixes the following error:

 Traceback (most recent call last):
   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 754, in worker_main
     worker = WorkerProc(*args, **kwargs)
   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 580, in __init__
     self.worker.load_model()
   File "/workspace/vllm/vllm/v1/worker/gpu_worker.py", line 294, in load_model
     self.model_runner.load_model(eep_scale_up=eep_scale_up)
   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 4143, in load_model
     self.model = model_loader.load_model(
   File "/workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 54, in load_model
     model = initialize_model(
   File "/workspace/vllm/vllm/model_executor/model_loader/utils.py", line 54, in initialize_model
     model = model_class(vllm_config=vllm_config, prefix=prefix)
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1210, in __init__
     self.model = self.model_cls(
   File "/workspace/vllm/vllm/compilation/decorators.py", line 305, in __init__
     old_init(self, **kwargs)
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1067, in __init__
     self.start_layer, self.end_layer, self.layers = make_layers(
   File "/workspace/vllm/vllm/model_executor/models/utils.py", line 707, in make_layers
     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1069, in <lambda>
     lambda prefix: DeepseekV2DecoderLayer(
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 963, in __init__
     self.mlp = DeepseekV2MoE(
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 298, in __init__
     self.experts = SharedFusedMoE(
   File "/workspace/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 539, in __init__
     quant_config is not None and quant_config.is_mxfp4_quant(prefix, self)
   File "/workspace/vllm/vllm/model_executor/layers/quantization/quark/quark.py", line 396, in is_mxfp4_quant
     self._is_w_ocp_mx_a_x(weight_config, input_config)
   File "/workspace/vllm/vllm/model_executor/layers/quantization/quark/quark.py", line 348, in _is_w_ocp_mx_a_x
     if weight_quant.get("qscheme") != "per_group":
 AttributeError: 'list' object has no attribute 'get'

Model: amd/Kimi-K2-Thinking-W4A8
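
For reference, the failing pattern can be reproduced in isolation. The sketch below is illustrative only: the quant-config contents are hypothetical stand-ins for a mixed-precision (e.g. fp8_w4a8) Quark layer config and are not taken from the actual checkpoint.

    # Hypothetical layer config shape for a mixed-precision scheme: "weight"
    # maps to a list of quant specs instead of a single dict.
    weight_quant = [
        {"dtype": "fp8_e4m3", "qscheme": "per_tensor"},
        {"dtype": "int4", "qscheme": "per_group", "group_size": 128},
    ]

    # Pre-fix behavior in _is_w_ocp_mx_a_x assumes weight_quant is a dict:
    try:
        weight_quant.get("qscheme")
    except AttributeError as e:
        print(e)  # 'list' object has no attribute 'get'

    # With this PR's guard, list-valued configs are rejected up front instead:
    if isinstance(weight_quant, list):
        print("not an OCP_MX scheme")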

Test Plan

Test Result

Signed-off-by: xuebwang-amd <[email protected]>
@mergify mergify bot added the rocm Related to AMD ROCm label Feb 12, 2026
xuebwang-amd (Contributor, Author) commented:

@BowenBao @tjtanaa

gemini-code-assist bot left a comment


Code Review

This PR fixes a crash in the OCP weight quantization parser by adding a type check. The change is correct but only partially addresses the underlying issue, as the same type of error can occur in other functions. I've added a comment with a recommendation for a more robust fix.

Comment on lines +340 to +345
    if isinstance(weight_quant, list):
        logger.debug(
            "Quark model's weight quantization is incompatible with OCP_MX format: "
            "weight_quant is a list (e.g. fp8_w4a8), OCP_MX requires a single dict."
        )
        return False

Severity: high

This check correctly prevents the reported crash when weight_quant is a list. However, the underlying issue can also cause crashes in _is_fp8_w8a8 and _is_static_tensor_w8a8, which are called earlier in _get_scheme_from_config.

The crash likely occurred in this function for the specific model because input_quant was None, allowing the previous checks to pass without error.

To create a more robust fix, I suggest adding similar isinstance(weight_quant, list) checks to _is_fp8_w8a8 and _is_static_tensor_w8a8 as well.
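
For illustration, the suggested hardening could look roughly like the sketch below. This is not the actual patch: the helper name and the placeholder dtype checks are assumptions standing in for the existing logic in quark.py.

    def _as_single_quant_dict(quant_cfg):
        """Return quant_cfg if it is a single dict spec, else None (e.g. a list for mixed fp8_w4a8)."""
        return quant_cfg if isinstance(quant_cfg, dict) else None

    def _is_fp8_w8a8(weight_quant, input_quant) -> bool:
        # Reject list-valued (mixed-precision) configs before calling .get() on them.
        weight_quant = _as_single_quant_dict(weight_quant)
        input_quant = _as_single_quant_dict(input_quant)
        if weight_quant is None or input_quant is None:
            return False
        # Placeholder for the existing fp8 dtype/qscheme checks.
        return weight_quant.get("dtype") == "fp8_e4m3" and input_quant.get("dtype") == "fp8_e4m3"

The same early-return pattern would apply to _is_static_tensor_w8a8, keeping all of the _is_* predicates safe to call whether the config entry is a dict, a list, or None.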

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 12, 2026
@tjtanaa tjtanaa enabled auto-merge (squash) February 12, 2026 15:47
