
Conversation

xuebwang-amd (Contributor) commented Feb 12, 2026

Purpose

This PR fixes the following error:

 Traceback (most recent call last):
   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 754, in worker_main
     worker = WorkerProc(*args, **kwargs)
   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 580, in __init__
     self.worker.load_model()
   File "/workspace/vllm/vllm/v1/worker/gpu_worker.py", line 294, in load_model
     self.model_runner.load_model(eep_scale_up=eep_scale_up)
   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 4143, in load_model
     self.model = model_loader.load_model(
   File "/workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 54, in load_model
     model = initialize_model(
   File "/workspace/vllm/vllm/model_executor/model_loader/utils.py", line 54, in initialize_model
     model = model_class(vllm_config=vllm_config, prefix=prefix)
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1210, in __init__
     self.model = self.model_cls(
   File "/workspace/vllm/vllm/compilation/decorators.py", line 305, in __init__
     old_init(self, **kwargs)
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1067, in __init__
     self.start_layer, self.end_layer, self.layers = make_layers(
   File "/workspace/vllm/vllm/model_executor/models/utils.py", line 707, in make_layers
     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 1069, in <lambda>
     lambda prefix: DeepseekV2DecoderLayer(
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 963, in __init__
     self.mlp = DeepseekV2MoE(
   File "/workspace/vllm/vllm/model_executor/models/deepseek_v2.py", line 298, in __init__
     self.experts = SharedFusedMoE(
   File "/workspace/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 539, in __init__
     quant_config is not None and quant_config.is_mxfp4_quant(prefix, self)
   File "/workspace/vllm/vllm/model_executor/layers/quantization/quark/quark.py", line 396, in is_mxfp4_quant
     self._is_w_ocp_mx_a_x(weight_config, input_config)
   File "/workspace/vllm/vllm/model_executor/layers/quantization/quark/quark.py", line 348, in _is_w_ocp_mx_a_x
     if weight_quant.get("qscheme") != "per_group":
 AttributeError: 'list' object has no attribute 'get'

Model: amd/Kimi-K2-Thinking-W4A8
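
For reference, the failing pattern can be reproduced in isolation. The sketch below is illustrative only: the quant-config contents are hypothetical stand-ins for a mixed-precision (e.g. fp8_w4a8) Quark layer config and are not taken from the actual checkpoint.

    # Hypothetical layer config shape for a mixed-precision scheme: "weight"
    # maps to a list of quant specs instead of a single dict.
    weight_quant = [
        {"dtype": "fp8_e4m3", "qscheme": "per_tensor"},
        {"dtype": "int4", "qscheme": "per_group", "group_size": 128},
    ]

    # Pre-fix behavior in _is_w_ocp_mx_a_x assumes weight_quant is a dict:
    try:
        weight_quant.get("qscheme")
    except AttributeError as e:
        print(e)  # 'list' object has no attribute 'get'

    # With this PR's guard, list-valued configs are rejected up front instead:
    if isinstance(weight_quant, list):
        print("not an OCP_MX scheme")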

Test Plan

Test Result

Signed-off-by: xuebwang-amd <[email protected]>
@mergify mergify bot added the rocm Related to AMD ROCm label Feb 12, 2026
xuebwang-amd (Contributor, Author) commented:

@BowenBao @tjtanaa

gemini-code-assist bot left a comment


Code Review

This PR fixes a crash in the OCP weight quantization parser by adding a type check. The change is correct but only partially addresses the underlying issue, as the same type of error can occur in other functions. I've added a comment with a recommendation for a more robust fix.

Comment on lines +340 to +345
    if isinstance(weight_quant, list):
        logger.debug(
            "Quark model's weight quantization is incompatible with OCP_MX format: "
            "weight_quant is a list (e.g. fp8_w4a8), OCP_MX requires a single dict."
        )
        return False

Severity: high

This check correctly prevents the reported crash when weight_quant is a list. However, the underlying issue can also cause crashes in _is_fp8_w8a8 and _is_static_tensor_w8a8, which are called earlier in _get_scheme_from_config.

The crash likely occurred in this function for the specific model because input_quant was None, allowing the previous checks to pass without error.

To create a more robust fix, I suggest adding similar isinstance(weight_quant, list) checks to _is_fp8_w8a8 and _is_static_tensor_w8a8 as well.
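
For illustration, the suggested hardening could look roughly like the sketch below. This is not the actual patch: the helper name and the placeholder dtype checks are assumptions standing in for the existing logic in quark.py.

    def _as_single_quant_dict(quant_cfg):
        """Return quant_cfg if it is a single dict spec, else None (e.g. a list for mixed fp8_w4a8)."""
        return quant_cfg if isinstance(quant_cfg, dict) else None

    def _is_fp8_w8a8(weight_quant, input_quant) -> bool:
        # Reject list-valued (mixed-precision) configs before calling .get() on them.
        weight_quant = _as_single_quant_dict(weight_quant)
        input_quant = _as_single_quant_dict(input_quant)
        if weight_quant is None or input_quant is None:
            return False
        # Placeholder for the existing fp8 dtype/qscheme checks.
        return weight_quant.get("dtype") == "fp8_e4m3" and input_quant.get("dtype") == "fp8_e4m3"

The same early-return pattern would apply to _is_static_tensor_w8a8, keeping all of the _is_* predicates safe to call whether the config entry is a dict, a list, or None.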

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 12, 2026
@tjtanaa tjtanaa enabled auto-merge (squash) February 12, 2026 15:47
