Commit 6b04039

[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369)
Signed-off-by: sstamenk <[email protected]>
Parent: 1c859a1

File tree

1 file changed (+3 −0 lines)

1 file changed

+3
-0
lines changed

vllm/model_executor/layers/quantization/utils/w8a8_utils.py

Lines changed: 3 additions & 0 deletions
@@ -121,6 +121,9 @@ def requantize_with_max_scale(
     if unfused_module_in_checkpoint:
         start = 0
         for idx, logical_width in enumerate(logical_widths):
+            # Skip any component with zero width.
+            if logical_width == 0:
+                continue
             end = start + logical_width
             weight_dq = per_tensor_dequantize(weight[start:end, :],
                                               weight_scale[idx])
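
The slicing logic the patch protects can be sketched in isolation. The following is a minimal, hedged illustration (not the real vLLM function, which also dequantizes and requantizes each slice): it walks the fused weight's logical widths and yields the slice bounds for each component, skipping any zero-width component. The helper name `iter_component_slices` is hypothetical; the point is that when a width is 0 (e.g. the unused Q slot of a QKVParallelLinear driven by QKVCrossParallelLinear), no slice is produced and `start` is left unchanged for the next component.

```python
def iter_component_slices(logical_widths):
    """Yield (idx, start, end) for each non-empty fused sub-weight.

    Hypothetical helper mirroring the patched loop in
    requantize_with_max_scale: a zero-width component contributes no
    rows to the fused weight, so it is skipped entirely rather than
    producing an empty weight[start:end, :] slice.
    """
    start = 0
    for idx, logical_width in enumerate(logical_widths):
        # Skip any component with zero width; `start` does not advance.
        if logical_width == 0:
            continue
        end = start + logical_width
        yield idx, start, end
        start = end
```

For example, a fused layer with widths `[0, 8, 8]` (empty Q, then K and V) yields only the K and V slices, with K starting at row 0.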
