
Commit 21802c4 (parent 652907b)

[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031)
Signed-off-by: mgoin <[email protected]>

File tree: 1 file changed, +3 −1 lines
  • vllm/model_executor/layers/quantization/fp8.py


vllm/model_executor/layers/quantization/fp8.py

Lines changed: 3 additions & 1 deletion
@@ -116,7 +116,9 @@ def get_quant_method(self, layer: torch.nn.Module,
         from vllm.attention.layer import Attention  # Avoid circular import

         if isinstance(layer, LinearBase):
-            if is_layer_skipped(prefix, self.ignored_layers):
+            if is_layer_skipped(prefix=prefix,
+                                ignored_layers=self.ignored_layers,
+                                fused_mapping=self.packed_modules_mapping):
                 return UnquantizedLinearMethod()
             return Fp8LinearMethod(self)
         elif isinstance(layer, FusedMoE):
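
Why the mapping matters: vLLM builds fused modules such as qkv_proj out of checkpoint shards that keep their unfused names (q_proj, k_proj, v_proj), and an FP8 checkpoint's ignored_layers list typically refers to those unfused names. A skip check that only compares the fused prefix against ignored_layers would miss the entry and quantize a layer that should stay in high precision; passing packed_modules_mapping lets is_layer_skipped expand the fused name first. The sketch below illustrates that idea only; the function name is_layer_skipped_sketch, the example layer paths, and the any() policy are illustrative assumptions, not vLLM's actual implementation.

# Illustrative sketch of a fused-mapping-aware skip check (not vLLM's code).
from typing import Dict, List


def is_layer_skipped_sketch(prefix: str,
                            ignored_layers: List[str],
                            fused_mapping: Dict[str, List[str]]) -> bool:
    """Return True if the layer at `prefix` should stay unquantized.

    When the last path component is a fused module (e.g. "qkv_proj"), the
    ignore list usually names the unfused shards, so the prefix is expanded
    through `fused_mapping` before the membership check.
    """
    parent, _, last = prefix.rpartition(".")
    if last in fused_mapping:
        shard_prefixes = [
            f"{parent}.{shard}" if parent else shard
            for shard in fused_mapping[last]
        ]
        # Policy choice for this sketch: skip if any shard is ignored.
        return any(p in ignored_layers for p in shard_prefixes)
    return prefix in ignored_layers


# Example: a checkpoint that leaves the attention projections unquantized.
ignored = ["model.layers.0.self_attn.q_proj",
           "model.layers.0.self_attn.k_proj",
           "model.layers.0.self_attn.v_proj"]
fused = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

# A prefix-only check would return False here and quantize the fused layer;
# with the mapping, the fused module is correctly skipped.
print(is_layer_skipped_sketch("model.layers.0.self_attn.qkv_proj",
                              ignored, fused))  # True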
