Fix logic to identify MLP blocks that need to be fused (#1847)
SUMMARY:
This PR fixes the logic used to determine whether up_proj and gate_proj
should be merged during quantization. The check tested whether up_proj OR
gate_proj was present, when in fact BOTH must be present for the fusion
to apply.
This bug was blocking quantization of the Apertus models, which do not
have gate_proj modules: the broken logic flagged their MLP layers for
fusion, and the code then failed when it could not find gate_proj. The
corrected condition is sketched below.
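A minimal sketch of the corrected check; the helper name and attribute-based detection are illustrative assumptions, not the PR's actual code:

```python
from torch import nn

def should_fuse_mlp(mlp: nn.Module) -> bool:
    """Decide whether an MLP block's up_proj and gate_proj can be fused.

    Hypothetical helper illustrating the fix described above.
    """
    has_up = hasattr(mlp, "up_proj")
    has_gate = hasattr(mlp, "gate_proj")
    # Buggy version: `has_up or has_gate` flagged gate-less MLPs
    # (e.g. Apertus) for fusion, which later failed when gate_proj
    # was looked up. Fixed version: require both modules.
    return has_up and has_gate
```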
TEST PLAN:
Successfully quantized the Apertus models:
https://huggingface.co/RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic
https://huggingface.co/RedHatAI/Apertus-70B-Instruct-2509-FP8-dynamic
Signed-off-by: Alexandre Marques <[email protected]>