Commit 9bcacc5

Fix logic to identify MLP blocks that need to be fused (#1847)
SUMMARY: This PR fixes the logic used to determine whether up_proj and gate_proj should be merged during quantization. The check previously passed if up_proj OR gate_proj was present, but the fusion requires BOTH modules to exist. This bug was blocking quantization of the Apertus models, which do not have gate_proj modules: the broken check flagged their MLP layers for fusion, and the fusion code then failed because it could not find gate_proj.

TEST PLAN: Successfully quantized the Apertus models:
https://huggingface.co/RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic
https://huggingface.co/RedHatAI/Apertus-70B-Instruct-2509-FP8-dynamic

Signed-off-by: Alexandre Marques <[email protected]>
1 parent d6548a8 commit 9bcacc5

File tree

1 file changed: +1 −1 lines changed


src/llmcompressor/modifiers/utils/helpers.py

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ def _is_attention_module(module: Module):
 
 def _is_mlp_module(module: Module):
     return "mlp" in module.__class__.__name__.lower() and (
-        hasattr(module, "gate_proj") or hasattr(module, "up_proj")
+        hasattr(module, "gate_proj") and hasattr(module, "up_proj")
     )
 
 def _valid_tensor_group_quant(layer_list: List[Linear]):
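For illustration, a minimal sketch (not part of the commit) of how the corrected predicate behaves. The LlamaStyleMLP and ApertusStyleMLP classes below are hypothetical stand-ins: the first mimics a gated MLP that defines both gate_proj and up_proj, the second mimics an Apertus-style MLP that has up_proj but no gate_proj, which under the old OR check would have been flagged for fusion and then crashed.

# Minimal sketch, not from the repository: the corrected check applied to two
# hypothetical MLP blocks to show which one is selected for gate/up fusion.
import torch.nn as nn
from torch.nn import Module


def _is_mlp_module(module: Module):
    # Fusing gate_proj and up_proj only makes sense when BOTH modules exist.
    return "mlp" in module.__class__.__name__.lower() and (
        hasattr(module, "gate_proj") and hasattr(module, "up_proj")
    )


class LlamaStyleMLP(nn.Module):
    # Gated MLP: defines both gate_proj and up_proj.
    def __init__(self):
        super().__init__()
        self.gate_proj = nn.Linear(8, 16)
        self.up_proj = nn.Linear(8, 16)
        self.down_proj = nn.Linear(16, 8)


class ApertusStyleMLP(nn.Module):
    # Apertus-style MLP: up_proj only, no gate_proj.
    def __init__(self):
        super().__init__()
        self.up_proj = nn.Linear(8, 16)
        self.down_proj = nn.Linear(16, 8)


print(_is_mlp_module(LlamaStyleMLP()))    # True: eligible for gate/up fusion
print(_is_mlp_module(ApertusStyleMLP()))  # False: skipped instead of failing later

With the old OR logic, the second call would have returned True and the downstream fusion code would have attempted to access the missing gate_proj attribute.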
