Fix logic to identify MLP blocks that need to be fused (#1847)
SUMMARY:
This PR fixes the logic used to determine whether up_proj and gate_proj
should be merged during quantization. The check tested whether up_proj OR
gate_proj was present, when in fact BOTH must be present for the fusion
to apply.
This bug was blocking quantization of the Apertus models, which do not
have gate_proj modules: the broken logic flagged their MLP layers for
fusion, and the code then failed when it could not find gate_proj. The
corrected condition is sketched below.
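A minimal sketch of the corrected check; the helper name and attribute-based detection are illustrative assumptions, not the PR's actual code:

```python
from torch import nn

def should_fuse_mlp(mlp: nn.Module) -> bool:
    """Decide whether an MLP block's up_proj and gate_proj can be fused.

    Hypothetical helper illustrating the fix described above.
    """
    has_up = hasattr(mlp, "up_proj")
    has_gate = hasattr(mlp, "gate_proj")
    # Buggy version: `has_up or has_gate` flagged gate-less MLPs
    # (e.g. Apertus) for fusion, which later failed when gate_proj
    # was looked up. Fixed version: require both modules.
    return has_up and has_gate
```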
TEST PLAN:
Successfully quantized the Apertus models:
https://huggingface.co/RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic
https://huggingface.co/RedHatAI/Apertus-70B-Instruct-2509-FP8-dynamic
Signed-off-by: Alexandre Marques <[email protected]>