
Commit 9a886d2

navsud authored and facebook-github-bot committed
Enable quantization for bf16 model (#14558)
Summary: To save GPU memory, the `bfloat16` dtype is commonly used when training LLMs. Currently, the quantizer skips quantization of nodes whose dtype is not float32. This change enables quantization of bf16 nodes as well.

Reviewed By: billmguo

Differential Revision: D82866443
1 parent: c98079a

File tree

1 file changed, +1 −1 lines


backends/qualcomm/quantizer/annotators.py

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ def _is_float_tensor(node: Node):
         or not isinstance(node.meta["val"], FakeTensor)
     ):
         return False
-    return node.meta["val"].dtype == torch.float32
+    return node.meta["val"].dtype in (torch.bfloat16, torch.float32)


 def _mark_nodes_as_annotated(nodes: List[Node]):
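For context, a minimal sketch of how _is_float_tensor reads after this change. The two guard clauses come from the hunk above; the imports and the exact shape of the enclosing if-condition are assumptions rather than a verbatim copy of annotators.py:

import torch
from torch._subclasses.fake_tensor import FakeTensor
from torch.fx import Node

def _is_float_tensor(node: Node):
    # A node only qualifies for quantization if it carries a FakeTensor
    # in its "val" meta entry (guard reconstructed from the diff hunk).
    if (
        "val" not in node.meta
        or not isinstance(node.meta["val"], FakeTensor)
    ):
        return False
    # Previously only float32 tensors passed this check; bf16 tensors
    # are now treated as quantizable as well.
    return node.meta["val"].dtype in (torch.bfloat16, torch.float32)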
