
Commit ee7a0f7

RahulC7 authored and facebook-github-bot committed
Enable 16-bit activations in Cadence Quantizer for fully_connected and linear
Summary:

# Context

We currently support only 8-bit activations for most operators. We would like to add generic 16-bit-activation support for the following ops:

- quantized_fully_connected
- quantized_linear
- quantized_conv (all flavors)
- quantized_matmul

# This Diff

Here, we add support for `quantized_linear` and `quantized_fully_connected`. We need to do the following:

1. Allow 16-bit activations in `quantized_fully_connected_out.cpp` and `quantized_linear_out.cpp`.
2. Allow 16-bit activations in `ref_implementations.py`, so that tests can run with 16-bit activations to validate that the quantization is correct.
3. Add a quantizer (`CadenceWith16BitLinearActivationsQuantizer`) to verify that this works, and add a unit test.

Differential Revision: D84284794
1 parent e26670b · commit ee7a0f7

File tree

2 files changed: +11 −1 lines changed


backends/cadence/aot/quantizer/quantizer.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -338,3 +338,13 @@ def __init__(self, quantizers: Optional[list[Quantizer]] = None) -> None:
         quantizers = get_cadence_default_quantizers()
         quantizers.append(CadenceAtenQuantizer(SoftmaxPattern(), qconfig_A16))
         super().__init__(quantizers)
+
+class CadenceWith16BitLinearActivationsQuantizer(CadenceQuantizer):
+    """
+    Quantizer with 16-bit activations for specific operations
+    """
+    def __init__(self, quantizers: Optional[list[Quantizer]] = None) -> None:
+        quantizers = []
+        # Add 16-bit quantizers for LinearPattern
+        quantizers.append(CadenceAtenQuantizer(LinearPattern(), qconfig_A16))
+        super().__init__(quantizers)
```
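For illustration (not part of this commit): a minimal sketch of how the new quantizer might be exercised end to end, assuming the standard torch.ao PT2E prepare/convert flow. The toy module, shapes, and export entry point below are assumptions of mine; the commit's actual unit test is not shown on this page, and the export API varies across PyTorch releases.

```python
# Hypothetical usage sketch (not from this diff): drive the new quantizer
# through the torch.ao PT2E flow on a toy linear model.
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

from executorch.backends.cadence.aot.quantizer.quantizer import (
    CadenceWith16BitLinearActivationsQuantizer,
)


class TinyLinear(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


model = TinyLinear().eval()
inputs = (torch.randn(4, 16),)

# Export to an ATen-level graph, annotate linear ops with the 16-bit
# activation qconfig, run one calibration batch, then fold in the
# quantize/dequantize ops.
graph_module = torch.export.export(model, inputs).module()
quantizer = CadenceWith16BitLinearActivationsQuantizer()
prepared = prepare_pt2e(graph_module, quantizer)
prepared(*inputs)  # calibration pass
converted = convert_pt2e(prepared)
```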

backends/cadence/aot/ref_implementations.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -261,7 +261,7 @@ def quantized_linear_common(
     src = src.view(-1, K)
 
     dtype = src.dtype
-    supported_dtypes = [torch.int8, torch.uint8, torch.int32]
+    supported_dtypes = [torch.int8, torch.uint8, torch.int16, torch.int32]
     if dtype not in supported_dtypes:
         raise ValueError(
             f"Unsupported dtype to quantize to. Supported dtypes must be one of {supported_dtypes}"
```
