Commit fee1b2d
Quantize lora linears (#15935)
### Summary
LoraLinears contain:
1. base weight (nn.Linear)
2. lora_a (nn.Linear)
3. lora_b (nn.Linear)
(2) and (3) are caught by the quantization filter, but (1) is not: the base weight and
bias are pulled out of the nn.Linear and stored as plain nn.Parameters, and
the linear op is applied manually in forward. This is done for checkpoint compatibility;
otherwise we'd have to remap the checkpoint weights for every LoRA model.
See:
https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37
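For readers who don't want to follow the link, here is a minimal sketch of the shape of such a module. It is illustrative only (the parameter names, shapes, and scaling convention are assumptions, not the exact executorch code linked above), but it shows why the base projection escapes an nn.Linear-based filter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Illustrative sketch only; see the linked lora.py for the real module."""

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float = 1.0):
        super().__init__()
        # (1) Base weight/bias live as raw nn.Parameters (not a child nn.Linear),
        # so the state_dict keys line up with the original checkpoint layout.
        # Values are loaded from the checkpoint, hence torch.empty here.
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        self.register_parameter("bias", None)
        # (2) and (3): the low-rank adapters are real nn.Linear modules,
        # which is why the quantization filter already catches them.
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The base projection is applied manually with F.linear ...
        out = F.linear(x, self.weight, self.bias)
        # ... plus the low-rank LoRA update.
        return out + self.lora_b(self.lora_a(x)) * self.scaling
```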
This PR adds LoRA linears to the quantization filter so that their base weights are quantized as well.
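A minimal sketch of the kind of filter involved, assuming a torchao-style `filter_fn(module, fqn)` callback; the function name and the class-name check below are hypothetical, not the exact code in this PR:

```python
import torch.nn as nn


def quantizable_linear_filter(module: nn.Module, fqn: str) -> bool:
    """Hypothetical filter: keep matching plain nn.Linear as before, and also
    match LoRA linears, whose base weight is a raw nn.Parameter rather than a
    child nn.Linear and would otherwise be skipped."""
    if isinstance(module, nn.Linear):
        return True
    # Match the LoRA wrapper itself so its manually-applied base weight
    # gets quantized too (class name is illustrative).
    return module.__class__.__name__ == "LoRALinear"


# Usage with torchao would look roughly like (config name illustrative):
#   quantize_(model, int8_weight_only(), filter_fn=quantizable_linear_filter)
```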
### Test plan
```
python -m extension.llm.export.export_llm \
base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \
base.params="${DOWNLOADED_PATH}/params.json" \
base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \
base.adapter_config="../et_docs_7_epoch/adapter_config.json" \
base.tokenizer_path="../et_docs_7_epoch/" \
model.use_kv_cache=true \
model.use_sdpa_with_kv_cache=true \
```
Confirm output model size is ~1.7GB instead of 5.1GB.
```
(executorch) [[email protected] /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte
-rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte
-rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte
```
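To confirm the size programmatically rather than eyeballing `ls`, a quick check along these lines works (the file name is taken from the listing above; the 2 GB threshold is just an assumed sanity bound):

```python
import os

size_gb = os.path.getsize("et_lora_fix.pte") / 1e9
print(f"et_lora_fix.pte: {size_gb:.2f} GB")  # expect roughly 1.7 GB vs. 5.1 GB before
assert size_gb < 2.0, "LoRA base weights do not appear to have been quantized"
```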
1 file changed, +16 −2 lines changed.