Commit fee1b2d
Quantize lora linears (#15935)
### Summary
LoraLinears contain:
1. base weight (nn.Linear)
2. lora_a (nn.Linear)
3. lora_b (nn.Linear)
(2) and (3) are caught by the quantization filter, but (1) is not: the base weight and
bias are pulled out of the nn.Linear and stored as plain nn.Parameters, and
the linear op is applied manually in forward. This is done for checkpoint compatibility;
otherwise we'd have to remap the checkpoint weights for every LoRA model.
See:
https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37
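For readers who don't want to follow the link, here is a minimal sketch of the shape of such a module. It is illustrative only (the parameter names, shapes, and scaling convention are assumptions, not the exact executorch code linked above), but it shows why the base projection escapes an nn.Linear-based filter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Illustrative sketch only; see the linked lora.py for the real module."""

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float = 1.0):
        super().__init__()
        # (1) Base weight/bias live as raw nn.Parameters (not a child nn.Linear),
        # so the state_dict keys line up with the original checkpoint layout.
        # Values are loaded from the checkpoint, hence torch.empty here.
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        self.register_parameter("bias", None)
        # (2) and (3): the low-rank adapters are real nn.Linear modules,
        # which is why the quantization filter already catches them.
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The base projection is applied manually with F.linear ...
        out = F.linear(x, self.weight, self.bias)
        # ... plus the low-rank LoRA update.
        return out + self.lora_b(self.lora_a(x)) * self.scaling
```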
This PR adds LoRA linears to the quantization filter so that their base weights are quantized as well.
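A minimal sketch of the kind of filter involved, assuming a torchao-style `filter_fn(module, fqn)` callback; the function name and the class-name check below are hypothetical, not the exact code in this PR:

```python
import torch.nn as nn


def quantizable_linear_filter(module: nn.Module, fqn: str) -> bool:
    """Hypothetical filter: keep matching plain nn.Linear as before, and also
    match LoRA linears, whose base weight is a raw nn.Parameter rather than a
    child nn.Linear and would otherwise be skipped."""
    if isinstance(module, nn.Linear):
        return True
    # Match the LoRA wrapper itself so its manually-applied base weight
    # gets quantized too (class name is illustrative).
    return module.__class__.__name__ == "LoRALinear"


# Usage with torchao would look roughly like (config name illustrative):
#   quantize_(model, int8_weight_only(), filter_fn=quantizable_linear_filter)
```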
### Test plan
```
python -m extension.llm.export.export_llm \
base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \
base.params="${DOWNLOADED_PATH}/params.json" \
base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \
base.adapter_config="../et_docs_7_epoch/adapter_config.json" \
base.tokenizer_path="../et_docs_7_epoch/" \
model.use_kv_cache=true \
model.use_sdpa_with_kv_cache=true \
```
Confirm output model size is ~1.7GB instead of 5.1GB.
```
(executorch) [[email protected] /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte
-rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte
-rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte
```
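To confirm the size programmatically rather than eyeballing `ls`, a quick check along these lines works (the file name is taken from the listing above; the 2 GB threshold is just an assumed sanity bound):

```python
import os

size_gb = os.path.getsize("et_lora_fix.pte") / 1e9
print(f"et_lora_fix.pte: {size_gb:.2f} GB")  # expect roughly 1.7 GB vs. 5.1 GB before
assert size_gb < 2.0, "LoRA base weights do not appear to have been quantized"
```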
1 file changed, +16 −2 lines changed.