Commit 2c1a355
Fix xnnpack quantization discrepancy for non-fp32 (#8488)
Summary:
Quantize the weights in their original dtype (the checkpoint dtype) by passing the checkpoint dtype to the quantization source transformation. Then set the computation dtype (the result dtype of the dequant, i.e. the dtype the ops are actually computed in) to the dtype override. Both steps are needed because the torchao API couples the checkpoint dtype and the computation dtype into a single `precision` parameter, which we cannot change.
Note: there is no need to worry about https://github.com/pytorch/ao/blob/main/torchao/quantization/GPTQ.py#L1168, since `precision` is passed in as the checkpoint dtype.
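To illustrate why the dtype used during quantization matters, here is a minimal toy sketch, not torchao's implementation. It assumes a symmetric 4-bit scheme over a single group, emulates fp16 with the standard library, and uses Python's native float as a stand-in for a wider dtype. The helper names (`to_fp16`, `quantize_group`, `dequantize`) and the weight values are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_group(weights, scale_in_fp16):
    """Toy symmetric 4-bit quantization of one weight group.

    Returns (integer codes, scale). The only knob is the precision
    in which the scale is kept.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0  # the largest magnitude maps to the int4 value 7
    if scale_in_fp16:
        scale = to_fp16(scale)  # scale kept in the narrower checkpoint dtype
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


# Hypothetical fp16 "checkpoint" weights for one group.
checkpoint = [to_fp16(v) for v in (0.1234, -0.0567, 0.0891, -0.1011)]

codes_ckpt, scale_ckpt = quantize_group(checkpoint, scale_in_fp16=True)
codes_wide, scale_wide = quantize_group(checkpoint, scale_in_fp16=False)

deq_ckpt = dequantize(codes_ckpt, scale_ckpt)
deq_wide = dequantize(codes_wide, scale_wide)

max_diff = max(abs(a - b) for a, b in zip(deq_ckpt, deq_wide))
print(f"scale (checkpoint dtype): {scale_ckpt!r}")
print(f"scale (wider dtype):      {scale_wide!r}")
print(f"max dequantized difference: {max_diff:.3e}")
```

In this toy, the two scales differ slightly, so the dequantized weights differ even though the inputs are identical. That is the kind of discrepancy this change avoids by quantizing in the checkpoint dtype and only afterwards switching the computation dtype.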
### Comparison of an arbitrary q_proj tensor from a sample Llama checkpoint:
Before:
```
Mismatched elements: 3260378 / 4194304 (77.7%)
Greatest absolute difference: 0.08802086114883423 at index (1129, 604) (up to 1e-05 allowed)
Greatest relative difference: 1.0 at index (0, 1350) (up to 1.3e-06 allowed)
Signal-to-noise: 32.8974 dB
```
After: no difference
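For readers who want to reproduce this kind of report, the sketch below computes similar statistics on flat lists of floats. It is an illustration under assumptions, not the script used for the numbers above: the mismatch rule `|actual - expected| > atol + rtol * |expected|` follows the convention of `torch.testing.assert_close`, the default tolerances mirror the "allowed" values shown in the report, and the signal-to-noise ratio uses one common power-ratio definition. The function name `compare` is hypothetical.

```python
import math


def compare(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Elementwise comparison of two equally sized flat lists of floats."""
    diffs = [abs(a - e) for a, e in zip(actual, expected)]
    # An element mismatches when its error exceeds the combined tolerance.
    mismatched = sum(d > atol + rtol * abs(e) for d, e in zip(diffs, expected))
    signal = sum(e * e for e in expected)  # power of the reference tensor
    noise = sum(d * d for d in diffs)      # power of the error
    if noise == 0:
        snr_db = math.inf                  # identical tensors
    elif signal == 0:
        snr_db = -math.inf                 # all-zero reference, nonzero error
    else:
        snr_db = 10.0 * math.log10(signal / noise)
    return {
        "mismatched": mismatched,
        "total": len(diffs),
        "max_abs_diff": max(diffs),
        "snr_db": snr_db,
    }


report = compare([1.0, 2.0, 3.0, 4.5], [1.0, 2.0, 3.0, 4.0])
print(report)
```

With real tensors, the same quantities can be obtained by flattening both tensors first. An infinite SNR with zero mismatched elements corresponds to the "no difference" result.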
Test Plan:
### Manual testing
```
python -m examples.models.llama.export_llama \
  -v -c xl_consolidated/consolidated_renamed.pth \
  -p xl_consolidated/et_params.json -kv -d fp32 \
  -qmode 8da4w --group_size 32 -X \
  --use_sdpa_with_kv_cache \
  --output_name quantized_baseline.pte \
  --max_context_length 4096 -E 4,32
```
With the following inserted after the quantization:
```
edge_manager.model(
    torch.tensor([[2, 3, 4]], dtype=torch.long),
    {"input_pos": torch.tensor([0], dtype=torch.long)},
)
```
For testing, the modifications to GPTQ.py in torchao from pytorch/ao#1756 were also applied.
### Automated testing
+ existing CI tests
### Regression testing
TBD
Reviewed By: kimishpatel
Differential Revision: D70184325
Pulled By: jackzhxng
1 parent d16b867 commit 2c1a355
File tree
6 files changed, +159 -28 lines changed
- examples/models
  - llama
    - source_transformation
  - llava
- exir/tests
- extension/llm/export