Commit d686472
authored
Qualcomm AI Engine Direct - Static LLM Refactor & Qwen3 1.7B Improvement (pytorch#13755)
### Summary
- Refactor `llama.py`. The current script is hard to customize,
especially for quantization configs. As more models have been enabled,
the script has become messy, with multiple `if`-`else` statements
deciding which optimizations apply to which model. We want to move all
of the model specs under `__init__.py`.
- Hide scale/offset in the model's metadata, so `args.quant_attrs_path`
is no longer required when evaluating the ppl score.
- Enable Qwen3 1.7B with 16a4w_block quant. It previously used 16a8w,
which is much slower. The goal is to maximize token rate while keeping
ppl within a 20% margin of the FP CPU baseline.
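
The first bullet describes replacing per-model `if`-`else` branches with centralized model specs. A minimal sketch of that pattern follows. Every name here (`ModelSpec`, `MODEL_SPECS`, `get_model_spec`, the `quant_dtype` field, and the model keys) is hypothetical and chosen for illustration; it is not the layout this commit actually adds under `__init__.py`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical per-model settings; the field name is illustrative only."""

    quant_dtype: str  # e.g. "16a4w_block" or "16a8w"


# Hypothetical registry: one entry per supported model replaces a branch
# of the old if/else chain. Keys are made up for this sketch.
MODEL_SPECS: Dict[str, ModelSpec] = {
    "qwen3-1_7b": ModelSpec(quant_dtype="16a4w_block"),
    "placeholder-model": ModelSpec(quant_dtype="16a8w"),
}


def get_model_spec(name: str) -> ModelSpec:
    """Look up a model's spec, failing loudly for unknown models."""
    try:
        return MODEL_SPECS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_SPECS))
        raise ValueError(f"Unsupported model {name!r}; known models: {known}") from None
```

With a table like this, enabling a new model means adding one registry entry rather than touching the control flow of the export script.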
#### Stats
token rate = 37 tok/sec
ppl = 14.79
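
The 20% margin mentioned in the summary can be checked with a small helper. This is a sketch under two assumptions: that "within a 20% margin" means the quantized ppl is at most 1.2 times the FP baseline, and that the baseline is supplied by the caller. The commit message does not state the FP CPU baseline, so the value `13.0` below is a placeholder, not a measured number. The helper name `within_ppl_margin` is hypothetical.

```python
def within_ppl_margin(quant_ppl: float, fp_ppl: float, margin: float = 0.20) -> bool:
    """Return True if quant_ppl is at most (1 + margin) times fp_ppl.

    Assumes the margin is a relative increase over the FP baseline.
    """
    if fp_ppl <= 0:
        raise ValueError("fp_ppl must be positive")
    return quant_ppl <= fp_ppl * (1.0 + margin)


# 14.79 is the ppl reported above; 13.0 is a PLACEHOLDER baseline.
ok = within_ppl_margin(14.79, fp_ppl=13.0)
```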
### Test plan
Tested all scripts to ensure there is no regression.
cc: @haowhsu-quic
1 parent 4d1da11 commit d686472
File tree
8 files changed (+452, -305 lines)
- backends/qualcomm
- quantizer
- tests
- examples/qualcomm/oss_scripts/llama
- runner