Commit d5cd4f3
Qualcomm AI Engine Direct - Static Decoder Runner Support 16bit KV IO (#13127)
### Summary
- Support 16-bit KV IO in the runner (it can run with either 8-bit or 16-bit KV IO).
- Add a README for the script that runs Qwen2.5 0.5B.
- Improve the PPL score for Qwen2.5 0.5B from 18 to 12.
- Fix a BC CI bug.
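
To illustrate why the KV IO bit width can matter for accuracy, here is a minimal sketch of asymmetric affine quantization applied to a synthetic tensor at 8 and 16 bits. It is not the runner's implementation: the helper names (`quantize`, `dequantize`, `max_abs_error`) and the Gaussian test data are assumptions made for this example only.

```python
import random


def quantize(values, num_bits):
    """Asymmetric affine quantization to unsigned integers of num_bits (illustrative)."""
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Round to the nearest level and clamp into [0, qmax].
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats."""
    return [(x - zero_point) * scale for x in q]


def max_abs_error(values, num_bits):
    """Largest round-trip error after quantizing and dequantizing."""
    q, scale, zero_point = quantize(values, num_bits)
    restored = dequantize(q, scale, zero_point)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical stand-in for KV cache values; real caches have other distributions.
random.seed(0)
kv = [random.gauss(0.0, 1.0) for _ in range(1024)]

err8 = max_abs_error(kv, 8)
err16 = max_abs_error(kv, 16)
print(f"8-bit max round-trip error:  {err8:.6f}")
print(f"16-bit max round-trip error: {err16:.6f}")
```

With the same value range, the 16-bit step size is roughly 257 times smaller than the 8-bit one, so the round-trip error shrinks accordingly, at the cost of twice the storage per cached value.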
Sample Script
`python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s
$DEVICE -m SM8750 --prompt "What is 1+1?" --temperature 0 --model_mode
kv --max_seq_len 1024 --ptq 16a8w --decoder_model qwen2_5
--eval_perplexity --tasks wikitext --limit 1 --artifact
./16bit_qwen_1024 --enable_masked_softmax --r3`
#### Stats with QNN 2.37.0 on SM8750
Accuracy: 12 PPL (aligned with prepare_pt2e and convert_pt2e)
Token rate: ~130 tok/sec, depending on seq_len.
<img width="1658" height="877" alt="image"
src="https://github.com/user-attachments/assets/8fa19068-5613-4329-a527-52f3e02d408f"
/>
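
For readers unfamiliar with the PPL metric quoted above, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below shows that arithmetic on made-up log-probabilities; it is not the evaluation code behind `--eval_perplexity`, and the `perplexity` helper and its inputs are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical data: a model that assigns probability 1/12 (or 1/18)
# to every reference token has perplexity 12 (or 18).
uniform_12 = [math.log(1 / 12)] * 100
uniform_18 = [math.log(1 / 18)] * 100

print(perplexity(uniform_12))
print(perplexity(uniform_18))
```

Read this way, a drop from 18 to 12 means the model is, on average, about as uncertain as a uniform choice among 12 tokens instead of 18.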
### Test plan
Added E2E test to `test_qnn_delegate.py`.

Parent commit: 1976647
File tree
16 files changed (+287, -171 lines)
- backends/qualcomm
- quantizer
- tests
- examples/qualcomm/oss_scripts/llama
- model
- runner