Commit 8b7536b
IQ1_S_R4: better 1.5 bpw quants (#185)
* iq1_s_r4: basics - quantize/dequantize
* iq1_s_r4: gemm/gemv works on AVX2/Zen4
* Don't forget to make sure we have a multiple of 4 rows per thread
* iq1_s_r4: this is better
* iq1_s_r4: fix Zen4 after AVX2 changes
* iq1_s_r4: NEON gemm/gemv
* iq1_s_r4: more bits for shared experts
With this mix we arrive at PPL(512) = 9.4140
for Deepseek-Lite using 1.766 bpw for the repeating layers.
On the Ryzen-7950X we get PP-512 = 494 t/s and
TG-128 = 52 t/s @ 16 threads.
* Forgotten counter increment
* iq1_s_r4: slightly faster AVX2/Zen4 gemm/gemv
* Compiler warnings
---------
Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent ecf111a commit 8b7536b
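The "multiple of 4 rows per thread" item above reflects a constraint of row-interleaved (`_R4`) quants: four rows are packed together, so a thread cannot start in the middle of a group. The following is a minimal sketch of one way to split the work under that constraint. The helper `rows_for_thread` is hypothetical and is not taken from this commit; the actual partitioning in the gemm/gemv code may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (not from the commit): split `nrows` rows among
// `nthreads` threads so that every thread's range starts and ends on a
// multiple of 4 rows. Returns the half-open range [first, last) for
// thread `ith`. Assumes nrows itself is a multiple of 4.
std::pair<int, int> rows_for_thread(int nrows, int nthreads, int ith) {
    assert(nrows % 4 == 0 && nthreads > 0 && ith >= 0 && ith < nthreads);
    const int ngroups = nrows / 4;                               // groups of 4 interleaved rows
    const int per_thread = (ngroups + nthreads - 1) / nthreads;  // ceiling division
    const int first = std::min(ith * per_thread, ngroups);
    const int last  = std::min(first + per_thread, ngroups);
    return {4 * first, 4 * last};                                // convert groups back to rows
}
```

With this scheme some trailing threads may receive an empty range when there are fewer groups than threads; that is a deliberate simplification of the sketch.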
File tree
11 files changed, +1104 -93 lines changed
- examples/quantize
- ggml
- include
- src
- iqk
- include
- src