Commit d85c644
New IQ2_KT, IQ3_KT and IQ4_KT, V2 (#529)
* New iq4_kt trellis
The new trellis generates int8_t values via
sum_as_uint8_t[(ka * idx + kb) & 0x3f33f3f3f] - 126.
CUDA dequantize works.
AVX2 case Ny > 32 works, and we get 273 t/s for L3-8B.
PPL is on par or even slightly lower than original QTIP trellis.
* Something is not working with the AVX2 dot product
* New iq4_kt: CUDA MMVQ
* New iq4_kt: CUDA MMQ
* For now have only iq4_kt use the new trellis
* Fix iq2_kt that got broken along the way
* New iq4_kt: AVX2 dot product finally works
We get 13.6 t/s vs 8.4 t/s with the f16 trellis and f32 arithmetic.
Still somewhat slower than other quants, but no longer pathetic.
* New iq4_kt: fix vanilla AVX2
* New iq4_kt: NEON implementation
We get very respectable PP-512 = 120 t/s.
TG-128 is pathetic at 5.3 t/s, so 20+% slower than the f16 variant.
* New iq4_kt: slightly faster NEON
* New iq4_kt: slightly faster NEON
* New iq4_kt: faster NEON
We are now at 9.4 t/s, up from 6.6 t/s for the f16 trellis.
* Minor
* New iq4_kt trellis: not working Metal implementation
* Remove the extra 4 bytes of row meta data that is no longer used
* Cleanup
* Adding forgottent file
* Switching iq2_kt to new trellis - CUDA MMQ
* New iq2_kt: CUDA GEMV
* New iq2_kt: AVX2 dequantize
* New iq2_kt: AVX2 GEMM/GEMV
* Adding forgotten file
* New iq2_kt: NEON GEMM/GEMV
* New iq2_kt: slightly faster NEON GEMM
* New iq2_kt: Metal - very slow.
It seems Apple Silicon cannot quickly add 4 8-bit ints.
Or I don't know how to do it - but I didn't find anything
in the Metal Shading Language Specification.
So, performance is quite a bit worse than the original trellis.
* Add missing break
* Trying @louiehelm's multiplier
* CPU
* iq3_kt: use integer trellis + CUDA dequantize and MMVQ
* iq3_kt: MMQ
* iq3_kt: AVX2 GEMM
* iq3_kt: AVX2 GEMV
* The trellis quants now need super-blocks of 256, so we need a check
---------
Co-authored-by: Iwan Kawrakow <[email protected]>1 parent c410cc7 commit d85c644
File tree
16 files changed
+1668
-132
lines changed- ggml/src
- ggml-cuda
- template-instances
- iqk
- src
16 files changed
+1668
-132
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
578 | 578 | | |
579 | 579 | | |
580 | 580 | | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
581 | 595 | | |
582 | 596 | | |
583 | 597 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
340 | 340 | | |
341 | 341 | | |
342 | 342 | | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
343 | 349 | | |
344 | 350 | | |
345 | 351 | | |
| |||
367 | 373 | | |
368 | 374 | | |
369 | 375 | | |
370 | | - | |
| 376 | + | |
371 | 377 | | |
372 | | - | |
| 378 | + | |
373 | 379 | | |
374 | 380 | | |
375 | 381 | | |
| |||
388 | 394 | | |
389 | 395 | | |
390 | 396 | | |
391 | | - | |
| 397 | + | |
392 | 398 | | |
393 | 399 | | |
394 | | - | |
| 400 | + | |
395 | 401 | | |
396 | 402 | | |
397 | 403 | | |
| |||
401 | 407 | | |
402 | 408 | | |
403 | 409 | | |
404 | | - | |
405 | | - | |
406 | | - | |
| 410 | + | |
| 411 | + | |
407 | 412 | | |
408 | 413 | | |
409 | 414 | | |
| |||
423 | 428 | | |
424 | 429 | | |
425 | 430 | | |
426 | | - | |
427 | | - | |
| 431 | + | |
| 432 | + | |
428 | 433 | | |
429 | 434 | | |
430 | 435 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
433 | 433 | | |
434 | 434 | | |
435 | 435 | | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
436 | 549 | | |
437 | 550 | | |
438 | 551 | | |
| |||
1217 | 1330 | | |
1218 | 1331 | | |
1219 | 1332 | | |
| 1333 | + | |
| 1334 | + | |
| 1335 | + | |
| 1336 | + | |
| 1337 | + | |
| 1338 | + | |
| 1339 | + | |
| 1340 | + | |
| 1341 | + | |
| 1342 | + | |
| 1343 | + | |
| 1344 | + | |
| 1345 | + | |
| 1346 | + | |
| 1347 | + | |
| 1348 | + | |
| 1349 | + | |
| 1350 | + | |
| 1351 | + | |
| 1352 | + | |
| 1353 | + | |
| 1354 | + | |
| 1355 | + | |
| 1356 | + | |
1220 | 1357 | | |
1221 | 1358 | | |
1222 | 1359 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
103 | 112 | | |
104 | 113 | | |
105 | 114 | | |
| |||
172 | 181 | | |
173 | 182 | | |
174 | 183 | | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
175 | 187 | | |
176 | 188 | | |
177 | 189 | | |
| |||
0 commit comments