iq1_kt: CUDA dequantize
Testing with Llama-3.1-8B-Instruct, we get almost the same PPL
as iq2_xxs, so roughly 0.2 bpw saved for the same quality.
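For orientation, a minimal sketch of the shape of such a CUDA dequantize kernel. The block layout (block_iq1_kt_sketch), the QK block size, and the decode_one stub are hypothetical stand-ins, not the actual iq1_kt format; the real type presumably decodes a trellis-coded bit stream in place of the 1-bit placeholder used here.

```cuda
#include <cuda_fp16.h>
#include <cstdint>

#define QK 256  // assumed super-block size; placeholder, not the real layout

// Hypothetical stand-in for the iq1_kt block: per-block fp16 scale plus a
// packed bit stream. The real trellis coding is not reproduced here.
struct block_iq1_kt_sketch {
    half    d;
    uint8_t qs[QK / 8];
};

// Placeholder decode: map one packed bit to {-1, +1}. The actual KT quants
// would run a trellis generator at this point instead.
__device__ inline float decode_one(const uint8_t *qs, int i) {
    return ((qs[i / 8] >> (i % 8)) & 1) ? 1.0f : -1.0f;
}

// One CUDA block per quant block; each thread expands a strided subset.
__global__ void dequantize_iq1_kt_sketch(const block_iq1_kt_sketch *x, float *y) {
    const block_iq1_kt_sketch &b = x[blockIdx.x];
    const float d = __half2float(b.d);
    for (int i = threadIdx.x; i < QK; i += blockDim.x) {
        y[blockIdx.x * QK + i] = d * decode_one(b.qs, i);
    }
}
```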
iq1_kt: CUDA MMQ
iq1_kt: CUDA MMVQ
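In llama.cpp terms, MMQ is the quantized matrix-matrix path used for prompt processing, while MMVQ is the matrix-vector path used during token generation; both dot the quantized weights against activations quantized to 8 bits on the fly. A hedged sketch of such an int8 dot-product core (names hypothetical, iq1_kt decode assumed already done):

```cuda
// MMVQ-style inner product: int8 weights dotted against int8 activations
// with __dp4a (4-way int8 multiply-add, sm_61+), then scaled by the two
// block scales. Assumes n % 4 == 0 and 4-byte-aligned pointers.
__device__ float vec_dot_i8_sketch(const int8_t *w, const int8_t *a,
                                   float dw, float da, int n) {
    const int *w4 = reinterpret_cast<const int *>(w);
    const int *a4 = reinterpret_cast<const int *>(a);
    int sum = 0;
    for (int i = 0; i < n / 4; ++i) {
        sum = __dp4a(w4[i], a4[i], sum);
    }
    return dw * da * sum;
}
```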
iq1_kt: AVX2 GEMM/GEMV
iq1_kt: convert/repack to q8_0_r8 (AVX2)
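q8_0_r8 is a row-interleaved variant of q8_0: blocks from 8 consecutive rows are stored together so the GEMM kernel can stream all 8 rows through a single pointer. A sketch under assumed layouts; block_q8_0 matches ggml's standard q8_0, but the field order and interleaving of the r8 struct here is a guess:

```cpp
#include <immintrin.h>
#include <cstdint>

typedef uint16_t ggml_half;  // raw fp16 bits

// Standard ggml q8_0 block: fp16 scale + 32 int8 quants.
struct block_q8_0    { ggml_half d;    int8_t qs[32];     };
// Row-interleaved sketch: 8 scales, then the quants of 8 rows back to back.
// The real q8_0_r8 interleaving pattern may well differ.
struct block_q8_0_r8 { ggml_half d[8]; int8_t qs[8 * 32]; };

// Repack block ib of 8 consecutive rows into one r8 block. The 32 int8
// quants of each row are exactly one 256-bit AVX2 load/store.
static void repack_q8_0_r8_sketch(const block_q8_0 *rows[8], int ib,
                                  block_q8_0_r8 *dst) {
    for (int r = 0; r < 8; ++r) {
        dst->d[r] = rows[r][ib].d;
        __m256i q = _mm256_loadu_si256((const __m256i *)rows[r][ib].qs);
        _mm256_storeu_si256((__m256i *)(dst->qs + 32 * r), q);
    }
}
```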
iq1_kt: slightly faster GEMV
18.6 t/s -> 19.4 t/s (~4% faster)
iq1_kt: NEON GEMM/GEMV
Pathetic as usual
iq1_kt: slightly faster NEON - still pathetic
iq1_kt: tiny bit better GEMV on NEON
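For context on what these GEMV commits are shaving cycles off, a sketch of the kind of NEON dotprod inner loop involved (function name made up, weights assumed already decoded to int8; requires -march=armv8.2-a+dotprod):

```cpp
#include <arm_neon.h>
#include <cstdint>

// vdotq_s32 performs 16 int8 multiply-adds per instruction, accumulating
// into four int32 lanes. Assumes n % 16 == 0.
static float dot_i8_neon_sketch(const int8_t *w, const int8_t *a,
                                float dw, float da, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(w + i), vld1q_s8(a + i));
    }
    return dw * da * (float)vaddvq_s32(acc);
}
```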
iq1_kt: convert/repack to q8_0_r8 (NEON)
iq1_kt: very slightly faster convert/repack to q8_0_r8 on NEON
Adding forgotten file
Update stable-diffusion.h
Update IKL files, including IQ1_KT
Update constants.py
Co-Authored-By: Kawrakow <[email protected]>