Commit 36874c0

1.77003_b3934 - IK new KV quants edition.
Milestone version:
- Iwan Kawrakow's amazing work on quantization gives us 2 more K and V quants: iq4_nl, and a whole new q6_0.
- IQ4_NL brings a 1%+ PPL decrease compared to Q4_0, for the same bitrate.
- Q6_0 comes close to Q8_0, at 2 BPW less.

Both are offered "as such", and in mixes with other quants (K being much more sensitive to quantization than V), in order to get the best inference quality. I recommend q6_0/q5_0 for quality and q5_iq4_nl for savings, with q6_iq4_nl as a compromise.

More legacy quants might vanish in the future if IK completes his quants with support for head sizes other than 128.

Aside from that, several other commits from IK made their way into Croco.cpp, mostly focused on CUDA performance, of course! All credit for the benefits goes to Ikawrakow, I'm just the laborious mailman!
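To put the bitrate claims above into numbers, here is a rough sketch that estimates bits per weight and KV cache size for a mixed K/V pairing. The block byte counts are assumptions, not taken from this commit: they follow the usual ggml-style layout of 32 values per block with an fp16 scale, and the q6_0 figure is inferred from the "2 BPW less than Q8_0" statement. The function names and model dimensions are hypothetical, chosen only for illustration.

```python
# Assumed bytes per 32-value block (fp16 scale + packed quants).
# These sizes are illustrative assumptions, not read from the Croco.cpp source.
BLOCK_VALUES = 32
BLOCK_BYTES = {
    "f16": 64,
    "q8_0": 34,
    "q6_0": 26,   # inferred: 2 BPW below q8_0
    "q5_0": 22,
    "q4_0": 18,
    "iq4_nl": 18,  # same bitrate as q4_0
}


def bits_per_weight(quant: str) -> float:
    """Average storage cost of one value, in bits, for the given quant type."""
    return BLOCK_BYTES[quant] * 8 / BLOCK_VALUES


def kv_cache_bytes(k_quant: str, v_quant: str, n_layers: int,
                   n_kv_heads: int, head_dim: int, n_ctx: int) -> int:
    """Approximate KV cache size when K and V use different quant types."""
    elems = n_layers * n_kv_heads * head_dim * n_ctx  # values in K (same count in V)
    k_bytes = elems * bits_per_weight(k_quant) / 8
    v_bytes = elems * bits_per_weight(v_quant) / 8
    return int(k_bytes + v_bytes)


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head size 128, 8192 context.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)
    for k, v in [("f16", "f16"), ("q8_0", "q8_0"), ("q6_0", "q5_0"),
                 ("q6_0", "iq4_nl"), ("q5_0", "iq4_nl")]:
        size_mib = kv_cache_bytes(k, v, **shape) / (1024 ** 2)
        print(f"K={k:7s} V={v:7s} {size_mib:8.1f} MiB")
```

Giving K the larger type in each pair mirrors the note above that K is more sensitive to quantization than V.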
1 parent f4b5138 commit 36874c0

1 file changed

+2
-2
lines changed

koboldcpp.py

Lines changed: 2 additions & 2 deletions
@@ -44,10 +44,10 @@
 modelbusy = threading.Lock()
 requestsinqueue = 0
 defaultport = 5001
-KcppVersion = "1.77002"
+KcppVersion = "1.77003"
 LcppVersion = "b3934"
 CudaSpecifics = "CuCML_ArCML_SMC2_DmmvX32Y1"
-ReleaseDate = "2024/10/19"
+ReleaseDate = "2024/10/21"
 showdebug = True
 guimode = False
 showsamplerwarning = True
