Commit 646e311

committed
Update README.md
1 parent 34edc18 commit 646e311

1 file changed: +2 -1 lines changed
README.md

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ Croco.Cpp (CCPP) is a fork of the experimental branch of KoboldCPP (KCPP), mainl
 - 22 or so different modes of quantization for the context cache (F16, around 15 KV modes with Flash Attention, 7 quantum legacy K cache modes without Flash Attention for models like Gemma).
 - KV cache supports IQ4_NL and Q6_0 (except for Gemma), thanks to Ikawrakow.
 - Supports inference for B16 models in Cuda (thanks Ikawrakow).
-- Supports inference for new quants made by Ikawrakow (Q6_0 legacy for irregularly shaped tensors ; IQ_2K, 3K, 4K, 5K, 6K (first gen) ; IQ2_KS, 4_KSS, 4_KS (second gen, working with IK's reworked MMVQ template) ; IQ2_KT, 3_KT, 4_KT (Trellis, working with a restored DMMV kernel).
+- Supports inference for new quants made by Ikawrakow (Q6_0 legacy for irregularly shaped tensors ; IQ_2K, 3K, 4K, 5K, 6K (first gen)
+- Supported (up to v b4435) IQ2_KS, 4_KSS, 4_KS (second gen, working with IK's reworked MMVQ template) ; IQ2_KT, 3_KT, 4_KT (Trellis, working with a restored DMMV kernel). Not available in newer versions due to incompatibility with GGUF v14 format.
 - A dozen or so commits taken from Ikawrakow's IK_Llama.CPP for performances (notably on Gemma). That includes a few more GGML ops.
 - A slightly different benchmark (one flag per column instead of a single flag space).
 - 10 Stories slots instead of 6 in the web-interface (KLite).