Perplexity vs Size Graphs for the recent quants (Deepseek-V3.1-Terminus, Deepseek-R1, Qwen3-Coder, Kimi-K2, Chimera etc.) #715
Replies: 6 comments 34 replies
-
@magikRUKKOLA Thank you for these graphs, very useful! Can one do something to improve discoverability? I personally find it a bit hard to find which point corresponds to which quantization.
-
Thanks @magikRUKKOLA for putting these together. It's always interesting to see which quantization types are performing well on some of these big models. I just added a few more data points to my DeepSeek-V3.1 collection. The IQ4_KSS is doing unreasonably well again right around 4.0 BPW. I went back and re-read the earlier discussion on QAT and IQ4_KS here: #359 (comment), and am speculating wildly about whether it could have anything to do with ~4.0 BPW being a "sweet spot" in the size-vs-perplexity trade-off curve.
JSON data:
[
{
"name": "BF16",
"ppl": "3.3469 +/- 0.01936",
"size": 1250.084,
"bpw": 16.003,
"legend": "pure"
},
{
"name": "Q8_0",
"ppl": "3.3473 +/- 0.01935",
"size": 664.295,
"bpw": 8.504,
"legend": "pure",
"skip": true
},
{
"name": "IQ5_K",
"ppl": "3.3550 +/- 0.01942",
"size": 465.075,
"bpw": 5.944,
"legend": "ubergarm"
},
{
"name": "IQ4_K",
"ppl": "3.3715 +/- 0.01956",
"size": 384.765,
"bpw": 4.925,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ4_KS",
"ppl": "3.3806 +/- 0.01966",
"size": 363.151,
"bpw": 4.649,
"legend": "ubergarm",
"comment": ""
},
{
"name": "Q4_0",
"ppl": "3.4277 +/- 0.02000",
"size": 352.096,
"bpw": 4.507,
"legend": "pure",
"comment": "q4_K embd, q6_K head"
},
{
"name": "IQ4_KSS",
"ppl": "3.3887 +/- 0.01968",
"size": 325.088,
"bpw": 4.162,
"legend": "ubergarm",
"comment": ""
},
{
"name": "smol-IQ4_KSS",
"ppl": "3.3898 +/- 0.01964",
"size": 318.745,
"bpw": 4.080,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ3_K",
"ppl": "3.4260 +/- 0.01995",
"size": 293.177,
"bpw": 3.753,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ3_KS",
"ppl": "3.4534 +/- 0.02019",
"size": 277.397,
"bpw": 3.551,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KL",
"ppl": "3.6312 +/- 0.02161",
"size": 231.206,
"bpw": 2.960,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KT",
"ppl": "3.8109 +/- 0.02294",
"size": 204.592,
"bpw": 2.619,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks + PR to fix KT quantization"
},
{
"name": "IQ2_KS",
"ppl": "3.9583 +/- 0.02433",
"size": 193.144,
"bpw": 2.472,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ1_KT",
"ppl": "4.3987 +/- 0.02786",
"size": 154.968,
"bpw": 1.984,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ1_S",
"ppl": "5.3113 +/- 0.03507",
"size": 133.610,
"bpw": 1.710,
"legend": "ubergarm",
"comment": ""
}
]
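To make it easier to see which of the points above actually matter for the size-vs-perplexity trade-off, here is a small hypothetical helper (not part of ik_llama.cpp or any published tooling) that extracts the Pareto frontier from a subset of the DeepSeek-V3.1 data points listed above:

```python
# Hypothetical helper: find which quants sit on the size-vs-perplexity
# Pareto frontier. A quant is dominated if a smaller quant already
# achieves lower perplexity. Data is a subset of the JSON above.
from math import inf

# (name, bits-per-weight, perplexity)
QUANTS = [
    ("IQ1_S", 1.710, 5.3113),
    ("IQ2_KL", 2.960, 3.6312),
    ("IQ3_K", 3.753, 3.4260),
    ("smol-IQ4_KSS", 4.080, 3.3898),
    ("IQ4_KSS", 4.162, 3.3887),
    ("Q4_0", 4.507, 3.4277),
    ("IQ4_KS", 4.649, 3.3806),
    ("IQ5_K", 5.944, 3.3550),
    ("Q8_0", 8.504, 3.3473),
]

def pareto_frontier(quants):
    """Sort by BPW and keep each point that improves on the best PPL so far."""
    frontier, best_ppl = [], inf
    for name, bpw, ppl in sorted(quants, key=lambda q: q[1]):
        if ppl < best_ppl:
            frontier.append(name)
            best_ppl = ppl
    return frontier

print(pareto_frontier(QUANTS))
```

On this subset, Q4_0 is the only dominated point: IQ4_KSS is both smaller and lower-perplexity, which lines up with the "sweet spot" observation above.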
-
Adding my test result: Kimi-K2-Instruct-UD-Q3_K_XL: PPL = 3.2330 +/- 0.01668
-
I tried to calculate perplexity for Kimi-K2-Instruct-0905-THIREUS-IQ3_K-SPECIAL_SPLIT and got very bad results. PPL 2.7851 at 3.4325 BPW? It seems like something is very wrong. I made sure all the split files are valid. Anyway, honestly, it's unlikely I will be using Kimi-K2 personally; DeepSeek-V3.1-Terminus just got released. :)
-
I was thinking about an automated tool that finds the optimal quant given input parameters such as RAM/VRAM limits, prefill/decode speed, max context length, and target perplexity, so that everyone could find exactly the quant they want with very little effort.
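The simplest version of such a tool reduces to a constrained lookup: pick the lowest-perplexity quant whose file size fits the memory budget. A toy sketch (hypothetical, not an existing tool; sizes in GiB and perplexities taken from the DeepSeek-V3.1 table above, with speed and context-length constraints left out):

```python
# Toy quant-picker sketch: lowest-perplexity quant that fits the budget.
# (name, size in GiB, perplexity) -- from the DeepSeek-V3.1 data above.
QUANTS = [
    ("IQ1_KT", 154.968, 4.3987),
    ("IQ2_KL", 231.206, 3.6312),
    ("IQ3_K", 293.177, 3.4260),
    ("IQ4_KSS", 325.088, 3.3887),
    ("IQ5_K", 465.075, 3.3550),
]

def best_quant(max_gib):
    """Return the (name, size, ppl) tuple with the lowest PPL that fits,
    or None if nothing fits the budget."""
    fits = [q for q in QUANTS if q[1] <= max_gib]
    return min(fits, key=lambda q: q[2]) if fits else None

print(best_quant(256))  # e.g. a ~256 GiB RAM+VRAM budget
```

A real version would add the prefill/decode speed and context-length constraints as further filters before the `min()` selection, but the shape of the problem stays the same.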
-
Qwen3-Coder added.

-
GRAPHS: (perplexity-vs-size plots attached as images, not reproduced here)
DATA SOURCES:
{ "title": "DeepSeek-V3.1-Terminus (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S", "bpw": 1.745, "ppl": 5.4829, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ1_S"}, {"name": "IQ1_KT", "bpw": 1.987, "ppl": 4.5310, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ1_KT"}, {"name": "IQ2_KS", "bpw": 2.472, "ppl": 4.0280, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ2_KS"}, {"name": "IQ2_KL", "bpw": 2.962, "ppl": 3.7112, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_KS", "bpw": 3.545, "ppl": 3.5174, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ3_KS"}, {"name": "IQ3_K", "bpw": 3.724, "ppl": 3.4781, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ3_K"}, {"name": "smol-IQ4_KSS", "bpw": 4.080, "ppl": 3.4445, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "IQ4_K", "bpw": 4.896, "ppl": 3.4198, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ4_K"}, {"name": "smol-IQ5_KS", "bpw": 5.339, "ppl": 3.4059, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/smol-IQ5_KS"}, {"name": "THIREUS-5.4498bpw-R4", "bpw": 5.4498, "ppl": 3.3961, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14579570"}, {"name": "IQ5_K", "bpw": 5.941, "ppl": 3.4000, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/IQ5_K"}, {"name": "THIREUS-6.2212bpw", "bpw": 6.2212, "ppl": 3.3949, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14554951"}, {"name": "Q8_0", "bpw": 8.504, "ppl": 3.3929, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-Terminus-GGUF/tree/main/Q8_0"} ] } { "title": 
"DeepSeek-R1-0528 (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S_R4", "bpw": 1.664, "ppl": 4.8831, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ1_S_R4"}, {"name": "THIREUS-1.9364", "bpw": 1.9364, "ppl": 4.3533, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-1.9364bpw-4.3533ppl.151GB-GGUF_11GB-GPU_140GB-CPU.3c88ec6_9fd615d.recipe"}, {"name": "IQ2_KT", "bpw": 2.514, "ppl": 3.6378}, {"name": "THIREUS-2.7840", "bpw": 2.7840, "ppl": 3.4341, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-2.7840bpw-3.4341ppl.217GB-GGUF_14GB-GPU_203GB-CPU.3c88ec6_02247be.recipe"}, {"name": "IQ2_K_R4", "bpw": 2.799, "ppl": 3.5069, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ2_K_R4"}, {"name": "JWNoctis/R1-0528/IQ2_KL", "bpw": 2.930, "ppl": 3.4379, "url": "https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826/354"}, {"name": "UD_Q2_K_XL", "bpw": 2.994, "ppl": 3.5278, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-Q2_K_XL"}, {"name": "THIREUS-3.1027", "bpw": 3.1027, "ppl": 3.3372, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1027bpw-3.3372ppl.242GB-GGUF_11GB-GPU_231GB-CPU.3c88ec6_adc8101.recipe"}, {"name": "THIREUS-3.1446", "bpw": 3.1446, "ppl": 3.3257, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1446bpw-3.3257ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_7d1efe1.recipe"}, {"name": "THIREUS-3.1447", "bpw": 3.1447, "ppl": 3.3269, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1447bpw-3.3269ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_4b1254a.recipe"}, {"name": "THIREUS-3.1525", "bpw": 3.1525, "ppl": 3.3251, "url": 
"https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1525bpw-3.3251ppl.246GB-GGUF_15GB-GPU_231GB-CPU.3c88ec6_5a3fc0f.recipe"}, {"name": "THIREUS-3.1740", "bpw": 3.1740, "ppl": 3.3253, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1740bpw-3.3253ppl.248GB-GGUF_17GB-GPU_231GB-CPU.3c88ec6_6cf3a72.recipe"}, {"name": "THIREUS-3.1858", "bpw": 3.1858, "ppl": 3.3261, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.1858bpw-3.3261ppl.249GB-GGUF_18GB-GPU_231GB-CPU.3c88ec6_027b7ff.recipe"}, {"name": "THIREUS-3.2564", "bpw": 3.2564, "ppl": 3.2985, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.2564bpw-3.2985ppl.254GB-GGUF_15GB-GPU_239GB-CPU.3c88ec6_7c0be1e.recipe"}, {"name": "IQ3_KT", "bpw": 3.483, "ppl": 3.3056, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_KT"}, {"name": "THIREUS-3.5652", "bpw": 3.5652, "ppl": 3.2734, "url": "https://github.com/Thireus/GGUF-Tool-Suite/blob/main/recipe_examples/DeepSeek-R1-0528.THIREUS-3.5652bpw-3.2734ppl.278GB-GGUF_14GB-GPU_264GB-CPU.3c88ec6_9b5660b.recipe"}, {"name": "IQ3_KS", "bpw": 3.598, "ppl": 3.2991, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_KS"}, {"name": "THIREUS-3.6766", "bpw": 3.6766, "ppl": 3.2741, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13781700"}, {"name": "IQ3_K_R4", "bpw": 3.847, "ppl": 3.2730, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ3_K_R4"}, {"name": "THIREUS-3.976", "bpw": 3.976, "ppl": 3.2452, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13798329"}, {"name": "IQ4_XS (unsloth)", "bpw": 4.2683, "ppl": 3.2598, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/IQ4_XS"}, {"name": "q4_0", "bpw": 4.508, "ppl": 3.2895, 
"url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/Q4_0"}, {"name": "UD_Q4_K_XL", "bpw": 4.578, "ppl": 3.2483, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-Q4_K_XL"}, {"name": "IQ4_KS_R4", "bpw": 4.701, "ppl": 3.2286, "url": "https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF/tree/main/IQ4_KS_R4"}, {"name":"THIREUS-5.0601","bpw":5.0601,"ppl":3.2223,"url":"https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14625973"}, {"name": "DQ4_K_R4", "bpw": 5.289, "ppl": 3.2276, "url": "https://huggingface.co/anikifoss/DeepSeek-R1-0528-DQ4_K_R4"}, {"name": "THIREUS-6.2478", "bpw": 6.2478, "ppl": 3.2240, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13781560"}, {"name": "THIREUS-6.4296", "bpw": 6.4296, "ppl": 3.2231, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/718#discussioncomment-14193821"}, {"name": "THIREUS-6.5522", "bpw": 6.5522, "ppl": 3.2227, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/718#discussioncomment-14193821"}, {"name": "Q8_0", "bpw": 8.5259260, "ppl": 3.2130, "url": "https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/Q8_0"} ] } { "title": "DeepSeek-V3.1 (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ { "name": "IQ1_S", "bpw": 1.710, "ppl": 5.3113, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ1_S" }, { "name": "IQ1_KT", "bpw": 1.984, "ppl": 4.3987, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ1_KT" }, { "name": "IQ2_KS", "bpw": 2.472, "ppl": 3.9583, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ2_KS" }, { "name": "IQ2_KT", "bpw": 2.619, "ppl": 3.8109, "url": "" }, { "name": "IQ2_KL", "bpw": 2.960, "ppl": 3.6312, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ2_KL" }, { "name": "IQ3_KS", "bpw": 3.551, "ppl": 3.4534, "url": 
"https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ3_KS" }, { "name": "IQ3_K", "bpw": 3.753, "ppl": 3.4260, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ3_K" }, { "name": "smol-IQ4_KSS", "bpw": 4.080, "ppl": 3.3898, "url": "" }, { "name": "IQ4_KSS", "bpw": 4.162, "ppl": 3.3887, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_KSS" }, { "name": "Q4_0", "bpw": 4.507, "ppl": 3.4277, "url": "" }, { "name": "UD-Q4_K_XL", "bpw": 4.507, "ppl": 3.4013, "url": "https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF/tree/main/UD-Q4_K_XL" }, { "name": "IQ4_KS", "bpw": 4.649, "ppl": 3.3806, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_KS" }, { "name": "IQ4_K", "bpw": 4.925, "ppl": 3.3715, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ4_K" }, { "name": "IQ5_K", "bpw": 5.944, "ppl": 3.3550, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/IQ5_K" }, { "name": "Q8_0", "bpw": 8.504, "ppl": 3.3473, "url": "https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/tree/main/Q8_0" }, { "name": "BF16", "bpw": 16.003, "ppl": 3.3469, "url": "" } ] } { "title": "DeepSeek-TNG-R1T2-Chimera (671B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 671000000000, "data": [ {"name": "IQ1_S", "bpw": 1.699, "ppl": 4.9878, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ1_S"}, {"name": "THIREUS-1.6693", "bpw": 1.6693, "ppl": 4.9676, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "THIREUS-1.7067", "bpw": 1.7067, "ppl": 4.9199, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "THIREUS-2.0622", "bpw": 2.0622, "ppl": 4.0622, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "THIREUS-2.1572", "bpw": 2.1572, "ppl": 4.0228, "url": 
"https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "IQ2_XSS", "bpw": 2.168, "ppl": 4.0078, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_XSS"}, {"name": "IQ2_KT", "bpw": 2.188, "ppl": 3.8887, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_KT"}, {"name": "THIREUS-2.5961", "bpw": 2.5961, "ppl": 3.6768, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "IQ2_KS", "bpw": 2.602, "ppl": 3.6254, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ2_KS"}, {"name": "THIREUS-2.6261", "bpw": 2.6261, "ppl": 3.5627, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "THIREUS-3.5753", "bpw": 3.5753, "ppl": 3.3187, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13883488"}, {"name": "THIREUS-3.5858", "bpw": 3.5858, "ppl": 3.3063, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/477#discussioncomment-13914222"}, {"name": "IQ3_KS", "bpw": 3.598, "ppl": 3.3167, "url": "https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main/IQ3_KS"} ] } { "title": "Kimi-K2-Instruct-0905 (1026B) Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 1026000000000, "data": [ {"name": "smol-IQ1_KT", "bpw": 1.832, "ppl": 4.2224, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ1_KT"}, {"name": "smol-IQ2_KS", "bpw": 2.261, "ppl": 3.4977, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ2_KS"}, {"name": "IQ2_KS", "bpw": 2.425, "ppl": 3.2478, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ2_KS"}, {"name": "smol-IQ2_KL", "bpw": 2.755, "ppl": 2.9294, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ2_KL"}, 
{"name": "IQ2_KL", "bpw": 3.000, "ppl": 2.7993, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ2_KL"}, {"name": "smol-IQ3_KS", "bpw": 3.249, "ppl": 2.5902, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ3_KS"}, {"name": "IQ3_KS", "bpw": 3.520, "ppl": 2.5640, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ3_KS"}, {"name": "UD-Q3_K_XL", "bpw": 3.521, "ppl": 2.6706, "url": "https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-GGUF/tree/main/UD-Q3_K_XL"}, {"name": "THIREUS-4.0285", "bpw": 4.034, "ppl": 2.493, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14485602"}, {"name": "smol-IQ4_KSS", "bpw": 4.059, "ppl": 2.5185, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "IQ4_KS", "bpw": 4.633, "ppl": 2.4641, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/IQ4_KS"}, {"name": "smol-IQ5_KS", "bpw": 5.295, "ppl": 2.4526, "url": "https://huggingface.co/ubergarm/Kimi-K2-Instruct-0905-GGUF/tree/main/smol-IQ5_KS"} ] } { "title": "GLM-4.6 Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 357000000000, "data": [ {"name": "smol-IQ4_KSS", "bpw": 4.090, "ppl": 3.5911, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ4_KSS"}, {"name": "smol-IQ1_KT", "bpw": 1.948, "ppl": 5.9034, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ1_KT"}, {"name": "smol-IQ2_KS", "bpw": 2.359, "ppl": 5.2760, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/smol-IQ2_KS"}, {"name": "IQ2_KL", "bpw": 3.070, "ppl": 4.1456, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_KS", "bpw": 3.573, "ppl": 3.6427, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ3_KS"}, {"name": "IQ4_KS", "bpw": 4.646, "ppl": 3.5309, "url": 
"https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ4_KS"}, {"name": "IQ4_K", "bpw": 5.001, "ppl": 3.4758, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ4_K"}, {"name": "THIREUS-5.5774bpw", "bpw": 5.5774, "ppl": 3.4486, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14572398"}, {"name": "UD-Q5_K_XL(unsloth)", "bpw": 5.6471, "ppl": 3.4807, "url": "https://huggingface.co/unsloth/GLM-4.6-GGUF/tree/main/UD-Q5_K_XL"}, {"name": "IQ5_K", "bpw": 5.997, "ppl": 3.4428, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ5_K"}, {"name": "Q8_0", "bpw": 8.505, "ppl": 3.4471, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/Q8_0"}, {"name": "BF16", "bpw": 16.003, "ppl": 3.4454, "url": "https://huggingface.co/ubergarm/GLM-4.6-GGUF"} ] } { "title": "Qwen3-Coder-480B-A35B-Instruct Quantization Analysis", "subtitle": "Lower perplexity = Better performance", "model_parameters": 480000000000, "data": [ {"name": "IQ1_KT", "bpw": 1.945, "ppl": 6.3370, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ1_KT"}, {"name": "IQ2_KS", "bpw": 2.578, "ppl": 5.6658, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_KS"}, {"name": "IQ2_K", "bpw": 2.588, "ppl": 5.6578, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_K"}, {"name": "IQ2_KL", "bpw": 3.034, "ppl": 5.4113, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_KL"}, {"name": "IQ3_K", "bpw": 3.865, "ppl": 5.1808, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ3_K"}, {"name": "IQ4_KSS", "bpw": 4.180, "ppl": 5.1579, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ4_KSS"}, {"name": "IQ4_K", "bpw": 4.885, "ppl": 5.1257, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ4_K"}, {"name": 
"THIREUS-5.1546bpw", "bpw": 5.1546, "ppl": 5.1057, "url": "https://github.com/ikawrakow/ik_llama.cpp/discussions/715#discussioncomment-14670424"}, {"name": "IQ5_K", "bpw": 5.900, "ppl": 5.1073, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ5_K"}, {"name": "Q8_0", "bpw": 8.503, "ppl": 5.0975, "url": "https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/Q8_0"} ] }CODE: #477 (comment)