Perplexity vs Size Graphs for the recent quants (Deepseek-R1, Kimi-K2, Chimera etc.) #715
-
@magikRUKKOLA Thank you for these graphs, very useful! Could anything be done to improve discoverability? I personally find it a bit hard to tell which point corresponds to which quantization.
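One way to address the labeling problem is to annotate each scatter point with its quant name directly. A minimal matplotlib sketch, using two sample points copied from the JSON data later in this thread (the schema and field names match that dump; everything else is illustrative):

```python
# Label each point on the perplexity-vs-size scatter so quants are
# identifiable at a glance. Data subset is from this thread's JSON dump.
import json
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

data = json.loads("""[
  {"name": "IQ4_KSS", "ppl": "3.3887 +/- 0.01968", "size": 325.088},
  {"name": "IQ3_K",   "ppl": "3.4260 +/- 0.01995", "size": 293.177}
]""")

fig, ax = plt.subplots()
for d in data:
    ppl = float(d["ppl"].split()[0])  # strip the "+/-" error term
    ax.scatter(d["size"], ppl)
    ax.annotate(d["name"], (d["size"], ppl),
                textcoords="offset points", xytext=(5, 5), fontsize=8)
ax.set_xlabel("model size (GiB)")
ax.set_ylabel("perplexity")
fig.savefig("ppl_vs_size.png")
```

With many overlapping points a library like `adjustText` can de-clutter the labels, but plain `annotate` with a small offset already makes each quant findable.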
-
Thanks @magikRUKKOLA for putting these together. It is always interesting to see which quantization types perform well on these big models. I just added a few more data points to my DeepSeek-V3.1 collection. The IQ4_KSS is doing unreasonably well again, right around 4.0 BPW. I went back and re-read this earlier discussion on QAT and IQ4_KS here: #359 (comment), and I am speculating wildly about whether it could have anything to do with ~4.0 BPW being a "sweet spot" on the size-vs-perplexity trade-off curve.

JSON data:

[
{
"name": "BF16",
"ppl": "3.3469 +/- 0.01936",
"size": 1250.084,
"bpw": 16.003,
"legend": "pure"
},
{
"name": "Q8_0",
"ppl": "3.3473 +/- 0.01935",
"size": 664.295,
"bpw": 8.504,
"legend": "pure",
"skip": true
},
{
"name": "IQ5_K",
"ppl": "3.3550 +/- 0.01942",
"size": 465.075,
"bpw": 5.944,
"legend": "ubergarm"
},
{
"name": "IQ4_K",
"ppl": "3.3715 +/- 0.01956",
"size": 384.765,
"bpw": 4.925,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ4_KS",
"ppl": "3.3806 +/- 0.01966",
"size": 363.151,
"bpw": 4.649,
"legend": "ubergarm",
"comment": ""
},
{
"name": "Q4_0",
"ppl": "3.4277 +/- 0.02000",
"size": 352.096,
"bpw": 4.507,
"legend": "pure",
"comment": "q4_K embd, q6_K head"
},
{
"name": "IQ4_KSS",
"ppl": "3.3887 +/- 0.01968",
"size": 325.088,
"bpw": 4.162,
"legend": "ubergarm",
"comment": ""
},
{
"name": "smol-IQ4_KSS",
"ppl": "3.3898 +/- 0.01964",
"size": 318.745,
"bpw": 4.080,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ3_K",
"ppl": "3.4260 +/- 0.01995",
"size": 293.177,
"bpw": 3.753,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ3_KS",
"ppl": "3.4534 +/- 0.02019",
"size": 277.397,
"bpw": 3.551,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KL",
"ppl": "3.6312 +/- 0.02161",
"size": 231.206,
"bpw": 2.960,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ2_KT",
"ppl": "3.8109 +/- 0.02294",
"size": 204.592,
"bpw": 2.619,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks + PR to fix KT quantization"
},
{
"name": "IQ2_KS",
"ppl": "3.9583 +/- 0.02433",
"size": 193.144,
"bpw": 2.472,
"legend": "ubergarm",
"comment": "PR624 ik/quantization_tweaks"
},
{
"name": "IQ1_KT",
"ppl": "4.3987 +/- 0.02786",
"size": 154.968,
"bpw": 1.984,
"legend": "ubergarm",
"comment": ""
},
{
"name": "IQ1_S",
"ppl": "5.3113 +/- 0.03507",
"size": 133.610,
"bpw": 1.710,
"legend": "ubergarm",
"comment": ""
}
]
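The "sweet spot" hunch above can be made a bit more quantitative. A small sketch that computes, for a subset of the numbers in the table, each quant's perplexity increase over BF16 and the PPL cost per GiB saved (lower is better; the metric itself is my own illustrative choice, not something from the thread):

```python
# Quantify the size-vs-perplexity trade-off: perplexity increase over BF16,
# and PPL points paid per GiB saved. Values are from the JSON table above.
points = {
    # name: (ppl, size_gib, bpw)
    "BF16":    (3.3469, 1250.084, 16.003),
    "IQ5_K":   (3.3550,  465.075,  5.944),
    "IQ4_KSS": (3.3887,  325.088,  4.162),
    "IQ2_KL":  (3.6312,  231.206,  2.960),
    "IQ1_S":   (5.3113,  133.610,  1.710),
}
base_ppl, base_size, _ = points["BF16"]
for name, (ppl, size, bpw) in points.items():
    if name == "BF16":
        continue
    rel = 100.0 * (ppl / base_ppl - 1.0)          # % PPL increase vs BF16
    cost = (ppl - base_ppl) / (base_size - size)  # PPL points per GiB saved
    print(f"{name:8s} {bpw:6.3f} bpw  +{rel:5.2f}% ppl  {cost:.6f} ppl/GiB")
```

On these numbers IQ4_KSS pays only about 1.25% PPL for cutting the model to roughly a quarter of the BF16 size, while the sub-2 BPW quants pay an order of magnitude more per GiB, which is consistent with the ~4 BPW knee visible in the graphs.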
-
Adding my test result: Kimi-K2-Instruct-UD-Q3_K_XL: PPL = 3.2330 +/- 0.01668
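One caveat when reading a number like this next to the DeepSeek figures: perplexity is the exponential of the mean per-token negative log-likelihood, so it depends on the tokenizer and test text, and is only directly comparable between runs that share both. A toy sketch of the definition (the log-likelihood values here are made up for illustration):

```python
# Perplexity from per-token log-likelihoods: ppl = exp(mean NLL).
# The "+/-" term reported by perplexity tools is a standard-error estimate.
import math

token_logprobs = [-1.2, -0.7, -2.3, -0.4, -1.6]  # toy values, natural log
nll = -sum(token_logprobs) / len(token_logprobs)
ppl = math.exp(nll)
print(f"PPL = {ppl:.4f}")  # exp(1.24)
```

So the Kimi-K2 3.2330 and the DeepSeek-V3.1 numbers above each rank quants within their own model, but the absolute values should not be compared across the two models.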
-
GRAPHS:
DATA SOURCES:
CODE: #477 (comment)