-
Has there been any QAT going on with the Qwen3 models? I didn't see anything mentioned in the linked blog post, but there are indications that QAT may have been involved. Does somebody know?
-
**QAT used in Qwen3 training?**

After posting the above results, I decided to dig a little deeper into this. OK, then, let's just redo all recipes and look at the PPL numbers again.
Oops. Recipes 5 and 6 have a lower PPL than the bf16 model. Hmm. OK, it must be my imatrix. People have filled whole libraries writing about how the imatrix calibration data needs to be random, diverse, whatnot. OK, let's grab the Unsloth imatrix. Quantize, run PPL. Oops. That's definitely even less diverse than mine. Let's grab the Bartowski imatrix. Quantize recipe IQK-6, run PPL. Oops. It looks just like mine. What happens if I don't use an imatrix at all? Quantize recipe IQK-6 without an imatrix, run PPL. So, that's about 2.6% quantization error, so in the range of what I would expect without an imatrix.
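For reference, each of these rounds is just a re-quantize plus a PPL run. A sketch with placeholder file names, using recipe IQK-6 and an external imatrix as the example:

```bash
# Sketch only: re-quantize with a given imatrix (the recipe's --custom-q rules
# are omitted here), then measure perplexity. File names are placeholders.
./bin/llama-quantize --imatrix some-external.imatrix \
    Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-IQK-6.gguf IQ4_KS

./bin/llama-perplexity -m Qwen3-30B-A3B-IQK-6.gguf -f wiki.test.raw
```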
So, what if they have used some form of QAT targeted towards some 4-bit format? Looking at this graph, it seems plausible that, if that is the case, quantizations around 4 bpw can land at or even slightly below the bf16 PPL. Just in case, I also double-checked the PPL runs.
-
Oh man, I just released ubergarm/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-mix-IQ4_K.gguf just before finding and reading this discussion!!! ooops! I have some PPL data from it as well. EDIT: Added an unreleased mix too.
I have some more KLD and token probability stats too, with graphs, to make a better write-up eventually. So it sounds like, if Qwen was using QAT targeted at fp4, then it may be possible to use IQ4_KS to shave some weight without sacrificing quality? I'll have to try some more mixes... If I'm following here, it sounds like the goal is to get PPL as low as possible without going below the bf16 PPL?
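For anyone wanting to reproduce this kind of comparison: llama-perplexity's KL-divergence mode can produce KLD and top-token probability stats. A sketch with placeholder file names (not necessarily the exact commands used here):

```bash
# 1) Save baseline logits from the bf16 model (placeholder file names).
./bin/llama-perplexity -m Qwen3-30B-A3B-BF16.gguf -f wiki.test.raw \
    --kl-divergence-base qwen3-bf16-logits.dat

# 2) Compare a quantized model against the saved baseline; this reports
#    mean KLD and token-probability statistics in addition to PPL.
./bin/llama-perplexity -m Qwen3-30B-A3B-mix-IQ4_K.gguf -f wiki.test.raw \
    --kl-divergence-base qwen3-bf16-logits.dat --kl-divergence
```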
-
I have posted 3
-
@ubergarm Great write-up! The fact that the ikawrakow/IQ4_KS_Unsloth model gets a lower PPL than the bf16 model fits the QAT hypothesis. My only comment: when there is no doubt that the quantized model's PPL is genuinely below the bf16 PPL, a lower PPL no longer means a better quantization.
In that scenario, the larger the difference between the quantized and the bf16 PPL, the larger the deviation from the original model.
-
@ikawrakow i am having a hard time understanding where the iqx_k quants came from? is there an explanation somewhere other than the code?
-
I did some experimentation with Qwen3 quantization. As I don't have the horsepower to run the flagship model, I experimented with the Qwen3-30B-A3B MoE model. I'm reporting the results here in the hope that they will also be useful for Qwen3-235B-A22B.
The following graph shows a comparison between the Unsloth so-called "dynamic" quants and the quantization mixes I prepared. The Unsloth quantized models, shown with black symbols, are from their HF repository, and the black text beside the data points gives the corresponding file name. The red symbols are for my quantization mixes; their recipes are given below. The x-axis is model size in GiB (and not GB, as HF likes to use). The y-axis is the quantization error in percent, defined as `PPL(Q)/PPL(bf16) - 1`. Based on these results, it does not look like Unsloth did a particularly good job with their "dynamic" quants for this model. One can get the same quantization quality with a ~2 GiB smaller model, so nearly 20% smaller at the low-bpw end.
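With made-up numbers, just to illustrate the metric: if PPL(bf16) were 9.00 and a quantized model measured PPL(Q) = 9.23, the quantization error would be 9.23/9.00 - 1 ≈ 0.026, i.e. about 2.6%.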
My recipes are almost entirely composed of `IQK` quants, so exclusive to this repository. I did not go beyond 4.3 bpw, as there the quantization error is 0.57% (and I have seen sub-1% quantization error being called "lossless" in the quantization literature).

**Recipe IQK-1**
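Something along these lines (a sketch: file names are placeholders and the exact regex/type spellings may need adjusting):

```bash
# Sketch of the IQK-1 mix with ik_llama.cpp's llama-quantize.
# Placeholder file names; tensors not matched by a --custom-q rule are meant
# to fall back to the default type given as the last argument (IQ2_KS here).
./bin/llama-quantize --imatrix qwen3-30b-a3b.imatrix \
    --custom-q "token_embd\.weight=q4_K" \
    --custom-q "output\.weight=q6_K" \
    --custom-q "attn=iq5_k" \
    --custom-q "blk\.[0-5]\.ffn_down_exps=iq4_ks" \
    Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-IQK-1.gguf IQ2_KS
```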
Note that one can combine all arguments following `--custom-q` into a single, comma-separated list of regular expressions. I have split them into several `--custom-q` arguments for better readability. So, basically, all attention tensors are quantized with `IQ5_K`, the first 6 layers of `ffn_down_exps` with `IQ4_KS`, and everything else with `IQ2_KS`. Oh, here and for all other recipes, token embeddings are `Q4_K` and the output tensor is `Q6_K`. This quantized model ends up being 8.745 GiB, so only very slightly larger than Unsloth's `UD-IQ1_S` (8.396 GiB).
**Recipe IQK-2**
Very similar to Recipe IQK-1, with all attention tensors quantized with `IQ5_K`, the first 6 layers of `ffn_down_exps` with `IQ4_KS`, and all other experts with `IQ2_K`. The quantized model ends up being 9.314 GiB.
**Recipe IQK-3**
The difference to Recipe IQK-2 is that the first 6 layers of `ffn_down_exps` are quantized with `IQ4_K`, and the remaining `ffn_down_exps` tensors with `IQ3_K`. The quantized model size is 10.389 GiB.
**Recipe IQK-4**
Similar to Recipe IQK-3, but now the first 16 and the last 8 layers of the `ffn_up_exps` and `ffn_gate_exps` tensors are quantized with `IQ3_K`. The quantized model size is 11.584 GiB.
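For the record, the "first 16 / last 8 layers" part can be expressed with layer-range regexes. A sketch of the two additional `--custom-q` rules, assuming the model's 48 layers (`blk.0` through `blk.47`):

```bash
# Sketch: extra --custom-q rules for the first 16 (blk.0-15) and last 8 (blk.40-47)
# layers of ffn_up_exps/ffn_gate_exps, added on top of the IQK-3 rules.
--custom-q "blk\.([0-9]|1[0-5])\.ffn_(up|gate)_exps=iq3_k"
--custom-q "blk\.4[0-7]\.ffn_(up|gate)_exps=iq3_k"
```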
**Recipe IQK-5**
I.e., all experts are `IQ3_K`, except for the first 6 layers of `ffn_down_exps`, which are `IQ4_K`. Model size is 12.779 GiB.

**Recipe IQK-6**
I.e., all tensors (except attention, output and embeddings) are `IQ4_KS`. The quantized model size is 15.454 GiB.
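A sketch of what the corresponding command might look like (placeholder file names; the attention override below simply mirrors the earlier recipes and is an assumption for IQK-6):

```bash
# Sketch only: IQ4_KS as the default type for everything not matched below.
# The attention rule assumes the same IQ5_K attention as recipes IQK-1/IQK-2.
./bin/llama-quantize --imatrix qwen3-30b-a3b.imatrix \
    --custom-q "token_embd\.weight=q4_K" \
    --custom-q "output\.weight=q6_K" \
    --custom-q "attn=iq5_k" \
    Qwen3-30B-A3B-BF16.gguf Qwen3-30B-A3B-IQK-6.gguf IQ4_KS
```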