-
People often do similar things by writing scripts that leverage gguf-py. (Some notable examples were updating the Gemma QAT model to use Q6_K instead of fp16 for the embedding table, and manually making DeepSeek R1-T Chimera from a V3 and an R1 GGUF.) I've thought about adding support for this to the C/C++ code, but it seems unnecessary given how flexible gguf-py is. Effort has been made to keep gguf-py current with all the quant types (see #458 and #298).
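A typical first step in such a script is finding out which quant type each tensor currently uses. Below is a minimal sketch, assuming a gguf-py version that knows every quant type present in the file. `read_tensor_types` and `type_histogram` are hypothetical helper names, while `GGUFReader` and its `tensors` list (with `name` and `tensor_type` fields) come from gguf-py.

```python
from collections import Counter


def read_tensor_types(path):
    """Return {tensor_name: quant_type_name} for every tensor in a GGUF file."""
    # Imported lazily so type_histogram() can be used without gguf-py installed.
    # Assumes the installed gguf-py recognizes all quant types in the file.
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return {t.name: t.tensor_type.name for t in reader.tensors}


def type_histogram(tensor_types):
    """Count how many tensors use each quant type, most common first."""
    return Counter(tensor_types.values()).most_common()
```

For example, `type_histogram(read_tensor_types("model.gguf"))` gives a quick overview of a file's quant mix before deciding which tensors to replace.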
-
It would be useful, right? When I'm actively experimenting with quantization mixes, I wish I had this feature. But implementing it basically means re-implementing quantization, so I have not done it. The alternative is to run a second quantization where only the tensors you want to change are quantized (using
-
Have you seen https://github.com/Thireus/GGUF-Tool-Suite ? I haven't fully gone through the code yet, but it seems to accomplish at least some of the goals you described here (by building on the gguf-split system).
-
Hey,
Could it be possible to have a partial requant feature?
For a generic example: one quantizes an IQ2_KT .gguf, but with ffn_down in IQ2_S and the output in IQ5_KS_R4.
Then, one wants to requantize the same model with the same IQ2_KT broad quant strategy, but with ffn_down in IQ3_XXS and the output in IQ5_K.
Could a feature be implemented so that the first quantized model is used as a secondary source alongside the original? The already quantized IQ2_KT tensors would be imported from this secondary source and copied into the destination .gguf, and only the tensors whose type has changed in the quantization command would be requantized from the original source.
That could save a lot of time and compute during tests.
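The core decision in such a feature is which tensors can be copied and which must be requantized. Here is a minimal sketch of that planning step, assuming both quant mixes are available as `{tensor_name: type_name}` maps (the existing one could, for instance, be read from the first .gguf with gguf-py's `GGUFReader`). `plan_partial_requant` is a hypothetical name, not an existing llama.cpp or gguf-py function, and the tensor names in the example are illustrative.

```python
def plan_partial_requant(existing_types, target_types):
    """Split tensors into those reusable from a previous quantized GGUF and
    those that must be requantized from the original model.

    existing_types: {tensor_name: quant_type} in the earlier quantized file.
    target_types:   {tensor_name: quant_type} wanted in the new file.
    """
    copy, requant = [], []
    for name, wanted in target_types.items():
        if existing_types.get(name) == wanted:
            copy.append(name)     # same type: reuse the already quantized bytes
        else:
            requant.append(name)  # new or changed type: quantize from the source
    return copy, requant


existing = {
    "blk.0.attn_q.weight": "IQ2_KT",
    "blk.0.ffn_down.weight": "IQ2_S",
    "output.weight": "IQ5_KS_R4",
}
target = {
    "blk.0.attn_q.weight": "IQ2_KT",
    "blk.0.ffn_down.weight": "IQ3_XXS",
    "output.weight": "IQ5_K",
}
copy, requant = plan_partial_requant(existing, target)
```

Copying only reproduces a full requant if the copied tensors were produced with the same settings (for example the same imatrix), since a changed imatrix also changes the quantized values of tensors whose type stays the same.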