@trufae trufae commented Oct 22, 2024

@github-actions github-actions bot added the python python script changes label Oct 22, 2024

trufae commented Oct 31, 2024

ping

@wooooyeahhhh

Wouldn't this have a negative effect on output quality compared to converting to F16 and then using the quantize program? The output and embedding tensors would be converted to q4_0/q4_1, and I don't think the quantize program produces a pure quant.

@compilade
Collaborator

Related to #9022.

Basically, the q4_0 and q4_1 options of llama-quantize also use q4_k and q6_k for the token embeddings and output tensors, and those types are not yet supported by the Python re-implementation in gguf-py/gguf/quants.py, partly because it would be slow, but mostly because the k-quants rounding is not platform-independent (the results differ depending on whether FMA was used).
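For contrast, the non-k "simple" quants are straightforward to reproduce deterministically in NumPy. Here is a rough sketch of Q4_0 block quantization (blocks of 32 weights, one scale per block, 4-bit offsets), modeled on the approach used in gguf-py/gguf/quants.py — the exact rounding details and packing are simplified here and should be checked against the real implementation:

```python
import numpy as np

QK4_0 = 32  # block size for Q4_0

def quantize_q4_0_block(x: np.ndarray) -> tuple[float, np.ndarray]:
    """Quantize one block of 32 float32 weights to (scale, 4-bit codes).

    Sketch only: real GGUF packing stores the scale as f16 and packs two
    4-bit codes per byte, which is omitted here.
    """
    assert x.shape == (QK4_0,)
    # The scale maps the element with the largest magnitude (sign kept)
    # to the quant level -8, i.e. code 0 after the +8 offset.
    amax_idx = np.abs(x).argmax()
    d = float(x[amax_idx]) / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Round via trunc(x * inv_d + 8.5), then clamp to the 4-bit range [0, 15].
    q = np.trunc(x * inv_d + 8.5).clip(0, 15).astype(np.uint8)
    return d, q

def dequantize_q4_0_block(d: float, q: np.ndarray) -> np.ndarray:
    """Reconstruct the block: subtract the +8 offset and rescale."""
    return (q.astype(np.float32) - 8.0) * np.float32(d)
```

Since this only uses elementwise multiply, truncation, and clamping, the result does not depend on FMA availability, which is why the simple quants can already be produced identically across platforms, unlike the k-quants.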

But for quantization types smaller than Q8_0, there are also a lot of heuristics in llama_tensor_get_type to "choose" the type of each tensor, which is more complicated than the current type selection logic of convert_hf_to_gguf.py (which fortunately gives exactly the same selections for {F32, F16, BF16, Q8_0}, but not for other types).
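To illustrate the difference, here is a hypothetical, heavily simplified sketch of per-tensor type selection. The tensor names and the Q6_K fallback are illustrative assumptions, not the actual rules from llama_tensor_get_type, which considers many more cases (layer index, architecture, tensor shapes):

```python
def select_type(name: str, n_dims: int, target: str) -> str:
    """Illustrative per-tensor type selection (NOT the real heuristics).

    - 1-D tensors (norms, biases) stay in F32, as both tools do.
    - For small target types, llama-quantize would substitute a wider
      k-quant for embeddings/output; Q6_K here is an assumed example.
    """
    if n_dims == 1:
        return "F32"
    if target in ("Q4_0", "Q4_1") and name in ("token_embd.weight", "output.weight"):
        return "Q6_K"  # assumption: mirrors the kind of mixture llama-quantize makes
    return target
```

The point is that for {F32, F16, BF16, Q8_0} no such substitutions kick in, so both tools agree, while for smaller types the convert script would have to replicate every one of these special cases to produce byte-identical files.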

Ideally convert_hf_to_gguf.py should produce the exact same model files as llama-quantize (which it does for F32, F16, BF16, and Q8_0) to reduce confusion, but as explained above, it's more complicated for smaller types without changing the existing mixtures produced by llama-quantize.

Eventually, the k-quants rounding will be made platform-independent and k-quantization will be implemented in gguf-py/gguf/quants.py; then direct conversion to Q4_0, Q4_1, Q5_0, and Q5_1 could be added to convert_hf_to_gguf.py, but the type selection heuristics for smaller quants would need to be ported to the convert scripts too.
