@trufae trufae commented Oct 22, 2024

@github-actions github-actions bot added the python python script changes label Oct 22, 2024

trufae commented Oct 31, 2024

ping

@wooooyeahhhh

Wouldn't this have a negative effect on output quality compared to converting to F16 and then using the quantize program? The output and embedding tensors would be converted to q4_0/q4_1, and I don't think the quantize program produces a pure quant.

@compilade
Collaborator

Related to #9022.

Basically, the q4_0 and q4_1 options of llama-quantize also use q4_k and q6_k for the token embeddings and output tensors, and those types are not yet supported by the Python re-implementation in gguf-py/gguf/quants.py, partly because it would be slow, but mostly because the k-quants rounding is not platform-independent (the results differ depending on whether FMA was used).
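For contrast, the non-k "simple" quants are straightforward to reproduce deterministically in NumPy. Here is a rough sketch of Q4_0 block quantization (blocks of 32 weights, one scale per block, 4-bit offsets), modeled on the approach used in gguf-py/gguf/quants.py — the exact rounding details and packing are simplified here and should be checked against the real implementation:

```python
import numpy as np

QK4_0 = 32  # block size for Q4_0

def quantize_q4_0_block(x: np.ndarray) -> tuple[float, np.ndarray]:
    """Quantize one block of 32 float32 weights to (scale, 4-bit codes).

    Sketch only: real GGUF packing stores the scale as f16 and packs two
    4-bit codes per byte, which is omitted here.
    """
    assert x.shape == (QK4_0,)
    # The scale maps the element with the largest magnitude (sign kept)
    # to the quant level -8, i.e. code 0 after the +8 offset.
    amax_idx = np.abs(x).argmax()
    d = float(x[amax_idx]) / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # Round via trunc(x * inv_d + 8.5), then clamp to the 4-bit range [0, 15].
    q = np.trunc(x * inv_d + 8.5).clip(0, 15).astype(np.uint8)
    return d, q

def dequantize_q4_0_block(d: float, q: np.ndarray) -> np.ndarray:
    """Reconstruct the block: subtract the +8 offset and rescale."""
    return (q.astype(np.float32) - 8.0) * np.float32(d)
```

Since this only uses elementwise multiply, truncation, and clamping, the result does not depend on FMA availability, which is why the simple quants can already be produced identically across platforms, unlike the k-quants.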

But for quantization types smaller than Q8_0, there are also a lot of heuristics in llama_tensor_get_type to "choose" the type of each tensor, which is more complicated than the current type selection logic of convert_hf_to_gguf.py (which fortunately gives exactly the same selections for {F32, F16, BF16, Q8_0}, but not for other types).
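To illustrate the difference, here is a hypothetical, heavily simplified sketch of per-tensor type selection. The tensor names and the Q6_K fallback are illustrative assumptions, not the actual rules from llama_tensor_get_type, which considers many more cases (layer index, architecture, tensor shapes):

```python
def select_type(name: str, n_dims: int, target: str) -> str:
    """Illustrative per-tensor type selection (NOT the real heuristics).

    - 1-D tensors (norms, biases) stay in F32, as both tools do.
    - For small target types, llama-quantize would substitute a wider
      k-quant for embeddings/output; Q6_K here is an assumed example.
    """
    if n_dims == 1:
        return "F32"
    if target in ("Q4_0", "Q4_1") and name in ("token_embd.weight", "output.weight"):
        return "Q6_K"  # assumption: mirrors the kind of mixture llama-quantize makes
    return target
```

The point is that for {F32, F16, BF16, Q8_0} no such substitutions kick in, so both tools agree, while for smaller types the convert script would have to replicate every one of these special cases to produce byte-identical files.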

Ideally convert_hf_to_gguf.py should produce the exact same model files as llama-quantize (which it does for F32, F16, BF16, and Q8_0) to reduce confusion, but as explained above, it's more complicated for smaller types without changing the existing mixtures produced by llama-quantize.

Eventually, the k-quants rounding will be made platform-independent and k-quantization will be implemented in gguf-py/gguf/quants.py; then direct conversion to Q4_0, Q4_1, Q5_0, and Q5_1 could be added to convert_hf_to_gguf.py, but the type selection heuristics for smaller quants would need to be ported to the convert scripts too.
