Description
Name and Version
All versions after b5125, on all operating systems, are affected by this bug.
Operating systems
Other? (Please let us know in description)
Which llama.cpp modules do you know to be affected?
llama-quantize
Command line
./llama-quantize --tensor-type attn=q4_k gorilla-falcon-7b-hf-v0-F16.gguf gorilla-falcon-7b-hf-v0-Q4_K_M-kaboom.gguf q4_k_m 10
Problem description & steps to reproduce
When --tensor-type is used to override a tensor that would otherwise have been quantised with a fallback type, the GGML_ASSERT(tensor->ne[0] % blck_size == 0 && "tensor row size not divisible by block size of new type") error is triggered.
The problem occurs because the override logic ignores that the tensor had been reassigned to a fallback type, its row size not being an exact multiple of the GGML block size for the requested type.
Steps to reproduce:
- Attempt to quantise a model in which any tensor's row size is not an exact multiple of the GGML block size, whilst at the same time using --tensor-type to override that tensor (see Command line example above)
- Quantisation fails with an assert error
PR #14995 fixes this.
Credit to @ddh0 for flagging this bug.
First Bad Commit
Relevant log output