Hi guys,
I've managed to:
- fine-tune (train) a model from Hugging Face
- fuse the model with the adapters
- convert the result with llama.cpp to GGUF for use with ollama
This works fine (rough commands below).
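For reference, the steps look roughly like this (the model name, data path, and output directories are placeholders, not the exact ones I used):

```bash
# fine-tune with LoRA adapters (mlx_lm)
mlx_lm.lora --model <hf-quantized-model> --train --data ./data --adapter-path ./adapters

# fuse the adapters into the base model, de-quantizing as in the example I followed
mlx_lm.fuse --model <hf-quantized-model> --adapter-path ./adapters \
    --save-path ./fused_model --de-quantize

# convert the fused model to GGUF for use with ollama
python llama.cpp/convert_hf_to_gguf.py ./fused_model --outfile model.gguf --outtype f16
```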
The only catch is that the source model is a quantized one, and according to the example I followed, the fuse step uses the mlx_lm.fuse --de-quantize option, so the final model ends up being pretty huge.
I tried omitting the de-quantization, but in that case the llama.cpp convert_hf_to_gguf.py conversion step fails with:
```
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Traceback (most recent call last):
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 8595, in <module>
    main()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 8589, in main
    model_instance.write()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 410, in write
    self.prepare_tensors()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 277, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 4969, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 236, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.embed_tokens.biases'
```
So how can I do the same while keeping the model quantized? Or, instead of keeping the source model quantized, should I just quantize the resulting huge GGUF model afterwards (roughly as sketched below)?
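If re-quantizing the GGUF afterwards is the way to go, I assume it would look something along these lines (the quantization type is just an example, and the path to the llama-quantize binary depends on how llama.cpp was built):

```bash
# convert the de-quantized fused model to an f16 GGUF first
python llama.cpp/convert_hf_to_gguf.py ./fused_model --outfile model-f16.gguf --outtype f16

# then quantize the GGUF with llama.cpp's quantize tool
./llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```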
Thank you.