Hi guys,
I've managed to:
- fine-tune (train) a model from Hugging Face
- fuse the model with the adapters
- convert the result with llama.cpp to GGUF for use with ollama
This works fine (rough commands below).
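For reference, the steps look roughly like this (the model name, data path, and output directories are placeholders, not the exact ones I used):

```bash
# fine-tune with LoRA adapters (mlx_lm)
mlx_lm.lora --model <hf-quantized-model> --train --data ./data --adapter-path ./adapters

# fuse the adapters into the base model, de-quantizing as in the example I followed
mlx_lm.fuse --model <hf-quantized-model> --adapter-path ./adapters \
    --save-path ./fused_model --de-quantize

# convert the fused model to GGUF for use with ollama
python llama.cpp/convert_hf_to_gguf.py ./fused_model --outfile model.gguf --outtype f16
```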
The only catch is that the source model is a quantized one, and according to the example I followed, the fuse step uses the mlx_lm.fuse --de-quantize option, so the final model ends up being pretty huge.
I tried omitting the de-quantization, but in that case the llama.cpp convert_hf_to_gguf.py conversion step fails with:
```
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Traceback (most recent call last):
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 8595, in <module>
    main()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 8589, in main
    model_instance.write()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 410, in write
    self.prepare_tensors()
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 277, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 4969, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "/workspace/llama.cpp/convert_hf_to_gguf.py", line 236, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.embed_tokens.biases'
```
So how can I do the same while keeping the model quantized? Or, instead of keeping the source model quantized, should I just quantize the resulting huge GGUF model afterwards (roughly as sketched below)?
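If re-quantizing the GGUF afterwards is the way to go, I assume it would look something along these lines (the quantization type is just an example, and the path to the llama-quantize binary depends on how llama.cpp was built):

```bash
# convert the de-quantized fused model to an f16 GGUF first
python llama.cpp/convert_hf_to_gguf.py ./fused_model --outfile model-f16.gguf --outtype f16

# then quantize the GGUF with llama.cpp's quantize tool
./llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```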
Thank you.