Merged
7 changes: 2 additions & 5 deletions convert_hf_to_gguf.py
@@ -5146,10 +5146,7 @@ def set_vocab(self):
     def set_gguf_parameters(self):
         super().set_gguf_parameters()
         hparams = self.hparams
-        if hparams.get("head_dim"):
-            rope_dim = hparams["head_dim"]
-        else:
-            rope_dim = hparams["hidden_size"] // hparams["num_attention_heads"]
+        rope_dim = hparams.get("head_dim") or hparams["hidden_size"] // hparams["num_attention_heads"]

         self.gguf_writer.add_rope_dimension_count(rope_dim)
         self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.NONE)
@@ -5175,7 +5172,7 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
         n_head = self.hparams["num_attention_heads"]
         n_kv_head = self.hparams.get("num_key_value_heads")
         n_embd = self.hparams["hidden_size"]
-        head_dim = self.hparams.get("head_dim", n_embd // n_head)
+        head_dim = self.hparams.get("head_dim") or n_embd // n_head

Collaborator:
Yeah, sometimes we have an issue where models exported by transformers have some keys set to None; will discuss with the @huggingface team to see if it can be removed in the next version.

Collaborator:
@CISC I checked with the transformers team; they said that the None value is actually set by custom code outside of the library.

More importantly, I almost forgot that we actually have Model.find_hparam in convert_hf_to_gguf.py, which is perfect for handling such cases. If you have time, can you do a pass to change the places that currently use .get to find_hparam?

Collaborator Author:
That method is useful if you have multiple candidates for a value, but I don't see how it applies here?

The issue is not None, but that they set an actual 0.
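
To make the distinction concrete, here is a minimal sketch (the hparams values are hypothetical, not from any real config.json) of why .get with a default is not enough when an exporter writes an explicit 0:

```python
# Hypothetical config: the key exists, but with a falsy value.
hparams = {"head_dim": 0, "hidden_size": 4096, "num_attention_heads": 32}

# .get(key, default) only falls back when the key is *missing*,
# so the explicit 0 is returned as-is.
head_dim = hparams.get("head_dim", hparams["hidden_size"] // hparams["num_attention_heads"])
assert head_dim == 0  # wrong: a rope dimension of 0 breaks conversion

# .get(key) or default also falls back on falsy values (0, None),
# which is why the diff above switches to this form.
head_dim = hparams.get("head_dim") or hparams["hidden_size"] // hparams["num_attention_heads"]
assert head_dim == 128  # 4096 // 32
```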

Collaborator:
Hmm, ok, I misunderstood the function. But now I think it would be nice if find_hparam could handle this case, maybe by adding a default_value arg to it?

I'm thinking about this because it was also the case for Gemma 3; it was a bit messy because some params are either missing or set to null in the config.json. It's possible that many models in the future will have this same behavior.
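
A rough sketch of what that suggestion could look like. This is an illustrative free-function reimplementation, not the actual find_hparam method in convert_hf_to_gguf.py; the default_value parameter and the falsy-value handling are assumptions about the proposed change:

```python
from collections.abc import Iterable
from typing import Any

_MISSING = object()  # sentinel: distinguishes "no default given" from default_value=None

def find_hparam(hparams: dict[str, Any], keys: Iterable[str],
                default_value: Any = _MISSING) -> Any:
    # Return the first candidate key that is present *and* truthy, so that
    # explicit 0/null values in config.json fall through to the default.
    for key in keys:
        value = hparams.get(key)
        if value:
            return value
    if default_value is not _MISSING:
        return default_value
    raise KeyError(f"could not find any of: {list(keys)}")
```

With something like this, the call sites above could become find_hparam(hparams, ["head_dim"], n_embd // n_head) and keep working when an exporter writes head_dim: 0 or head_dim: null.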


         output_name = self.format_tensor_name(gguf.MODEL_TENSOR.OUTPUT)
