Make a few GLM tensors not required (to support GLM 4.6) #16359
Merged
`layer.nextn.shared_head_head` and `layer.nextn.embed_tokens` are both absent from GLM 4.6, which caused the model to fail to load after conversion/quantization. This marks those tensors as not required, which makes loading work.
Before this change, loading failed with:
`llama_model_load: error loading model: missing tensor 'blk.92.nextn.embed_tokens.weight'`
With this change, the model loads and runs successfully.
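For context, llama.cpp's loader supports marking a tensor as optional via a `TENSOR_NOT_REQUIRED` flag on `create_tensor`; when an optional tensor is absent, loading continues instead of erroring out. A minimal self-contained sketch of that pattern (toy names and types, not the actual loader code):

```cpp
// Toy sketch of the optional-tensor pattern used by llama.cpp's model
// loader. Everything here is a simplified stand-in for illustration.
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>

enum tensor_flags : int { TENSOR_REQUIRED = 0, TENSOR_NOT_REQUIRED = 1 };

struct toy_loader {
    std::map<std::string, float> tensors; // name -> data (toy stand-in)

    // Returns nullptr instead of throwing when an optional tensor is absent.
    const float * create_tensor(const std::string & name, int flags = TENSOR_REQUIRED) {
        auto it = tensors.find(name);
        if (it == tensors.end()) {
            if (flags & TENSOR_NOT_REQUIRED) {
                return nullptr; // absent but optional: loading continues
            }
            throw std::runtime_error("missing tensor '" + name + "'");
        }
        return &it->second;
    }
};

int main() {
    toy_loader ml;
    // GLM 4.6 checkpoints ship without the nextn tensors, so treat them as optional:
    const float * t = ml.create_tensor("blk.92.nextn.embed_tokens.weight", TENSOR_NOT_REQUIRED);
    std::printf("tensor %s\n", t ? "loaded" : "absent (ok)");
}
```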
Thanks to compilade for pointing me in the right direction.
Workaround for conversion, should anyone else come across this: convert_hf_to_gguf.py currently only looks at `model*.safetensors` files, so if you rename `mtp.safetensors` to `model-mtp.safetensors` it will work properly.
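For anyone curious why the rename works: as far as I can tell, the converter simply collects shard files whose names start with `model` and end with `.safetensors`, so `mtp.safetensors` is never picked up. A toy sketch of that filename check (illustrative only; the real logic lives in the Python script):

```cpp
// Toy illustration of the shard-name filter, assuming a prefix/suffix
// match like convert_hf_to_gguf.py's "model" / ".safetensors" selection.
#include <iostream>
#include <string>

static bool is_model_part(const std::string & name) {
    const std::string prefix = "model";
    const std::string suffix = ".safetensors";
    return name.size() >= prefix.size() + suffix.size()
        && name.compare(0, prefix.size(), prefix) == 0
        && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

int main() {
    std::cout << is_model_part("mtp.safetensors")       << "\n"; // 0: skipped
    std::cout << is_model_part("model-mtp.safetensors") << "\n"; // 1: included
}
```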