Commit 36955c3
initial commit for branch glm45v
1 parent bc07349 commit 36955c3

File tree

2 files changed, +30 -1 lines changed

convert_hf_to_gguf.py

Lines changed: 29 additions & 0 deletions

@@ -9219,6 +9219,35 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
 
         return []  # skip other tensors
 
+
+@ModelBase.register("Glm4vMoeForConditionalGeneration")
+class GLM4V_MoE(MmprojModel):
+    #
+    # the HF model's type is `glm4v_moe`. internally, it consists of two models:
+    # - `glm4v_moe_text`
+    #   + main text model
+    #   + tensor names start with "model.language_model."
+    #   + "2D-RoPE" (a.k.a. RoFormer) w/ embeddings dynamically adapted via bicubic interpolation
+    # - `glm4v_moe`
+    #   + vision adapter (ViT)
+    #   + tensor names start with "model.visual."
+    #   + "3D-RoPE" (without the interpolation mentioned above)
+    #
+    # other notable quirks include:
+    # - has an MTP layer (need to keep these tensors - same as GLM-4.5-Air)
+    # - RoPE theta value (θ): uses 10k, rather than 100k as in GLM-4.5-Air
+    # - the model's vision supports video input, but this is not implemented here
+    #
+    # for more info, refer to:
+    # - reference impl          : https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4v_moe
+    # - HF model card           : https://huggingface.co/zai-org/GLM-4.5V
+    # - arXiv paper (model)     : https://arxiv.org/abs/2507.01006
+    # - arXiv paper (orig. ViT) : https://arxiv.org/abs/2411.14402
+    #
+    # TODO: the model's tokenizer has video-related special tokens - deal with these (??)
+    #
+    pass
+
+
 ###### CONVERSION LOGIC ######
src/llama-arch.h

Lines changed: 1 addition & 1 deletion

@@ -69,6 +69,7 @@ enum llm_arch {
     LLM_ARCH_CHATGLM,
     LLM_ARCH_GLM4,
     LLM_ARCH_GLM4_MOE,
+    LLM_ARCH_GLM4V_MOE,
     LLM_ARCH_BITNET,
     LLM_ARCH_T5,
     LLM_ARCH_T5ENCODER,
@@ -122,7 +123,6 @@ enum llm_kv {
     LLM_KV_GENERAL_LICENSE,
     LLM_KV_GENERAL_SOURCE_URL,
     LLM_KV_GENERAL_SOURCE_HF_REPO,
-
     LLM_KV_VOCAB_SIZE,
     LLM_KV_CONTEXT_LENGTH,
     LLM_KV_EMBEDDING_LENGTH,
