
Conversation

@ddh0 ddh0 commented Oct 14, 2025

Implementation of GLM-4.5V in llama.cpp

The architecture is Glm4vMoeForConditionalGeneration ("model_type": "glm4v_moe"). Internally, it consists of an LLM (text model) and a ViT (vision adapter / multimodal projector):
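
To make the split concrete, the sub-configs can be inspected from Python. This is a sketch, not part of the PR: it assumes a transformers build that ships the glm4v_moe architecture, and the repo id and attribute names follow the usual HF layout for nested multimodal configs.

```python
from transformers import AutoConfig

# Assumption: the public checkpoint lives at zai-org/GLM-4.5V and the
# installed transformers version knows the glm4v_moe architecture.
cfg = AutoConfig.from_pretrained("zai-org/GLM-4.5V")
print(cfg.architectures)             # expect ["Glm4vMoeForConditionalGeneration"]
print(cfg.model_type)                # expect "glm4v_moe"
print(cfg.text_config.model_type)    # expect "glm4v_moe_text"
print(cfg.vision_config.model_type)  # expect "glm4v_moe", per the split above
```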

LLM (text model glm4v_moe_text)

  • Based on GLM-4.5-Air
  • Tensor names start with model.language_model.
  • Uses a "multimodal 3D RoPE": in apply_multimodal_rotary_pos_emb, rotary embeddings are applied across separate temporal, height, and width position streams for visual tokens (see the sketch after this list)
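
For reference, the sketch below condenses the HF transformers implementation of apply_multimodal_rotary_pos_emb (written for Qwen2-VL and reused by this model family). Treat the shapes in the comments as assumptions: the per-axis split comes from the checkpoint's rope_scaling.mrope_section (e.g. [16, 24, 24] in Qwen2-VL), and GLM-4.5V's values may differ.

```python
import torch

def rotate_half(x):
    # (x1, x2) -> (-x2, x1) over the last dimension
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_multimodal_rotary_pos_emb(q, k, cos, sin, mrope_section, unsqueeze_dim=1):
    # cos/sin: [3, batch, seq, head_dim], one position stream per axis
    # (temporal, height, width). mrope_section says how many rotary dims
    # each axis owns; the list is repeated (not scaled) because cos/sin
    # store each frequency twice, as cat(freqs, freqs).
    mrope_section = mrope_section * 2  # Python list repetition
    cos = torch.cat(
        [m[i % 3] for i, m in enumerate(cos.split(mrope_section, dim=-1))], dim=-1
    ).unsqueeze(unsqueeze_dim)
    sin = torch.cat(
        [m[i % 3] for i, m in enumerate(sin.split(mrope_section, dim=-1))], dim=-1
    ).unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

For text-only tokens all three streams carry the same position index, so the scheme reduces to ordinary 1D RoPE for them.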

ViT (vision adapter glm4v_moe)

  • Adapted from apple/aimv2-huge-patch14-336:
    • Architecture Aimv2VisionModel
    • ~681M params
    • 24 layers (depth: 24)
    • hidden_size (n_embd): 1536
    • intermediate_size (n_ff): 4096
    • image_size: 336
    • patch_size: 14 (so a native 336px image yields 336/14 = 24 patches per side, 576 patches total)
    • num_channels: 3
  • Tensor names start with model.visual.
  • Its 2D positional embeddings are dynamically adapted via bicubic interpolation within the Glm4vMoeVisionEmbeddings module to handle varying image resolutions (see the sketch after this list)
  • It also applies its own rotary position embeddings within the self-attention blocks (via apply_rotary_pos_emb_vision)
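
A minimal sketch of that resizing step, assuming the usual flat [num_patches, n_embd] table layout (the real Glm4vMoeVisionEmbeddings adds per-image bookkeeping on top of this):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, src: int, dst_h: int, dst_w: int) -> torch.Tensor:
    # pos_embed: [src * src, n_embd]; for this ViT, src = 336 // 14 = 24.
    n_embd = pos_embed.shape[-1]
    # [N, D] -> [1, D, src, src] so interpolate treats embeddings as channels
    grid = pos_embed.reshape(src, src, n_embd).permute(2, 0, 1).unsqueeze(0)
    grid = F.interpolate(grid, size=(dst_h, dst_w), mode="bicubic", align_corners=False)
    # back to a flat [dst_h * dst_w, n_embd] table
    return grid.squeeze(0).permute(1, 2, 0).reshape(dst_h * dst_w, n_embd)

# e.g. adapt the native 24x24 table to a 32x18 patch grid:
# new_table = resize_pos_embed(table, 24, 32, 18)
```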

Other notes:

  • Native context length is 65,536 (as opposed to 131,072 for GLM-4.5-Air)
  • RoPE theta (θ): 10,000.0 (as opposed to 100,000.0 for GLM-4.5-Air); both values can be sanity-checked against the checkpoint's config, as sketched after this list
  • The model supports video input, but this PR targets images only; video support is out of scope for now
  • The tokenizer includes video-related special tokens, which need to be handled during conversion
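
A hypothetical sanity check of those two hyperparameters, assuming the key names follow the usual HF layout for nested text/vision configs (verify against the actual config.json):

```python
import json

# Assumption: the checkpoint directory contains a standard HF config.json
# with the text model's hyperparameters nested under "text_config".
with open("GLM-4.5V/config.json") as f:
    cfg = json.load(f)

text_cfg = cfg.get("text_config", cfg)
print(text_cfg["max_position_embeddings"])  # expect 65536
print(text_cfg["rope_theta"])               # expect 10000.0
```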


@ddh0 ddh0 marked this pull request as draft October 14, 2025 07:38
* cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)
  * remove legacy copy-op pointer indirection code
  * further removal of copy-op indirection code
  * renamed check_node_graph_compatibility_and_refresh_copy_ops function
* CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)
  * CUDA: kernel for larger batch sizes for MoE
  * WIP (×6)
  * fixup
  * tests
  * Move mmq_ids_helper to mmid
  * cleanup
  * Remove redundant checks
* CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)
  * use bf16 directly + fix formatting
  * Add exception for HIP code
* CUDA: enable FA for FP32 KV cache (ggml-org#16546)
* vulkan: Improve build time for MSVC (ggml-org#16545)
  * Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel; enable /MP so source files are compiled in parallel
* vulkan: Support FA with K/V in F32 (ggml-org#16543)
* CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)
* vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)
* metal : avoid using Metal's gpuAddress property (ggml-org#16576)
  * metal : fix rope kernels buffer check

Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Anav Prasad <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Jeff Bolz <[email protected]>
Co-authored-by: SavicStefan <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
@ddh0 ddh0 changed the title from "support GLM-4.5V (108B multimodal)" to "support GLM-4.5V (108B VLM)" Oct 14, 2025
@ddh0 ddh0 commented Oct 15, 2025

Moved to ggml-org#16600

@ddh0 ddh0 closed this Oct 15, 2025