
Conversation

@ngxson (Collaborator) commented Aug 20, 2025

Fix #14318

Support https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506 (and newer "Thinking" variants) with dynamic resolution

GGUF: https://huggingface.co/ggml-org/Kimi-VL-A3B-Thinking-2506-GGUF

A large part of the code is copied from the LFM2 implementation; huge kudos to @tdakhran for the LFM2 code 😄


NOTE: Kimi-VL-A3B-Instruct generates gibberish output even in text-only mode. I have no idea why; if someone knows, please comment.

github-actions bot added labels: examples, python, python script changes (Aug 20, 2025)
@ngxson's comment was marked as outdated.

cur = ggml_permute(ctx0, cur, 0, 2, 1, 3); // swap dims 1 and 2

cur = ggml_cont_2d(ctx0, cur, cur->ne[0], cur->ne[1] * cur->ne[2]); // make contiguous, flatten to [ne0, ne1*ne2]
cur = build_pixel_shuffle(cur, scale_factor); // merge patches into the channel dim
Contributor commented:

In the HF checkpoint, the block is called PixelUnshuffleBlock. I followed that when I added a comment above // pixel unshuffle block. The function name is build_pixel_shuffle though.

Should it be shuffle or unshuffle? :) I'm personally fine with either, as long as we are consistent.

@ngxson (Collaborator, Author) replied Aug 21, 2025:

In smolvlm (siglip1) it's called "shuffle", in siglip2 it's "unshuffle", and here in kimi-vl it's called "merge_patches".

I think a more appropriate name would be merge_patches_permute, since it relies on permutation, to differentiate it from patch merging via pool_avg_2d (used by gemma 3).
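
For readers following the naming discussion: the permutation-based merge is a pixel unshuffle (space-to-depth), collapsing each r x r block of patch tokens into the channel dimension. Below is a minimal plain-loop C++ sketch of that effect, assuming tokens are laid out row-major as [H][W][C]; it illustrates the data movement only and is not the ggml implementation.

#include <vector>

// Pixel unshuffle: [H][W][C] -> [H/r][W/r][C*r*r], merging each r x r
// block of patches into the channel dimension (the same token merge the
// permute + cont_2d sequence achieves on ggml tensors).
std::vector<float> pixel_unshuffle(const std::vector<float> & in,
                                   int H, int W, int C, int r) {
    const int Ho = H / r, Wo = W / r, Co = C * r * r;
    std::vector<float> out(in.size());
    for (int y = 0; y < Ho; ++y)
        for (int x = 0; x < Wo; ++x)
            for (int dy = 0; dy < r; ++dy)
                for (int dx = 0; dx < r; ++dx)
                    for (int c = 0; c < C; ++c) {
                        const int src = ((y*r + dy) * W + (x*r + dx)) * C + c;
                        const int dst = (y * Wo + x) * Co + (dy*r + dx) * C + c;
                        out[dst] = in[src];
                    }
    return out;
}

The output has 1/r^2 as many tokens, each r^2 times wider, which is why the projector's input dimension grows by scale_factor^2.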

@ngxson (Collaborator, Author) commented Aug 25, 2025

It should work now. Still no idea why Kimi-VL-A3B-Instruct outputs gibberish even with text-only input, so I removed that GGUF.

[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/LFM2-VL-450M-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/pixtral-12b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2-VL-7B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-8B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-14B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-7B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Kimi-VL-A3B-Thinking-2506-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_1-8b-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-7B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-72B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Llama-4-Scout-17B-16E-Instruct-GGUF:IQ1_S

-    if (!pos_embd || height * width == pos_embd->ne[1]) {
+    GGML_ASSERT(pos_embd);
+    if (height == n_per_side && width == n_per_side) {
@ngxson (Collaborator, Author) commented:

@tdakhran FYI, I attempted to fix a potential bug here. For example, if the input shape is [n_embd, 256] and h = 8, w = 32, then 8 * 32 == 256, which skips ggml_interpolate even when it needs to run.
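
To make the failure mode concrete, here is a tiny standalone demo of the two skip conditions; the variable names are illustrative, not the ones in clip.cpp.

#include <cstdio>

int main() {
    const int n_per_side = 16; // learned pos_embd covers a 16x16 grid = 256 entries
    const int h = 8, w = 32;   // non-square input grid that also has 256 tokens

    // old check: compares only the total count, so 8*32 == 16*16 passes
    const bool skip_old = (h * w == n_per_side * n_per_side);
    // fixed check: skip interpolation only on an exact shape match
    const bool skip_new = (h == n_per_side && w == n_per_side);

    printf("old check skips interpolation: %d (bug), new check: %d\n",
           skip_old, skip_new);
    return 0;
}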

Contributor replied:

Thank you @ngxson, I missed this logic.
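
For completeness, the path this condition guards: when the input grid differs from the trained n_per_side x n_per_side grid, the learned position embeddings must be resized to [h, w]. Below is a minimal bilinear-resize sketch, assuming the embeddings are stored row-major as [side_in][side_in][d]; it is an illustration of the idea, not the ggml_interpolate implementation.

#include <algorithm>
#include <vector>

// Bilinearly resize a side_in x side_in grid of d-dimensional position
// embeddings to h_out x w_out (align-corners sampling).
std::vector<float> resize_pos_embd(const std::vector<float> & in,
                                   int side_in, int d, int h_out, int w_out) {
    std::vector<float> out((size_t) h_out * w_out * d);
    for (int y = 0; y < h_out; ++y) {
        for (int x = 0; x < w_out; ++x) {
            // map output coordinates back onto the input grid
            const float fy = h_out > 1 ? (float) y * (side_in - 1) / (h_out - 1) : 0.0f;
            const float fx = w_out > 1 ? (float) x * (side_in - 1) / (w_out - 1) : 0.0f;
            const int y0 = (int) fy, x0 = (int) fx;
            const int y1 = std::min(y0 + 1, side_in - 1);
            const int x1 = std::min(x0 + 1, side_in - 1);
            const float wy = fy - y0, wx = fx - x0;
            for (int c = 0; c < d; ++c) {
                const float v00 = in[((size_t) y0 * side_in + x0) * d + c];
                const float v01 = in[((size_t) y0 * side_in + x1) * d + c];
                const float v10 = in[((size_t) y1 * side_in + x0) * d + c];
                const float v11 = in[((size_t) y1 * side_in + x1) * d + c];
                out[((size_t) y * w_out + x) * d + c] =
                    (1 - wy) * ((1 - wx) * v00 + wx * v01) +
                    wy       * ((1 - wx) * v10 + wx * v11);
            }
        }
    }
    return out;
}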

@ngxson ngxson requested review from CISC and ggerganov August 25, 2025 14:26
@tdakhran (Contributor) left a review:

Thank you for the new models and the fixes to existing ones, @ngxson!

ngxson merged commit 79a5462 into ggml-org:master on Aug 26, 2025 (51 of 52 checks passed).
@foldl (Contributor) commented Aug 27, 2025

I would like to report that Kimi-VL-A3B-Instruct does not generate gibberish outputs in my implementation:

[screenshot: Kimi-VL-A3B-Instruct producing coherent output]

Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
* convert : fix tensor naming conflict for llama 4 vision

* convert ok

* support kimi vision model

* clean up

* fix style

* fix calc number of output tokens

* refactor resize_position_embeddings

* add test case

* rename build fn

* correct a small bug
@ngxson (Collaborator, Author) commented Aug 27, 2025

@foldl feel free to open a PR to fix it

@iamlemec (Collaborator) commented Sep 1, 2025

I can also report that Kimi-VL-A3B-Instruct is working for me with the current llama.cpp master. Nothing fancy, just the usual GGUF conversion script. Works for various quantizations on CUDA. Might be worth another shot, @ngxson?


Closes: Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct (#14318)