Conversation

@ngxson commented Oct 22, 2025

This change is extracted from #16701

In order to have jinja support, I needed to add chat_history to mtmd_cli_context.

Tested and confirmed to work with Gemma 3
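
For reference, a minimal sketch of the idea (the PR's exact fields and surrounding code may differ; common_chat_msg is the chat message type from llama.cpp's common library):

#include <vector>
#include "llama.h" // llama_context
#include "chat.h"  // common_chat_msg

// Sketch only: the CLI context keeps the running conversation so the jinja
// chat template can be re-applied over the full history on each turn.
struct mtmd_cli_context {
    llama_context * lctx = nullptr;            // text model context
    int n_past = 0;                            // tokens already evaluated
    std::vector<common_chat_msg> chat_history; // accumulated user/assistant turns

    void reset_chat_history() {
        chat_history.clear();
    }
};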

@ngxson requested a review from ggerganov as a code owner, October 22, 2025 12:47
  ctx.n_past = 0;
- llama_memory_seq_rm(llama_get_memory(ctx.lctx), 0, 1, -1); // keep BOS
+ ctx.reset_chat_history();
+ llama_memory_clear(llama_get_memory(ctx.lctx), true);
@ggerganov (Member) commented on the diff:

This will no longer keep the BOS - just making sure it is intended.

@ngxson (Collaborator, Author) replied:

Yes, it's expected: the BOS token will always be added along with the first formatted message.
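
To illustrate why that works (a sketch, not the PR's exact call site; common_tokenize is the helper from llama.cpp's common library, and formatted_first_message is a hypothetical variable holding the chat template output, assumed in scope):

#include "common.h" // common_tokenize

// Illustrative only: once the KV cache is fully cleared, the next turn is
// formatted from the start of the chat history, and tokenizing that first
// formatted message with special tokens enabled prepends BOS again.
std::vector<llama_token> tokens = common_tokenize(
    ctx.lctx,
    formatted_first_message,   // hypothetical: output of the chat template
    /* add_special   */ true,  // re-inserts BOS at the start
    /* parse_special */ true);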

@ngxson commented Oct 22, 2025

Btw @ggerganov, I'm currently seeing 2 small models fail the tests on the master branch:

[vision] FAIL: llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/LFM2-VL-450M-GGUF:Q8_0
[vision] FAIL: llama-mtmd-cli ggml-org/granite-docling-258M-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M

Just wondering if there are any recent changes in the Metal backend that could affect this?

@ggerganov commented:
It's possible - can you pass me a command with one of the failures to look into it?

@ngxson commented Oct 22, 2025

I used this command to run all tests: ./tools/mtmd/tests.sh

But you can also run one test manually:

llama-mtmd-cli -hf ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0 --image ./tools/mtmd/test-1.jpeg -p "what is the publisher name of the newspaper?" --temp 0 -n 128

The answer should contain the word "New York".

Optionally, you can also have llama-mtmd-cli print all intermediate results by setting export MTMD_DEBUG_GRAPH=1
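
For example, combining it with the single-test command above:

MTMD_DEBUG_GRAPH=1 llama-mtmd-cli -hf ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0 --image ./tools/mtmd/test-1.jpeg -p "what is the publisher name of the newspaper?" --temp 0 -n 128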

@ngxson commented Oct 22, 2025

Hmm, after bisecting, it seems the problem comes from #16206; no idea why my tests were OK back then. I'll investigate further and let you know.

@ngxson commented Oct 23, 2025

Merging this because the error is unrelated to the current PR.

@ngxson merged commit d0660f2 into ggml-org:master, Oct 23, 2025
15 of 68 checks passed
@ggerganov commented:
The test also fails with a CPU-only build, so it should not be related to the Metal backend.

FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO