Conversation

@ngxson commented Oct 22, 2025

This change is extracted from #16701

In order to have jinja support, I needed to add chat_history to mtmd_cli_context.

Tested and confirmed to work with Gemma 3
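
For reference, a minimal sketch of the idea (the PR's exact fields and surrounding code may differ; common_chat_msg is the chat message type from llama.cpp's common library):

#include <vector>
#include "llama.h" // llama_context
#include "chat.h"  // common_chat_msg

// Sketch only: the CLI context keeps the running conversation so the jinja
// chat template can be re-applied over the full history on each turn.
struct mtmd_cli_context {
    llama_context * lctx = nullptr;            // text model context
    int n_past = 0;                            // tokens already evaluated
    std::vector<common_chat_msg> chat_history; // accumulated user/assistant turns

    void reset_chat_history() {
        chat_history.clear();
    }
};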

@ngxson requested a review from ggerganov as a code owner, October 22, 2025 12:47
  ctx.n_past = 0;
- llama_memory_seq_rm(llama_get_memory(ctx.lctx), 0, 1, -1); // keep BOS
+ ctx.reset_chat_history();
+ llama_memory_clear(llama_get_memory(ctx.lctx), true);
@ggerganov (Member) commented on the diff:

This will no longer keep the BOS - just making sure it is intended.

@ngxson (Collaborator, Author) replied:

Yes, it's expected: the BOS token will always be added along with the first formatted message.
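
To illustrate why that works (a sketch, not the PR's exact call site; common_tokenize is the helper from llama.cpp's common library, and formatted_first_message is a hypothetical variable holding the chat template output, assumed in scope):

#include "common.h" // common_tokenize

// Illustrative only: once the KV cache is fully cleared, the next turn is
// formatted from the start of the chat history, and tokenizing that first
// formatted message with special tokens enabled prepends BOS again.
std::vector<llama_token> tokens = common_tokenize(
    ctx.lctx,
    formatted_first_message,   // hypothetical: output of the chat template
    /* add_special   */ true,  // re-inserts BOS at the start
    /* parse_special */ true);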

@ngxson commented Oct 22, 2025

Btw @ggerganov, I'm currently seeing 2 small models fail the tests on the master branch:

[vision] FAIL: llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/LFM2-VL-450M-GGUF:Q8_0
[vision] FAIL: llama-mtmd-cli ggml-org/granite-docling-258M-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M

Just wondering if there are any recent changes in the Metal backend that could affect this?

@ggerganov commented:
It's possible - can you pass me a command with one of the failures to look into it?

@ngxson commented Oct 22, 2025

I used this command to run all tests: ./tools/mtmd/tests.sh

But you can also run one test manually:

llama-mtmd-cli -hf ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0 --image ./tools/mtmd/test-1.jpeg -p "what is the publisher name of the newspaper?" --temp 0 -n 128

The answer should contain the word "New York".

Optionally, you can also have llama-mtmd-cli print all intermediate results by setting export MTMD_DEBUG_GRAPH=1
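
For example, combining it with the single-test command above:

MTMD_DEBUG_GRAPH=1 llama-mtmd-cli -hf ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0 --image ./tools/mtmd/test-1.jpeg -p "what is the publisher name of the newspaper?" --temp 0 -n 128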

@ngxson commented Oct 22, 2025

Hmm, after bisecting, it seems the problem comes from #16206; no idea why my tests were OK back then. I'll investigate further and let you know.

@ngxson commented Oct 23, 2025

Merging this because the error is unrelated to the current PR.

@ngxson merged commit d0660f2 into ggml-org:master, Oct 23, 2025
15 of 68 checks passed
@ggerganov commented:
The test also fails with a CPU-only build, so it should not be related to the Metal backend.

FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO