convert: clean Gemma vision/audio chat template markers #16749

pockers21 · 2025-10-24T05:59:55Z

Summary

Normalize Gemma chat templates at conversion time: replace <start_of_image>/<end_of_image> (and audio
equivalents) with the MTMD placeholder .

Context

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.
Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

Changes

M convert_hf_to_gguf.py
- Gemma2/Gemma3 set_vocab(): read tokenizer.chat_template and clean:
  - <start_of_image> → , <end_of_image> → ""
  - <start_of_audio> → , <end_of_audio> → ""
- If changed, write back with gguf_writer.add_chat_template(cleaned)

Testing

Build:
- cd /root/llama.cpp
- cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON
- cmake --build build -j --target llama-server
Install tests:
- python3 -m venv .venv_server_tests && source .venv_server_tests/bin/activate
- pip install -r tools/server/tests/requirements.txt
External server (example port 18081):
- export LLAMA_CACHE=/root/autodl-tmp/llama-cache
- ./build/bin/llama-server --host 127.0.0.1 --port 18081 --temp 0.8 --seed 42 --hf-repo ggml-org/
  tinygemma3-GGUF --hf-file tinygemma3-Q8_0.gguf --batch-size 32 --no-slots --alias tinygemma3 --ctx-
  size 1024 --parallel 2 --n-predict 4 --mmproj-url https://huggingface.co/ggml-org/tinygemma3-GGUF/
  resolve/main/mmproj-tinygemma3.gguf
- DEBUG_EXTERNAL=1 PORT=18081 LLAMA_CACHE=$LLAMA_CACHE pytest -q -x tools/server/tests/unit/
  test_vision_api.py::test_vision_chat_completion -k 'IMG_URL_0 or IMG_BASE64_URI_0'
Expected: passes for both parameters.

Impact

Only affects models whose chat_template uses the above vision/audio markers; no change for other
models.
Keeps server runtime clean and model-agnostic; does not alter public inference APIs.

ngxson · 2025-10-24T21:29:40Z

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.

Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

What? When does the test fail? I can't see it fail in our CI.

Then how to you explain the "intermittently failed" part in your comment above? If this is really the problem with chat template, it should always fail, not intermittently

Your PR looks like hallucinated AI-generated content. Please explicitly state if you use AI to generate parts of this PR.

pockers21 · 2025-10-27T02:19:56Z

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.

Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

What? When does the test fail? I can't see it fail in our CI.

Then how to you explain the "intermittently failed" part in your comment above? If this is really the problem with chat template, it should always fail, not intermittently

Your PR looks like hallucinated AI-generated content. Please explicitly state if you use AI to generate parts of this PR.

This PR was created as a draft. I assumed you wouldn’t receive review notifications for draft PRs, and I did not intend to request a review.

The error does exist:
https://github.com/ggml-org/llama.cpp/actions/runs/18767468550/job/53545798260

It appears to be caused by the newly introduced logic, but after reviewing the code I still haven’t figured out why it would affect the server flow.

I called it intermittent because so far it seems to occur only on my side, and it should be unrelated to my changes.

The reason I opened it here is that the server CI pipeline only runs when a PR is converted to a regular (non-draft) PR. I’ve kept it as a draft to avoid notifying you and disrupting your work. I’m still trying to reproduce the server regex-matching error locally, and I will convert this to a regular PR only after I’ve fully resolved it.

Apologies again.

pockers21 · 2025-10-27T07:09:42Z

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.

Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

What? When does the test fail? I can't see it fail in our CI.

Then how to you explain the "intermittently failed" part in your comment above? If this is really the problem with chat template, it should always fail, not intermittently

Your PR looks like hallucinated AI-generated content. Please explicitly state if you use AI to generate parts of this PR.

I reproduced the issue locally and confirmed it was introduced by the changes in jina PR, not by the master branch. I’m closing this PR.

liyang added 3 commits October 23, 2025 17:10

server(chat): inject image_url via MTMD marker

ea44482

ci: retrigger workflows

65d1ee8

gguf: clean Gemma vision/audio markers to <media>

c469e8a

github-actions bot added the python python script changes label Oct 24, 2025

pockers21 force-pushed the bugfix-server-vision-mtmd branch 7 times, most recently from 5e2fa90 to 86d2de5 Compare October 24, 2025 08:16

Merge branch 'master' into bugfix-server-vision-mtmd

5fb33e3

pockers21 force-pushed the bugfix-server-vision-mtmd branch from 86d2de5 to 5fb33e3 Compare October 24, 2025 10:45

pockers21 closed this Oct 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

convert: clean Gemma vision/audio chat template markers #16749

convert: clean Gemma vision/audio chat template markers #16749

Uh oh!

pockers21 commented Oct 24, 2025 •

edited

Loading

Uh oh!

ngxson commented Oct 24, 2025 •

edited

Loading

Uh oh!

pockers21 commented Oct 27, 2025 •

edited

Loading

Uh oh!

pockers21 commented Oct 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

convert: clean Gemma vision/audio chat template markers #16749

convert: clean Gemma vision/audio chat template markers #16749

Uh oh!

Conversation

pockers21 commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pockers21 commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pockers21 commented Oct 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pockers21 commented Oct 24, 2025 •

edited

Loading

ngxson commented Oct 24, 2025 •

edited

Loading

pockers21 commented Oct 27, 2025 •

edited

Loading