fix: correct chat format usage #1790
Merged
Background
In our unit tests, we widely use `M4-ai/TinyMistral-248M-v2-Instruct-GGUF` as the LlamaCpp test model. However, because this model was released early, its GGUF metadata does not contain a `chat_template` (for comparison, the Hugging Face file-info view of a model that does ship one: https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/tree/main?show_file_info=Qwen3-0.6B-Q8_0.gguf).
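A quick way to confirm this locally (a minimal sketch, assuming the `metadata` dict that recent llama-cpp-python versions expose on `Llama` instances):

```python
from llama_cpp import Llama

llamacpp_model = Llama.from_pretrained(
    repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
    filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
)

# GGUF stores an embedded chat template under "tokenizer.chat_template";
# per this PR, the key is absent for this model, so this prints None.
print(llamacpp_model.metadata.get("tokenizer.chat_template"))
```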
Despite this, the model is actually a chat model, and according to its documentation we should apply the correct chat template when performing chat completions:
https://huggingface.co/M4-ai/TinyMistral-248M-v2-Instruct-GGUF#recommended-prompt-template
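For illustration, the recommended template is ChatML-style and a single-turn prompt looks roughly like this (reproduced from the linked model card; `{prompt}` stands for the user message):

```
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```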
However, Llama.cpp defaults to the `llama-2` chat format. The correct format for this model in Llama.cpp is actually `qwen`. Therefore, the model should be loaded with:
```diff
  from llama_cpp import Llama
  from outlines import from_llamacpp

  llamacpp_model = Llama.from_pretrained(
      repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
      filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
+     chat_format="qwen",
  )
  model = from_llamacpp(llamacpp_model)
```
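With the correct format set, structured chat completion behaves as expected. A minimal usage sketch (the `Character` schema here is a hypothetical example, not taken from the test suite):

```python
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    age: int

# Generation is constrained to the schema, so the result is valid JSON
# matching Character.
result = model("Create a character.", Character)
print(result)
```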
Why this issue surfaces now
Previously, we did not exercise chat completion behavior in our tests, so this problem went unnoticed.
Under the new best-effort chat completion strategy, the incorrect chat template becomes visible.
For example, the unit test `pytest tests/models/test_llamacpp.py::test_llamacpp_json` fails immediately if the wrong chat template is applied, because the model's behavior becomes erratic under an incorrect template.
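To see the template difference directly, one can compare llama-cpp-python's chat completion output under each format (a minimal sketch; the exact completions will vary):

```python
from llama_cpp import Llama

for chat_format in ("llama-2", "qwen"):
    llm = Llama.from_pretrained(
        repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
        filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
        chat_format=chat_format,
    )
    # "llama-2" wraps messages in [INST] ... [/INST] markers, while
    # "qwen" uses the ChatML markers this model was trained on.
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=32,
    )
    print(chat_format, "->", response["choices"][0]["message"]["content"])
```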