I'm trying to understand how exactly the chat_template is used, especially with GGUF models. My understanding is that on first load the template is read from the .gguf metadata and cached in model3.json, that this cached copy is what gets used from then on, and that it is where any changes made in the GUI end up. This seems to be how it works for both the gpt4all models and the Hugging Face models. Can anyone confirm this?
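One way to check the first half of this is to read the template straight out of the .gguf metadata and compare it with the cached copy. A minimal sketch, assuming the gguf Python package that ships with llama.cpp (pip install gguf); "model.gguf" is a placeholder path:

```python
# Dump the embedded chat template from a GGUF file.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")

# GGUF stores the template under the key "tokenizer.chat_template".
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no chat template embedded in this file")
else:
    # For string-typed fields the raw value bytes sit in the last part.
    print(bytes(field.parts[-1]).decode("utf-8"))
```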
You're mostly on point: the .gguf files include metadata such as the chat_template, which gets cached into model3.json (or an equivalent) on first load. That cached copy is what the interface actually uses, unless you flush or reload it manually. If you're working with models from both Hugging Face and gpt4all, be aware of some edge-case inconsistencies in tokenization and in template injection order (e.g., whether the system prompt is inserted before or after other segments), especially when you apply prompt-structure tweaks. We ran into this hard while building a "drunk mode" reasoning layer.
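To see which copy the interface would actually use, you can inspect the cached entry directly. A minimal sketch only; the cache path and the "filename" / "chatTemplate" keys are guesses based on a gpt4all-style layout, and the file is assumed to be a JSON array of model entries, so check your local copy for the real structure:

```python
# Inspect the cached chat template for each GGUF model entry in the
# local model3.json cache.
import json
from pathlib import Path

# Assumed location; adjust to wherever your install keeps the cache.
cache = Path.home() / ".cache" / "gpt4all" / "model3.json"
entries = json.loads(cache.read_text(encoding="utf-8"))

for entry in entries:
    # "filename" and "chatTemplate" are assumed key names.
    if entry.get("filename", "").endswith(".gguf"):
        template = entry.get("chatTemplate")
        print(entry["filename"])
        print(template if template else "(no cached template)")
```

If the cached value differs from what the dump of the .gguf metadata shows, that is a good sign the GUI edits landed in the cache rather than in the model file itself.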