I'm trying to understand how exactly the chat_template is used, especially with GGUF models. My understanding is that on first load the template is read from the .gguf metadata and cached in model3.json, that this cached copy is what gets used from then on, and that it is where any changes made in the GUI end up. This seems to be how it works for both the gpt4all models and the Hugging Face models. Can anyone confirm this?
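One way to check the first half of this is to read the template straight out of the .gguf metadata and compare it with the cached copy. A minimal sketch, assuming the gguf Python package that ships with llama.cpp (pip install gguf); "model.gguf" is a placeholder path:

```python
# Dump the embedded chat template from a GGUF file.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")

# GGUF stores the template under the key "tokenizer.chat_template".
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no chat template embedded in this file")
else:
    # For string-typed fields the raw value bytes sit in the last part.
    print(bytes(field.parts[-1]).decode("utf-8"))
```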
You're mostly on point: the .gguf files include metadata such as the chat_template, which gets cached into model3.json (or an equivalent) on first load. That cached copy is what the interface actually uses, unless you flush or reload it manually. If you're working with models from both Hugging Face and gpt4all, be aware of some edge-case inconsistencies in tokenization and in template injection order (e.g., whether the system prompt is inserted before or after other segments), especially when you apply prompt-structure tweaks. We ran into this hard while building a "drunk mode" reasoning layer.
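To see which copy the interface would actually use, you can inspect the cached entry directly. A minimal sketch only; the cache path and the "filename" / "chatTemplate" keys are guesses based on a gpt4all-style layout, and the file is assumed to be a JSON array of model entries, so check your local copy for the real structure:

```python
# Inspect the cached chat template for each GGUF model entry in the
# local model3.json cache.
import json
from pathlib import Path

# Assumed location; adjust to wherever your install keeps the cache.
cache = Path.home() / ".cache" / "gpt4all" / "model3.json"
entries = json.loads(cache.read_text(encoding="utf-8"))

for entry in entries:
    # "filename" and "chatTemplate" are assumed key names.
    if entry.get("filename", "").endswith(".gguf"):
        template = entry.get("chatTemplate")
        print(entry["filename"])
        print(template if template else "(no cached template)")
```

If the cached value differs from what the dump of the .gguf metadata shows, that is a good sign the GUI edits landed in the cache rather than in the model file itself.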