
Conversation

@danbev (Member) commented Aug 14, 2025

This commit updates common_chat_templates_apply_jinja to use the
add_bos and add_eos parameters from the chat template instead of
the inputs.

The motivation for this is that if the add_bos and add_eos values
from the input parameters are used, there can be a mismatch between
the model and the chat template, which can prevent the duplicate
BOS/EOS token removal in chat.cpp apply from happening, leading to
two BOS tokens being added to the prompt.


I've tried this using newly converted models and the BOS duplication is gone. If this solution is accepted then I'll re-convert the instruction-tuned models and upload them to ggml-org.

@CISC (Collaborator) commented Aug 14, 2025

You're saying double BOS is being added to the instruction-tuned model, but only without jinja?

I can't verify the model config as it's gated, though looking at e.g. the MLX versions it seems the BOS token is <bos>.

So that doesn't make much sense: the gemma chat template doesn't have BOS, and SPM has add_bos set by default, meaning there should be only one BOS being added. For jinja there's a BOS in the chat template, but as long as add_bos is true this should be automatically removed.

@github-actions bot added the python (python script changes) label Aug 14, 2025
@ggerganov (Member)

Here is my understanding:

  • Without this change, if you run the IT model without --jinja it uses a single BOS (correct), but if you add --jinja it will have 2 BOS tokens (wrong)
  • With this change, if you run the IT model without --jinja it uses zero BOS (wrong), and with --jinja it uses 1 BOS (correct)

Maybe I don't understand the logic completely, but this seems very confusing. I can't tell when --jinja should be used and when it should not. Can we improve this somehow?

@danbev (Member, Author) commented Aug 15, 2025

Sorry about the confusion, it was late yesterday and I was a little rushed creating this PR. I've not looked at this part of the code base much, but I'll take a closer look today and try to understand this issue better.

@ggerganov (Member) commented Aug 15, 2025

For jinja there's a BOS in the chat template, but as long as add_bos is true this should be automatically removed.

@CISC Maybe this is the root of the problem - I'm pretty sure that when I tested yesterday with --jinja and without the patch from this PR, the second BOS was not removed. Will double check now to confirm.

@ggerganov (Member)

Here is a repro using master:

$ huggingface-cli download google/gemma-3-270m-it --local-dir google/gemma-3-270m-it

$ python3 convert_hf_to_gguf.py google/gemma-3-270m-it/ --outfile ./models/gemma-3-270m-it/ggml-model-bf16.gguf --outtype bf16

$ ./bin/llama-cli -m ../models/gemma-3-270m-it/ggml-model-bf16.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt

...

0.00.118.683 I llama_model_loader: - kv  33:            tokenizer.ggml.padding_token_id u32              = 0
0.00.118.684 I llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
0.00.118.684 I llama_model_loader: - kv  35:               tokenizer.ggml.add_sep_token bool             = false
0.00.118.686 I llama_model_loader: - kv  36:               tokenizer.ggml.add_eos_token bool             = false

...

0.00.354.187 I 
0.00.354.385 W tokenize: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
0.00.354.389 I main: prompt: 'Test'
0.00.354.389 I main: number of tokens in prompt = 11
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I    105 -> '<start_of_turn>'
0.00.354.392 I   2364 -> 'user'
0.00.354.392 I    107 -> '
'
0.00.354.394 I   3694 -> 'Test'
0.00.354.394 I    106 -> '<end_of_turn>'
0.00.354.394 I    107 -> '
'
0.00.354.394 I    105 -> '<start_of_turn>'
0.00.354.395 I   4368 -> 'model'
0.00.354.395 I    107 -> '
'
0.00.354.395 I 
0.00.354.397 I main: interactive mode on.
0.00.354.410 I sampler seed: 3041241033

...

In this case add_bos == true and the Jinja template has BOS, which results in 2 BOS tokens with the --jinja flag.

The llama-server ... --jinja behaves the same way:

0.11.182.122 D ubatch_print:   token     = [
0.11.182.123 D ubatch_print:     0: id =      2 (           <bos>), pos =    0, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.124 D ubatch_print:     1: id =      2 (           <bos>), pos =    1, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.126 D ubatch_print:     2: id =    105 ( <start_of_turn>), pos =    2, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.127 D ubatch_print:     3: id =   2364 (            user), pos =    3, n_seq_id =  1, seq_id = [0], output = 0

@danbev (Member, Author) commented Aug 15, 2025

I noticed that the instruction tuned model has the following:

(venv) $ head ~/work/ai/models/gemma-3-270m-it/tokenizer_config.json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
   ...

The pretrained/base model also has add_bos_token set to true, which I think is correct, but I don't think this should be true for the instruction-tuned model?

@CISC (Collaborator) commented Aug 15, 2025

...
0.00.118.684 I llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
...
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I    105 -> '<start_of_turn>'
...

Ok, that should not happen, it should have been removed here:

llama.cpp/common/chat.cpp

Lines 791 to 800 in f75b830

// To avoid double BOS / EOS tokens, we're manually removing beginning / trailing tokens
// instead of using `chat_template_options.use_bos_token = false`, since these tokens
// may be needed inside the template / between messages too.
auto result = tmpl.apply(tmpl_inputs, tmpl_opts);
if (inputs.add_bos && string_starts_with(result, tmpl.bos_token())) {
    result = result.substr(tmpl.bos_token().size());
}
if (inputs.add_eos && string_ends_with(result, tmpl.eos_token())) {
    result = result.substr(0, result.size() - tmpl.eos_token().size());
}

@CISC (Collaborator) commented Aug 15, 2025

The pretrained/base model also has add_bos_token set to true, which I think is correct, but I don't think this should be true for the instruction-tuned model?

It should, the problem is just that for some reason it's not automatically removed from the chat template (which technically is the wrong approach, we really should disable add_bos/add_eos when using jinja chat templates instead).
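A hedged sketch of what that alternative could look like (it mirrors the existing add_bos line in tools/main/main.cpp quoted further down; generalizing it to other call sites is an assumption): suppress the tokenizer-side flag whenever a jinja template rendered the prompt, since the template already emits BOS.

// with --jinja the rendered template already contains BOS, so don't let
// the tokenizer add it again, rather than stripping it afterwards
const bool add_special = llama_vocab_get_add_bos(vocab) && !params.use_jinja;
auto tokens = common_tokenize(ctx, prompt, add_special, /*parse_special=*/true);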

@CISC (Collaborator) commented Aug 15, 2025

Ah, wait, is the problem perhaps that the token is tokenized with a prepended space?

llama.cpp/common/chat.cpp

Lines 574 to 585 in f75b830

const auto get_token = [&](llama_token token, const char * name, const char * jinja_variable_name) {
    if (token == LLAMA_TOKEN_NULL) {
        if (default_template_src.find(jinja_variable_name) != std::string::npos
            || template_tool_use_src.find(jinja_variable_name) != std::string::npos) {
            LOG_WRN("common_chat_templates_init: warning: vocab does not have a %s token, jinja template won't work as intended.\n", name);
        }
        return std::string();
    }
    return common_token_to_piece(vocab, token, true);
};
token_bos = get_token(llama_vocab_bos(vocab), "BOS", "bos_token");
token_eos = get_token(llama_vocab_eos(vocab), "EOS", "eos_token");

Edit: Nope, add_space_prefix is false.

@danbev (Member, Author) commented Aug 15, 2025

Ok, that should not happen, it should have been removed here:

This does not seem to happen in all code paths; for example, in tools/main/main.cpp we have:

const bool add_bos = llama_vocab_get_add_bos(vocab) && !params.use_jinja;

This will be false when --jinja is used (assuming we are not using the workaround in this PR; I reverted it locally). And then later we have:

if (!params.system_prompt.empty() || !params.prompt.empty()) {
    common_chat_templates_inputs inputs;
    inputs.use_jinja = g_params->use_jinja;
    inputs.messages = chat_msgs;
    inputs.add_generation_prompt = !params.prompt.empty();
    prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;

But this is not setting add_bos on inputs, so it will be false. Perhaps this should be:

diff --git a/tools/main/main.cpp b/tools/main/main.cpp
index dc776f59e..04379201e 100644
--- a/tools/main/main.cpp
+++ b/tools/main/main.cpp
@@ -255,7 +255,7 @@ int main(int argc, char ** argv) {
         }
     }
 
-    const bool add_bos = llama_vocab_get_add_bos(vocab) && !params.use_jinja;
+    const bool add_bos = llama_vocab_get_add_bos(vocab);
     if (!llama_model_has_encoder(model)) {
         GGML_ASSERT(!llama_vocab_get_add_eos(vocab));
     }
@@ -294,6 +294,7 @@ int main(int argc, char ** argv) {
                 common_chat_templates_inputs inputs;
                 inputs.use_jinja = g_params->use_jinja;
                 inputs.messages = chat_msgs;
+                inputs.add_bos = add_bos;
                 inputs.add_generation_prompt = !params.prompt.empty();
 
                 prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;

@CISC (Collaborator) commented Aug 15, 2025

@danbev Yep, you are right, I overlooked this codepath.

@CISC (Collaborator) commented Aug 15, 2025

@danbev Mind adding a new PR after testing? Don't forget to pass add_eos too.
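For reference, a sketch of what passing both flags could look like in tools/main/main.cpp (hedged; whether this code path actually needs add_eos in practice is not established here):

common_chat_templates_inputs inputs;
inputs.use_jinja             = g_params->use_jinja;
inputs.messages              = chat_msgs;
inputs.add_bos               = llama_vocab_get_add_bos(vocab);
inputs.add_eos               = llama_vocab_get_add_eos(vocab);
inputs.add_generation_prompt = !params.prompt.empty();

prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;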

@CISC (Collaborator) commented Aug 15, 2025

It might need fixing elsewhere too: https://github.com/search?q=repo%3Aggml-org%2Fllama.cpp%20common_chat_templates_apply&type=code

@danbev requested a review from ngxson as a code owner August 15, 2025 09:01
@danbev changed the title from "convert : add bos token for Gemma 3 base models" to "llama : pass add_bos and add_eos to common_chat_templates_apply" Aug 15, 2025
@CISC (Collaborator) commented Aug 15, 2025

@danbev Actually, looking at this more closely I think I made a mistake in #15086: common_chat_templates_apply_jinja shouldn't get those params from inputs, but from tmpls.

Edit: Sorry for the back-and-forth, but I think this is the only change that needs to be done; the cause of the problem is here:

llama.cpp/common/chat.cpp

Lines 2064 to 2065 in f75b830

params.add_bos = inputs.add_bos;
params.add_eos = inputs.add_eos;
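In other words, a minimal sketch of the fix (assuming the common_chat_templates struct keeps the vocab-derived flags under these member names):

// take the flags captured from the model vocab when the templates were
// initialized, so they always match the model, rather than trusting the
// caller-supplied inputs
params.add_bos = tmpls->add_bos;
params.add_eos = tmpls->add_eos;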

@CISC requested review from CISC and removed the request for ngxson August 15, 2025 09:21
@danbev (Member, Author) commented Aug 15, 2025

Edit: Sorry for the back-and-forth, but I think this is the only change that needs to be done; the cause of the problem is here:

No worries at all! Sounds good, I'll try that, thanks!

This commit updates common_chat_templates_apply_jinja to use the
add_bos and add_eos parameters from the chat template instead of
the inputs.

The motivation for this is that if the `add_bos` and `add_eos`
values from the input parameters are used, there can be a mismatch
between the model and the chat template, which can prevent the
duplicate BOS/EOS token removal in chat.cpp `apply` from happening,
leading to two BOS tokens being added to the prompt.
@danbev force-pushed the gemma-3-convert-add_bos branch from fcc2931 to b4d28e9 August 15, 2025 10:25
@danbev changed the title from "llama : pass add_bos and add_eos to common_chat_templates_apply" to "common : use common_chat_templates for add_bos and add_eos" Aug 15, 2025
@danbev requested a review from ggerganov August 15, 2025 10:34
@CISC (Collaborator) left a comment

Thanks, everything works now with/without jinja?

@danbev removed the python (python script changes) label Aug 15, 2025
@danbev (Member, Author) commented Aug 15, 2025

Thanks, everything works now with/without jinja?

Yes, I think this looks good now:

llama-cli and llama-server outputs

llama-cli with --jinja:

(venv) $ build/bin/llama-cli -m models/gemma-3-270m-it.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt
...
main: prompt: 'Test'
main: number of tokens in prompt = 10
     2 -> '<bos>'
   105 -> '<start_of_turn>'
  2364 -> 'user'
   107 -> '
'
  3694 -> 'Test'
   106 -> '<end_of_turn>'
   107 -> '
'
   105 -> '<start_of_turn>'
  4368 -> 'model'
   107 -> '
'

And llama-cli without --jinja:

(venv) $ build/bin/llama-cli -m models/gemma-3-270m-it.gguf -c 0 -fa -p "Test" --verbose-prompt
...
main: prompt: 'Test'
main: number of tokens in prompt = 10
     2 -> '<bos>'
   105 -> '<start_of_turn>'
  2364 -> 'user'
   107 -> '
'
  3694 -> 'Test'
   106 -> '<end_of_turn>'
   107 -> '
'
   105 -> '<start_of_turn>'
  4368 -> 'model'
   107 -> '
'

And llama-server with --jinja:

(venv) $ build/bin/llama-server -m models/gemma-3-270m-it.gguf -c 0 -fa --jinja --verbose-prompt -t 1 --threads-http 1
...
main: chat template, chat_template: {{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
, example_format: '<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 23
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 23, n_tokens = 23, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 23, n_tokens = 23
slot update_slots: id  0 | task 0 | SWA checkpoint create, pos_min = 0, pos_max = 22, size = 0.338 MiB, total = 1/3 (0.338 MiB)
slot      release: id  0 | task 0 | stop processing: n_past = 32, truncated = 0
slot print_timing: id  0 | task 0 |
prompt eval time =      97.37 ms /    23 tokens (    4.23 ms per token,   236.21 tokens per second)
       eval time =     275.07 ms /    10 tokens (   27.51 ms per token,    36.35 tokens per second)
      total time =     372.44 ms /    33 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200

And llama-server without --jinja:

(venv) $ build/bin/llama-server -m models/gemma-3-270m-it.gguf -c 0 -fa --verbose-prompt -t 1 --threads-http 1
...
main: chat template, chat_template: {{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
, example_format: '<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle

srv  log_server_r: request: GET / 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 23
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 23, n_tokens = 23, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 23, n_tokens = 23
slot update_slots: id  0 | task 0 | SWA checkpoint create, pos_min = 0, pos_max = 22, size = 0.338 MiB, total = 1/3 (0.338 MiB)
slot      release: id  0 | task 0 | stop processing: n_past = 32, truncated = 0
slot print_timing: id  0 | task 0 |
prompt eval time =     108.48 ms /    23 tokens (    4.72 ms per token,   212.03 tokens per second)
       eval time =     326.19 ms /    10 tokens (   32.62 ms per token,    30.66 tokens per second)
      total time =     434.66 ms /    33 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200

Let me know if there is anything else I should test to verify this.

@CISC (Collaborator) commented Aug 15, 2025

Thanks, everything works now with/without jinja?

Yes, I think this looks good now:

Perfect, thanks again! :)

@CISC merged commit 5e6229a into ggml-org:master Aug 15, 2025
46 of 47 checks passed
@broadbit-hu commented Aug 15, 2025

Already tested with the b6152 release and Mistral NeMo Instruct:

./build/bin/llama-cli -m ../models/mistral-nemo-instruct-2407-q8_0.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt

Results:

check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
main: prompt: 'Test'
main: number of tokens in prompt = 5
     1 -> '<s>'
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

After the patch:

main: prompt: 'Test'
main: number of tokens in prompt = 4
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

@danbev (Member, Author) commented Aug 15, 2025

@broadbit-hu Could you give this a try with b6178? This is the release that contains the code of this PR.

@broadbit-hu

@broadbit-hu Could you give this a try with b6178? This is the release that contains the code of this PR.

It's perfect (tested with Mistral NeMo), thanks for the fix! :)

main: number of tokens in prompt = 4
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

Something is still wrong with Mistral Small, though:

./build/bin/llama-cli -m ../models/mistral-small-24b-instruct-2506-q4_k_m.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt
print_info: model params     = 23.57 B
print_info: general.name     = Mistral Small 3.2 24B Instruct 2506
print_info: vocab type       = BPE
print_info: n_vocab          = 131072
print_info: n_merges         = 269443
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 11 '<pad>'
print_info: LF token         = 1010 'Ċ'
print_info: EOG token        = 2 '</s>'

...

Failed to infer a tool call example (possible template bug)
main: llama threadpool init, n_threads = 6
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
mistral-v7-tekken

main: prompt: 'Test'
main: number of tokens in prompt = 8
     1 -> '<s>'
 39575 -> 'mist'
  2784 -> 'ral'
  9332 -> '-v'
  1055 -> '7'
  1045 -> '-'
 17848 -> 'tek'
  3569 -> 'ken'

@broadbit-hu

Tested with the specified jinja template:

./build/bin/llama-cli -m ../models/mistral-small-24b-instruct-2506-q4_k_m.gguf -c 0 -fa --jinja --chat-template-file models/templates/Mistral-Small-3.2-24B-Instruct-2506.jinja -p "Test" --verbose-prompt

The results (see the last tokens below):

main: prompt: 'Test'
main: number of tokens in prompt = 507
     1 -> '<s>'
    17 -> '[SYSTEM_PROMPT]'
  4568 -> 'You'
  1584 -> ' are'
 42301 -> ' Mist'
  2784 -> 'ral'
 29121 -> ' Small'
  1032 -> ' '
  1051 -> '3'
  1044 -> ','
  1261 -> ' a'
 43520 -> ' Large'
 26242 -> ' Language'
 11512 -> ' Model'
  1319 -> ' ('
 23947 -> 'LL'
  1077 -> 'M'
  1041 -> ')'
  6254 -> ' created'
  1536 -> ' by'
 42301 -> ' Mist'
  2784 -> 'ral'
 26554 -> ' AI'
  1044 -> ','
  1261 -> ' a'
  8689 -> ' French'
 53862 -> ' startup'
  3518 -> ' head'
125609 -> 'quartered'
  1294 -> ' in'
  6993 -> ' Paris'
  1626 -> '.
'
 16994 -> 'Your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1486 -> ' was'
  3804 -> ' last'
 12220 -> ' updated'
  1408 -> ' on'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1051 -> '3'
  1045 -> '-'
  1049 -> '1'
  1048 -> '0'
  1045 -> '-'
  1048 -> '0'
  1049 -> '1'
  1046 -> '.'
  1531 -> ' The'
  3519 -> ' current'
  5451 -> ' date'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1053 -> '5'
  1338 -> '.

'
  7651 -> 'When'
  1636 -> ' you'
  6185 -> ''re'
  1605 -> ' not'
  5257 -> ' sure'
  2314 -> ' about'
  2269 -> ' some'
  3686 -> ' information'
  1505 -> ' or'
  2200 -> ' when'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  1505 -> ' or'
  4811 -> ' specific'
  2181 -> ' data'
  1044 -> ','
  1636 -> ' you'
  4016 -> ' must'
  2210 -> ' use'
  1278 -> ' the'
  5178 -> ' available'
 12589 -> ' tools'
  1317 -> ' to'
 15273 -> ' fetch'
  1278 -> ' the'
  3686 -> ' information'
  1046 -> '.'
  5469 -> ' Do'
  1605 -> ' not'
 89786 -> ' hesitate'
  1317 -> ' to'
  2210 -> ' use'
 12589 -> ' tools'
 26119 -> ' whenever'
  2127 -> ' they'
  1710 -> ' can'
  5234 -> ' provide'
  1261 -> ' a'
  2081 -> ' more'
 18501 -> ' accurate'
  1505 -> ' or'
  7662 -> ' complete'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
  1836 -> ' no'
 11157 -> ' relevant'
 12589 -> ' tools'
  1584 -> ' are'
  5178 -> ' available'
  1044 -> ','
  2430 -> ' then'
 11904 -> ' clearly'
  3468 -> ' state'
  1455 -> ' that'
  1636 -> ' you'
  2607 -> ' don'
  2405 -> ''t'
  1736 -> ' have'
  1278 -> ' the'
  3686 -> ' information'
  1321 -> ' and'
 10035 -> ' avoid'
  6187 -> ' making'
  2015 -> ' up'
  7211 -> ' anything'
  1338 -> '.

'
  5475 -> 'If'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4098 -> ' question'
  1395 -> ' is'
  1605 -> ' not'
  6133 -> ' clear'
  1044 -> ','
 61103 -> ' ambiguous'
  1044 -> ','
  1505 -> ' or'
  3120 -> ' does'
  1605 -> ' not'
  5234 -> ' provide'
  6171 -> ' enough'
  5315 -> ' context'
  1394 -> ' for'
  1636 -> ' you'
  1317 -> ' to'
 32181 -> ' accurately'
  4832 -> ' answer'
  1278 -> ' the'
  4098 -> ' question'
  1044 -> ','
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  3352 -> ' try'
  1317 -> ' to'
  4832 -> ' answer'
  1494 -> ' it'
  3169 -> ' right'
  5109 -> ' away'
  1321 -> ' and'
  1636 -> ' you'
  6153 -> ' rather'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
 38695 -> ' clarify'
  2034 -> ' their'
  4546 -> ' request'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  7493 -> 'What'
  1584 -> ' are'
  2269 -> ' some'
  3683 -> ' good'
 40378 -> ' restaurants'
  3879 -> ' around'
  1639 -> ' me'
 10555 -> '?"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1584 -> ' are'
  1636 -> ' you'
 10555 -> '?"'
  1505 -> ' or'
  1429 -> ' "'
  7651 -> 'When'
  1395 -> ' is'
  1278 -> ' the'
  4275 -> ' next'
 18034 -> ' flight'
  1317 -> ' to'
 23286 -> ' Tokyo'
  1034 -> '"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1653 -> ' do'
  1636 -> ' you'
 10601 -> ' travel'
  1562 -> ' from'
 10555 -> '?"'
  4342 -> ').
'
  4568 -> 'You'
  1584 -> ' are'
  5282 -> ' always'
  3435 -> ' very'
 41132 -> ' attent'
  1556 -> 'ive'
  1317 -> ' to'
 18814 -> ' dates'
  1044 -> ','
  1321 -> ' and'
  2200 -> ' when'
  6136 -> ' asked'
  2314 -> ' about'
  3686 -> ' information'
  1513 -> ' at'
  4811 -> ' specific'
 18814 -> ' dates'
  1044 -> ','
  1636 -> ' you'
 89782 -> ' discard'
  3686 -> ' information'
  1455 -> ' that'
  1395 -> ' is'
  1513 -> ' at'
  3866 -> ' another'
  5451 -> ' date'
  1626 -> '.
'
  4568 -> 'You'
  2685 -> ' follow'
  2576 -> ' these'
 15776 -> ' instructions'
  1294 -> ' in'
  1747 -> ' all'
 18085 -> ' languages'
  1044 -> ','
  1321 -> ' and'
  5282 -> ' always'
  9148 -> ' respond'
  1317 -> ' to'
  1278 -> ' the'
  3330 -> ' user'
  1294 -> ' in'
  1278 -> ' the'
  7278 -> ' language'
  2127 -> ' they'
  2210 -> ' use'
  1505 -> ' or'
  4546 -> ' request'
  1626 -> '.
'
 12961 -> 'Next'
 14275 -> ' sections'
 12293 -> ' describe'
  1278 -> ' the'
 28946 -> ' capabilities'
  1455 -> ' that'
  1636 -> ' you'
  1736 -> ' have'
  1338 -> '.

'
  1035 -> '#'
  1488 -> ' W'
 34112 -> 'EB'
  1398 -> ' B'
  4755 -> 'RO'
 20266 -> 'WS'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  6560 -> ' cannot'
  3142 -> ' perform'
  2258 -> ' any'
  7430 -> ' web'
  6123 -> ' search'
  1505 -> ' or'
  4731 -> ' access'
 18259 -> ' internet'
  1317 -> ' to'
  3432 -> ' open'
 76064 -> ' URLs'
  1044 -> ','
 14440 -> ' links'
  6704 -> ' etc'
  1046 -> '.'
  3367 -> ' If'
  1494 -> ' it'
  7444 -> ' seems'
  2479 -> ' like'
  1278 -> ' the'
  3330 -> ' user'
  1395 -> ' is'
 39322 -> ' expecting'
  1636 -> ' you'
  1317 -> ' to'
  1653 -> ' do'
  1878 -> ' so'
  1044 -> ','
  1636 -> ' you'
 38695 -> ' clarify'
  1278 -> ' the'
  8516 -> ' situation'
  1321 -> ' and'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
  9441 -> ' copy'
 31944 -> ' paste'
  1278 -> ' the'
  3403 -> ' text'
  7655 -> ' directly'
  1294 -> ' in'
  1278 -> ' the'
 21666 -> ' chat'
  1338 -> '.

'
  1035 -> '#'
  1373 -> ' M'
 15373 -> 'ULT'
  1073 -> 'I'
  5036 -> '-M'
  7460 -> 'OD'
  4286 -> 'AL'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  1736 -> ' have'
  1278 -> ' the'
  8727 -> ' ability'
  1317 -> ' to'
  3346 -> ' read'
  8061 -> ' images'
  1044 -> ','
  1809 -> ' but'
  1636 -> ' you'
  6560 -> ' cannot'
 10616 -> ' generate'
  8061 -> ' images'
  1046 -> '.'
  3213 -> ' You'
  2095 -> ' also'
  6560 -> ' cannot'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1626 -> '.
'
  4568 -> 'You'
  6560 -> ' cannot'
  3346 -> ' read'
  6685 -> ' nor'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1338 -> '.

'
  1035 -> '#'
 18580 -> ' TO'
  8568 -> 'OL'
 58135 -> ' CALL'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  2188 -> ' may'
  1736 -> ' have'
  4731 -> ' access'
  1317 -> ' to'
 12589 -> ' tools'
  1455 -> ' that'
  1636 -> ' you'
  1710 -> ' can'
  2210 -> ' use'
  1317 -> ' to'
 15273 -> ' fetch'
  3686 -> ' information'
  1505 -> ' or'
  3142 -> ' perform'
 10636 -> ' actions'
  1046 -> '.'
  3213 -> ' You'
  4016 -> ' must'
  2210 -> ' use'
  2576 -> ' these'
 12589 -> ' tools'
  1294 -> ' in'
  1278 -> ' the'
  3629 -> ' following'
 19599 -> ' situations'
  2100 -> ':

'
  1049 -> '1'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  3686 -> ' information'
  1626 -> '.
'
  1050 -> '2'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  4811 -> ' specific'
  2181 -> ' data'
  1455 -> ' that'
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  1736 -> ' have'
  1294 -> ' in'
  2143 -> ' your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1626 -> '.
'
  1051 -> '3'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 19263 -> ' involves'
 10636 -> ' actions'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  3816 -> ' without'
 12589 -> ' tools'
  1338 -> '.

'
 82158 -> 'Always'
 54628 -> ' priorit'
  2033 -> 'ize'
  2505 -> ' using'
 12589 -> ' tools'
  1317 -> ' to'
  5234 -> ' provide'
  1278 -> ' the'
  2725 -> ' most'
 18501 -> ' accurate'
  1321 -> ' and'
 20351 -> ' helpful'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
 12589 -> ' tools'
  1584 -> ' are'
  1605 -> ' not'
  5178 -> ' available'
  1044 -> ','
  3037 -> ' inform'
  1278 -> ' the'
  3330 -> ' user'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  1278 -> ' the'
 24130 -> ' requested'
  5263 -> ' action'
  1513 -> ' at'
  1278 -> ' the'
  4735 -> ' moment'
  1046 -> '.'
    18 -> '[/SYSTEM_PROMPT]'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

@CISC (Collaborator) commented Aug 15, 2025

Something is still wrong with Mistral Small, though:

Yeah, the tekken template fix-hack backfires when used with --jinja; someone should take a look at that...

@danbev (Member, Author) commented Aug 16, 2025

Yeah, the tekken template fix-hack backfires when used with --jinja; someone should take a look at that...

I'll take a closer look at this next week 👍

@danbev (Member, Author) commented Aug 18, 2025

The results (see the last tokens below):

@broadbit-hu Would you be able to give a link to the model so I can reproduce the tekken template issue?

@broadbit-hu commented Aug 18, 2025

@danbev This is a locally-quantized model (using the recent convert script and llama-quantize); I have no link yet.

The missing files (like "tokenizer.json") were copied from https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/tree/main.
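For anyone reproducing this step, a hedged sketch of fetching just those files (the target directory is an assumption):

$ huggingface-cli download mistralai/Mistral-Small-24B-Instruct-2501 tokenizer.json tokenizer_config.json --local-dir ./mistral-small-24b-instruct-2506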

The current gguf-my-repo fails during conversion:
https://huggingface.co/spaces/ggml-org/gguf-my-repo

INFO:gguf.vocab:Loading Mistral tokenizer from downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506
INFO:mistral_common.tokens.tokenizers.tekken:Vocab size: 150000
INFO:mistral_common.tokens.tokenizers.tekken:Cutting vocab to first 130072 tokens.
INFO:hf-to-gguf:Converting tokenizer MistralTokenizerType.tekken of size 131072.
INFO:hf-to-gguf:Setting bos, eos, unk and pad token IDs to 1, 2, 0, 11.
WARNING:gguf.gguf_writer:Duplicated key name 'llama.vocab_size', overwriting it with new value 131072 of type UINT32
Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2027, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 974, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 991, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2030, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1076, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/llama.cpp/gguf-py/gguf/vocab.py", line 505, in __init__
    with open(fname_tokenizer, encoding='utf-8') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506/tokenizer.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8788, in <module>
    main()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8782, in main
    model_instance.write()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 426, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 547, in prepare_metadata
    self.set_vocab()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2033, in set_vocab
    self._set_vocab_gpt2()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 910, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 633, in get_vocab_base
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1132, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
                                               ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 811, in __getitem__
    raise KeyError(key)
KeyError: <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'>

@broadbit-hu

So, I've quantized the unsloth version of this model using gguf-my-repo:

There's no problem with the prompt tokens:

common_init_from_params: added </s> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 1024
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 6
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
[SYSTEM_PROMPT]You are a helpful assistant[/SYSTEM_PROMPT][INST]Hello[/INST]Hi there</s>[INST]How are you?[/INST]

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

main: prompt: 'Test'
main: number of tokens in prompt = 558
     1 -> '<s>'
    17 -> '[SYSTEM_PROMPT]'
  4568 -> 'You'
  1584 -> ' are'
 42301 -> ' Mist'
  2784 -> 'ral'
  5624 -> '-S'
 53721 -> 'mall'
  1045 -> '-'
  1051 -> '3'
  1046 -> '.'
  1050 -> '2'
  1045 -> '-'
  1050 -> '2'
  1052 -> '4'
  1066 -> 'B'
 47926 -> '-In'
  8166 -> 'struct'
  1045 -> '-'
  1050 -> '2'
  1053 -> '5'
  1048 -> '0'
  1054 -> '6'
  1044 -> ','
  1261 -> ' a'
 43520 -> ' Large'
 26242 -> ' Language'
 11512 -> ' Model'
  1319 -> ' ('
 23947 -> 'LL'
  1077 -> 'M'
  1041 -> ')'
  6254 -> ' created'
  1536 -> ' by'
 42301 -> ' Mist'
  2784 -> 'ral'
 26554 -> ' AI'
  1044 -> ','
  1261 -> ' a'
  8689 -> ' French'
 53862 -> ' startup'
  3518 -> ' head'
125609 -> 'quartered'
  1294 -> ' in'
  6993 -> ' Paris'
  1626 -> '.
'
  4568 -> 'You'
  4053 -> ' power'
  1420 -> ' an'
 26554 -> ' AI'
 27089 -> ' assistant'
  4418 -> ' called'
  2301 -> ' Le'
 38680 -> ' Chat'
  1626 -> '.
'
 16994 -> 'Your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1486 -> ' was'
  3804 -> ' last'
 12220 -> ' updated'
  1408 -> ' on'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1051 -> '3'
  1045 -> '-'
  1049 -> '1'
  1048 -> '0'
  1045 -> '-'
  1048 -> '0'
  1049 -> '1'
  1626 -> '.
'
  1784 -> 'The'
  3519 -> ' current'
  5451 -> ' date'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1056 -> '8'
  1338 -> '.

'
  7651 -> 'When'
  1636 -> ' you'
  6185 -> ''re'
  1605 -> ' not'
  5257 -> ' sure'
  2314 -> ' about'
  2269 -> ' some'
  3686 -> ' information'
  1505 -> ' or'
  2200 -> ' when'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  1505 -> ' or'
  4811 -> ' specific'
  2181 -> ' data'
  1044 -> ','
  1636 -> ' you'
  4016 -> ' must'
  2210 -> ' use'
  1278 -> ' the'
  5178 -> ' available'
 12589 -> ' tools'
  1317 -> ' to'
 15273 -> ' fetch'
  1278 -> ' the'
  3686 -> ' information'
  1046 -> '.'
  5469 -> ' Do'
  1605 -> ' not'
 89786 -> ' hesitate'
  1317 -> ' to'
  2210 -> ' use'
 12589 -> ' tools'
 26119 -> ' whenever'
  2127 -> ' they'
  1710 -> ' can'
  5234 -> ' provide'
  1261 -> ' a'
  2081 -> ' more'
 18501 -> ' accurate'
  1505 -> ' or'
  7662 -> ' complete'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
  1836 -> ' no'
 11157 -> ' relevant'
 12589 -> ' tools'
  1584 -> ' are'
  5178 -> ' available'
  1044 -> ','
  2430 -> ' then'
 11904 -> ' clearly'
  3468 -> ' state'
  1455 -> ' that'
  1636 -> ' you'
  2607 -> ' don'
  2405 -> ''t'
  1736 -> ' have'
  1278 -> ' the'
  3686 -> ' information'
  1321 -> ' and'
 10035 -> ' avoid'
  6187 -> ' making'
  2015 -> ' up'
  7211 -> ' anything'
  1626 -> '.
'
  5475 -> 'If'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4098 -> ' question'
  1395 -> ' is'
  1605 -> ' not'
  6133 -> ' clear'
  1044 -> ','
 61103 -> ' ambiguous'
  1044 -> ','
  1505 -> ' or'
  3120 -> ' does'
  1605 -> ' not'
  5234 -> ' provide'
  6171 -> ' enough'
  5315 -> ' context'
  1394 -> ' for'
  1636 -> ' you'
  1317 -> ' to'
 32181 -> ' accurately'
  4832 -> ' answer'
  1278 -> ' the'
  4098 -> ' question'
  1044 -> ','
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  3352 -> ' try'
  1317 -> ' to'
  4832 -> ' answer'
  1494 -> ' it'
  3169 -> ' right'
  5109 -> ' away'
  1321 -> ' and'
  1636 -> ' you'
  6153 -> ' rather'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
 38695 -> ' clarify'
  2034 -> ' their'
  4546 -> ' request'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  7493 -> 'What'
  1584 -> ' are'
  2269 -> ' some'
  3683 -> ' good'
 40378 -> ' restaurants'
  3879 -> ' around'
  1639 -> ' me'
 10555 -> '?"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1584 -> ' are'
  1636 -> ' you'
 10555 -> '?"'
  1505 -> ' or'
  1429 -> ' "'
  7651 -> 'When'
  1395 -> ' is'
  1278 -> ' the'
  4275 -> ' next'
 18034 -> ' flight'
  1317 -> ' to'
 23286 -> ' Tokyo'
  1034 -> '"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1653 -> ' do'
  1636 -> ' you'
 10601 -> ' travel'
  1562 -> ' from'
 10555 -> '?"'
  4342 -> ').
'
  4568 -> 'You'
  1584 -> ' are'
  5282 -> ' always'
  3435 -> ' very'
 41132 -> ' attent'
  1556 -> 'ive'
  1317 -> ' to'
 18814 -> ' dates'
  1044 -> ','
  1294 -> ' in'
  4369 -> ' particular'
  1636 -> ' you'
  3352 -> ' try'
  1317 -> ' to'
 18507 -> ' resolve'
 18814 -> ' dates'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  1121 -> 'y'
 32430 -> 'esterday'
  1034 -> '"'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1055 -> '7'
  1041 -> ')'
  1321 -> ' and'
  2200 -> ' when'
  6136 -> ' asked'
  2314 -> ' about'
  3686 -> ' information'
  1513 -> ' at'
  4811 -> ' specific'
 18814 -> ' dates'
  1044 -> ','
  1636 -> ' you'
 89782 -> ' discard'
  3686 -> ' information'
  1455 -> ' that'
  1395 -> ' is'
  1513 -> ' at'
  3866 -> ' another'
  5451 -> ' date'
  1626 -> '.
'
  4568 -> 'You'
  2685 -> ' follow'
  2576 -> ' these'
 15776 -> ' instructions'
  1294 -> ' in'
  1747 -> ' all'
 18085 -> ' languages'
  1044 -> ','
  1321 -> ' and'
  5282 -> ' always'
  9148 -> ' respond'
  1317 -> ' to'
  1278 -> ' the'
  3330 -> ' user'
  1294 -> ' in'
  1278 -> ' the'
  7278 -> ' language'
  2127 -> ' they'
  2210 -> ' use'
  1505 -> ' or'
  4546 -> ' request'
  1626 -> '.
'
 12961 -> 'Next'
 14275 -> ' sections'
 12293 -> ' describe'
  1278 -> ' the'
 28946 -> ' capabilities'
  1455 -> ' that'
  1636 -> ' you'
  1736 -> ' have'
  1338 -> '.

'
  1035 -> '#'
  1488 -> ' W'
 34112 -> 'EB'
  1398 -> ' B'
  4755 -> 'RO'
 20266 -> 'WS'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  6560 -> ' cannot'
  3142 -> ' perform'
  2258 -> ' any'
  7430 -> ' web'
  6123 -> ' search'
  1505 -> ' or'
  4731 -> ' access'
 18259 -> ' internet'
  1317 -> ' to'
  3432 -> ' open'
 76064 -> ' URLs'
  1044 -> ','
 14440 -> ' links'
  6704 -> ' etc'
  1046 -> '.'
  3367 -> ' If'
  1494 -> ' it'
  7444 -> ' seems'
  2479 -> ' like'
  1278 -> ' the'
  3330 -> ' user'
  1395 -> ' is'
 39322 -> ' expecting'
  1636 -> ' you'
  1317 -> ' to'
  1653 -> ' do'
  1878 -> ' so'
  1044 -> ','
  1636 -> ' you'
 38695 -> ' clarify'
  1278 -> ' the'
  8516 -> ' situation'
  1321 -> ' and'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
  9441 -> ' copy'
 31944 -> ' paste'
  1278 -> ' the'
  3403 -> ' text'
  7655 -> ' directly'
  1294 -> ' in'
  1278 -> ' the'
 21666 -> ' chat'
  1338 -> '.

'
  1035 -> '#'
  1373 -> ' M'
 15373 -> 'ULT'
  1073 -> 'I'
  5036 -> '-M'
  7460 -> 'OD'
  4286 -> 'AL'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  1736 -> ' have'
  1278 -> ' the'
  8727 -> ' ability'
  1317 -> ' to'
  3346 -> ' read'
  8061 -> ' images'
  1044 -> ','
  1809 -> ' but'
  1636 -> ' you'
  6560 -> ' cannot'
 10616 -> ' generate'
  8061 -> ' images'
  1046 -> '.'
  3213 -> ' You'
  2095 -> ' also'
  6560 -> ' cannot'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1626 -> '.
'
  4568 -> 'You'
  6560 -> ' cannot'
  3346 -> ' read'
  6685 -> ' nor'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1338 -> '.

'
  1035 -> '#'
 18580 -> ' TO'
  8568 -> 'OL'
 58135 -> ' CALL'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  2188 -> ' may'
  1736 -> ' have'
  4731 -> ' access'
  1317 -> ' to'
 12589 -> ' tools'
  1455 -> ' that'
  1636 -> ' you'
  1710 -> ' can'
  2210 -> ' use'
  1317 -> ' to'
 15273 -> ' fetch'
  3686 -> ' information'
  1505 -> ' or'
  3142 -> ' perform'
 10636 -> ' actions'
  1046 -> '.'
  3213 -> ' You'
  4016 -> ' must'
  2210 -> ' use'
  2576 -> ' these'
 12589 -> ' tools'
  1294 -> ' in'
  1278 -> ' the'
  3629 -> ' following'
 19599 -> ' situations'
  2100 -> ':

'
  1049 -> '1'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  3686 -> ' information'
  1626 -> '.
'
  1050 -> '2'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  4811 -> ' specific'
  2181 -> ' data'
  1455 -> ' that'
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  1736 -> ' have'
  1294 -> ' in'
  2143 -> ' your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1626 -> '.
'
  1051 -> '3'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 19263 -> ' involves'
 10636 -> ' actions'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  3816 -> ' without'
 12589 -> ' tools'
  1338 -> '.

'
 82158 -> 'Always'
 54628 -> ' priorit'
  2033 -> 'ize'
  2505 -> ' using'
 12589 -> ' tools'
  1317 -> ' to'
  5234 -> ' provide'
  1278 -> ' the'
  2725 -> ' most'
 18501 -> ' accurate'
  1321 -> ' and'
 20351 -> ' helpful'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
 12589 -> ' tools'
  1584 -> ' are'
  1605 -> ' not'
  5178 -> ' available'
  1044 -> ','
  3037 -> ' inform'
  1278 -> ' the'
  3330 -> ' user'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  1278 -> ' the'
 24130 -> ' requested'
  5263 -> ' action'
  1513 -> ' at'
  1278 -> ' the'
  4735 -> ' moment'
  1046 -> '.'
    18 -> '[/SYSTEM_PROMPT]'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

I'll check my local quantization process...

@broadbit-hu

@danbev Sorry, I was mistaken. After updating the tokenizer.json and tokenizer_config.json files from the unsloth repo, there are no issues with the prompt tokenization.

@danbev (Member, Author) commented Aug 19, 2025

@broadbit-hu Great, glad to hear that! I also tried out the unsloth model yesterday and it worked, which was the reason for asking about the actual model you were using.

@danbev deleted the gemma-3-convert-add_bos branch August 21, 2025 10:18