
Conversation

@danbev (Member) commented Aug 14, 2025

This commit updates common_chat_templates_apply_jinja to use the
add_bos and add_eos parameters from the chat template instead of
the inputs.

The motivation for this is that if the add_bos and add_eos values
from the input parameters are used, there can be a mismatch between
the model and the chat template, which can prevent the duplicate
BOS/EOS token removal in chat.cpp apply from happening, leading to
two BOS tokens being added to the prompt.


I've tried this using newly converted models and the BOS duplication is gone. If this solution is accepted then I'll re-convert the instruction-tuned models and upload them to ggml-org.

@CISC (Collaborator) commented Aug 14, 2025

You're saying double BOS is being added to the instruction-tuned model, but only without jinja?

I can't verify the model config as it's gated, though looking at e.g. the MLX versions it seems the BOS token is <bos>.

So that doesn't make much sense: the gemma chat template doesn't have BOS, and SPM has add_bos set by default, meaning there should be only one BOS being added. For jinja there's a BOS in the chat template, but as long as add_bos is true this should be automatically removed.

@github-actions bot added the python (python script changes) label Aug 14, 2025
@ggerganov (Member)

Here is my understanding:

  • Without this change, if you run the IT model without --jinja it uses a single BOS (correct), but if you add --jinja it will have 2 BOS tokens (wrong)
  • With this change, if you run the IT model without --jinja it uses zero BOS (wrong), and with --jinja it uses 1 BOS (correct)

Maybe I don't understand the logic completely, but this seems very confusing. I can't tell when --jinja should be used and when it should not. Can we improve this somehow?

@danbev (Member, Author) commented Aug 15, 2025

Sorry about the confusion, it was late yesterday and I was a little rushed creating this PR. I've not looked at this part of the code base much, but I'll take a closer look today and try to understand this issue better.

@ggerganov (Member) commented Aug 15, 2025

For jinja there's a BOS in the chat template, but as long as add_bos is true this should be automatically removed.

@CISC Maybe this is the root of the problem - I'm pretty sure that when I tested yesterday with --jinja and without the patch from this PR, the second BOS was not removed. Will double check now to confirm.

@ggerganov (Member)

Here is a repro using master:

$ huggingface-cli download google/gemma-3-270m-it --local-dir google/gemma-3-270m-it

$ python3 convert_hf_to_gguf.py google/gemma-3-270m-it/ --outfile ./models/gemma-3-270m-it/ggml-model-bf16.gguf --outtype bf16

$ ./bin/llama-cli -m ../models/gemma-3-270m-it/ggml-model-bf16.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt

...

0.00.118.683 I llama_model_loader: - kv  33:            tokenizer.ggml.padding_token_id u32              = 0
0.00.118.684 I llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
0.00.118.684 I llama_model_loader: - kv  35:               tokenizer.ggml.add_sep_token bool             = false
0.00.118.686 I llama_model_loader: - kv  36:               tokenizer.ggml.add_eos_token bool             = false

...

0.00.354.187 I 
0.00.354.385 W tokenize: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
0.00.354.389 I main: prompt: 'Test'
0.00.354.389 I main: number of tokens in prompt = 11
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I    105 -> '<start_of_turn>'
0.00.354.392 I   2364 -> 'user'
0.00.354.392 I    107 -> '
'
0.00.354.394 I   3694 -> 'Test'
0.00.354.394 I    106 -> '<end_of_turn>'
0.00.354.394 I    107 -> '
'
0.00.354.394 I    105 -> '<start_of_turn>'
0.00.354.395 I   4368 -> 'model'
0.00.354.395 I    107 -> '
'
0.00.354.395 I 
0.00.354.397 I main: interactive mode on.
0.00.354.410 I sampler seed: 3041241033

...

In this case add_bos == true and the Jinja template has BOS, which results in 2 BOS tokens with the --jinja flag.

The llama-server ... --jinja behaves the same way:

0.11.182.122 D ubatch_print:   token     = [
0.11.182.123 D ubatch_print:     0: id =      2 (           <bos>), pos =    0, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.124 D ubatch_print:     1: id =      2 (           <bos>), pos =    1, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.126 D ubatch_print:     2: id =    105 ( <start_of_turn>), pos =    2, n_seq_id =  1, seq_id = [0], output = 0
0.11.182.127 D ubatch_print:     3: id =   2364 (            user), pos =    3, n_seq_id =  1, seq_id = [0], output = 0

@danbev (Member, Author) commented Aug 15, 2025

I noticed that the instruction tuned model has the following:

(venv) $ head ~/work/ai/models/gemma-3-270m-it/tokenizer_config.json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
   ...

The pretrained/base model also has add_bos_token set to true, which I think is correct, but I don't think this should be true for the instruction-tuned model?

@CISC (Collaborator) commented Aug 15, 2025

...
0.00.118.684 I llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
...
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I      2 -> '<bos>'
0.00.354.390 I    105 -> '<start_of_turn>'
...

Ok, that should not happen, it should have been removed here:

llama.cpp/common/chat.cpp

Lines 791 to 800 in f75b830

// To avoid double BOS / EOS tokens, we're manually removing beginning / trailing tokens
// instead of using `chat_template_options.use_bos_token = false`, since these tokens
// may be needed inside the template / between messages too.
auto result = tmpl.apply(tmpl_inputs, tmpl_opts);
if (inputs.add_bos && string_starts_with(result, tmpl.bos_token())) {
    result = result.substr(tmpl.bos_token().size());
}
if (inputs.add_eos && string_ends_with(result, tmpl.eos_token())) {
    result = result.substr(0, result.size() - tmpl.eos_token().size());
}

@CISC (Collaborator) commented Aug 15, 2025

The pretrained/base model also has add_bos_token set to true, which I think is correct, but I don't think this should be true for the instruction-tuned model?

It should, the problem is just that for some reason it's not automatically removed from the chat template (which technically is the wrong approach, we really should disable add_bos/add_eos when using jinja chat templates instead).
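A hedged sketch of what that alternative could look like (it mirrors the existing add_bos line in tools/main/main.cpp quoted further down; generalizing it to other call sites is an assumption): suppress the tokenizer-side flag whenever a jinja template rendered the prompt, since the template already emits BOS.

// with --jinja the rendered template already contains BOS, so don't let
// the tokenizer add it again, rather than stripping it afterwards
const bool add_special = llama_vocab_get_add_bos(vocab) && !params.use_jinja;
auto tokens = common_tokenize(ctx, prompt, add_special, /*parse_special=*/true);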

@CISC (Collaborator) commented Aug 15, 2025

Ah, wait, is the problem perhaps that the token is tokenized with a prepended space?

llama.cpp/common/chat.cpp

Lines 574 to 585 in f75b830

const auto get_token = [&](llama_token token, const char * name, const char * jinja_variable_name) {
    if (token == LLAMA_TOKEN_NULL) {
        if (default_template_src.find(jinja_variable_name) != std::string::npos
            || template_tool_use_src.find(jinja_variable_name) != std::string::npos) {
            LOG_WRN("common_chat_templates_init: warning: vocab does not have a %s token, jinja template won't work as intended.\n", name);
        }
        return std::string();
    }
    return common_token_to_piece(vocab, token, true);
};
token_bos = get_token(llama_vocab_bos(vocab), "BOS", "bos_token");
token_eos = get_token(llama_vocab_eos(vocab), "EOS", "eos_token");

Edit: Nope, add_space_prefix is false.

@danbev (Member, Author) commented Aug 15, 2025

Ok, that should not happen, it should have been removed here:

This does not seem to happen in all code paths; for example, in tools/main/main.cpp we have:

const bool add_bos = llama_vocab_get_add_bos(vocab) && !params.use_jinja;

This will be false when --jinja is used (assuming we are not using the workaround in this PR; I reverted it locally). And then later we have:

if (!params.system_prompt.empty() || !params.prompt.empty()) {
    common_chat_templates_inputs inputs;
    inputs.use_jinja = g_params->use_jinja;
    inputs.messages = chat_msgs;
    inputs.add_generation_prompt = !params.prompt.empty();
    prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;

But this is not setting add_bos on inputs, so it will be false. Perhaps this should be:

diff --git a/tools/main/main.cpp b/tools/main/main.cpp
index dc776f59e..04379201e 100644
--- a/tools/main/main.cpp
+++ b/tools/main/main.cpp
@@ -255,7 +255,7 @@ int main(int argc, char ** argv) {
         }
     }
 
-    const bool add_bos = llama_vocab_get_add_bos(vocab) && !params.use_jinja;
+    const bool add_bos = llama_vocab_get_add_bos(vocab);
     if (!llama_model_has_encoder(model)) {
         GGML_ASSERT(!llama_vocab_get_add_eos(vocab));
     }
@@ -294,6 +294,7 @@ int main(int argc, char ** argv) {
                 common_chat_templates_inputs inputs;
                 inputs.use_jinja = g_params->use_jinja;
                 inputs.messages = chat_msgs;
+                inputs.add_bos = add_bos;
                 inputs.add_generation_prompt = !params.prompt.empty();
 
                 prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;

@CISC (Collaborator) commented Aug 15, 2025

@danbev Yep, you are right, I overlooked this codepath.

@CISC (Collaborator) commented Aug 15, 2025

@danbev Mind adding a new PR after testing? Don't forget to pass add_eos too.
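For reference, a sketch of what passing both flags could look like in tools/main/main.cpp (hedged; whether this code path actually needs add_eos in practice is not established here):

common_chat_templates_inputs inputs;
inputs.use_jinja             = g_params->use_jinja;
inputs.messages              = chat_msgs;
inputs.add_bos               = llama_vocab_get_add_bos(vocab);
inputs.add_eos               = llama_vocab_get_add_eos(vocab);
inputs.add_generation_prompt = !params.prompt.empty();

prompt = common_chat_templates_apply(chat_templates.get(), inputs).prompt;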

@CISC (Collaborator) commented Aug 15, 2025

It might need fixing elsewhere too: https://github.com/search?q=repo%3Aggml-org%2Fllama.cpp%20common_chat_templates_apply&type=code

@danbev requested a review from ngxson as a code owner August 15, 2025 09:01
@danbev changed the title from "convert : add bos token for Gemma 3 base models" to "llama : pass add_bos and add_eos to common_chat_templates_apply" Aug 15, 2025
@CISC (Collaborator) commented Aug 15, 2025

@danbev Actually, looking at this more closely I think I made a mistake in #15086: common_chat_templates_apply_jinja shouldn't get those params from inputs, but from tmpls.

Edit: Sorry for the back-and-forth, but I think this is the only change that needs to be done; the cause of the problem is here:

llama.cpp/common/chat.cpp

Lines 2064 to 2065 in f75b830

params.add_bos = inputs.add_bos;
params.add_eos = inputs.add_eos;
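In other words, a minimal sketch of the fix (assuming the common_chat_templates struct keeps the vocab-derived flags under these member names):

// take the flags captured from the model vocab when the templates were
// initialized, so they always match the model, rather than trusting the
// caller-supplied inputs
params.add_bos = tmpls->add_bos;
params.add_eos = tmpls->add_eos;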

@CISC requested review from CISC and removed the request for ngxson August 15, 2025 09:21
@danbev (Member, Author) commented Aug 15, 2025

Edit: Sorry for the back-and-forth, but I think this is the only change that needs to be done; the cause of the problem is here:

No worries at all! Sounds good, I'll try that, thanks!

This commit updates common_chat_templates_apply_jinja to use the
add_bos and add_eos parameters from the chat template instead of
the inputs.

The motivation for this is that if the `add_bos` and `add_eos`
values from the input parameters are used, there can be a mismatch
between the model and the chat template, which can prevent the
duplicate BOS/EOS token removal in chat.cpp `apply` from happening,
leading to two BOS tokens being added to the prompt.
@danbev force-pushed the gemma-3-convert-add_bos branch from fcc2931 to b4d28e9 August 15, 2025 10:25
@danbev changed the title from "llama : pass add_bos and add_eos to common_chat_templates_apply" to "common : use common_chat_templates for add_bos and add_eos" Aug 15, 2025
@danbev requested a review from ggerganov August 15, 2025 10:34
@CISC (Collaborator) left a comment

Thanks, everything works now with/without jinja?

@danbev removed the python (python script changes) label Aug 15, 2025
@danbev (Member, Author) commented Aug 15, 2025

Thanks, everything works now with/without jinja?

Yes, I think this looks good now:

llama-cli and llama-server outputs

llama-cli with --jinja:

(venv) $ build/bin/llama-cli -m models/gemma-3-270m-it.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt
...
main: prompt: 'Test'
main: number of tokens in prompt = 10
     2 -> '<bos>'
   105 -> '<start_of_turn>'
  2364 -> 'user'
   107 -> '
'
  3694 -> 'Test'
   106 -> '<end_of_turn>'
   107 -> '
'
   105 -> '<start_of_turn>'
  4368 -> 'model'
   107 -> '
'

And llama-cli without --jinja:

(venv) $ build/bin/llama-cli -m models/gemma-3-270m-it.gguf -c 0 -fa -p "Test" --verbose-prompt
...
main: prompt: 'Test'
main: number of tokens in prompt = 10
     2 -> '<bos>'
   105 -> '<start_of_turn>'
  2364 -> 'user'
   107 -> '
'
  3694 -> 'Test'
   106 -> '<end_of_turn>'
   107 -> '
'
   105 -> '<start_of_turn>'
  4368 -> 'model'
   107 -> '
'

And llama-server with --jinja:

(venv) $ build/bin/llama-server -m models/gemma-3-270m-it.gguf -c 0 -fa --jinja --verbose-prompt -t 1 --threads-http 1
...
main: chat template, chat_template: {{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
, example_format: '<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 23
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 23, n_tokens = 23, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 23, n_tokens = 23
slot update_slots: id  0 | task 0 | SWA checkpoint create, pos_min = 0, pos_max = 22, size = 0.338 MiB, total = 1/3 (0.338 MiB)
slot      release: id  0 | task 0 | stop processing: n_past = 32, truncated = 0
slot print_timing: id  0 | task 0 |
prompt eval time =      97.37 ms /    23 tokens (    4.23 ms per token,   236.21 tokens per second)
       eval time =     275.07 ms /    10 tokens (   27.51 ms per token,    36.35 tokens per second)
      total time =     372.44 ms /    33 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200

And llama-server without --jinja:

(venv) $ build/bin/llama-server -m models/gemma-3-270m-it.gguf -c 0 -fa --verbose-prompt -t 1 --threads-http 1
...
main: chat template, chat_template: {{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
, example_format: '<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle

srv  log_server_r: request: GET / 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  params_from_: Chat format: Content-only
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 23
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 23, n_tokens = 23, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 23, n_tokens = 23
slot update_slots: id  0 | task 0 | SWA checkpoint create, pos_min = 0, pos_max = 22, size = 0.338 MiB, total = 1/3 (0.338 MiB)
slot      release: id  0 | task 0 | stop processing: n_past = 32, truncated = 0
slot print_timing: id  0 | task 0 |
prompt eval time =     108.48 ms /    23 tokens (    4.72 ms per token,   212.03 tokens per second)
       eval time =     326.19 ms /    10 tokens (   32.62 ms per token,    30.66 tokens per second)
      total time =     434.66 ms /    33 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 200

Let me know if there is anything else I should test to verify this.

@CISC (Collaborator) commented Aug 15, 2025

Thanks, everything works now with/without jinja?

Yes, I think this looks good now:

Perfect, thanks again! :)

@CISC merged commit 5e6229a into ggml-org:master Aug 15, 2025
46 of 47 checks passed
@broadbit-hu commented Aug 15, 2025

Already tested with the b6152 release and Mistral NeMo Instruct:

./build/bin/llama-cli -m ../models/mistral-nemo-instruct-2407-q8_0.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt

Results:

check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
main: prompt: 'Test'
main: number of tokens in prompt = 5
     1 -> '<s>'
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

After the patch:

main: prompt: 'Test'
main: number of tokens in prompt = 4
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

@danbev (Member, Author) commented Aug 15, 2025

@broadbit-hu Could you give this a try with b6178? This is the release that contains the code of this PR.

@broadbit-hu

@broadbit-hu Could you give this a try with b6178? This is the release that contains the code of this PR.

It's perfect (tested with Mistral NeMo), thanks for the fix! :)

main: number of tokens in prompt = 4
     1 -> '<s>'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

Something is still wrong with Mistral Small, though:

./build/bin/llama-cli -m ../models/mistral-small-24b-instruct-2506-q4_k_m.gguf -c 0 -fa --jinja -p "Test" --verbose-prompt
print_info: model params     = 23.57 B
print_info: general.name     = Mistral Small 3.2 24B Instruct 2506
print_info: vocab type       = BPE
print_info: n_vocab          = 131072
print_info: n_merges         = 269443
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 11 '<pad>'
print_info: LF token         = 1010 'Ċ'
print_info: EOG token        = 2 '</s>'

...

Failed to infer a tool call example (possible template bug)
main: llama threadpool init, n_threads = 6
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
mistral-v7-tekken

main: prompt: 'Test'
main: number of tokens in prompt = 8
     1 -> '<s>'
 39575 -> 'mist'
  2784 -> 'ral'
  9332 -> '-v'
  1055 -> '7'
  1045 -> '-'
 17848 -> 'tek'
  3569 -> 'ken'

@broadbit-hu

Tested with the specified jinja template:

./build/bin/llama-cli -m ../models/mistral-small-24b-instruct-2506-q4_k_m.gguf -c 0 -fa --jinja --chat-template-file models/templates/Mistral-Small-3.2-24B-Instruct-2506.jinja -p "Test" --verbose-prompt

The results (see the last tokens below):

main: prompt: 'Test'
main: number of tokens in prompt = 507
     1 -> '<s>'
    17 -> '[SYSTEM_PROMPT]'
  4568 -> 'You'
  1584 -> ' are'
 42301 -> ' Mist'
  2784 -> 'ral'
 29121 -> ' Small'
  1032 -> ' '
  1051 -> '3'
  1044 -> ','
  1261 -> ' a'
 43520 -> ' Large'
 26242 -> ' Language'
 11512 -> ' Model'
  1319 -> ' ('
 23947 -> 'LL'
  1077 -> 'M'
  1041 -> ')'
  6254 -> ' created'
  1536 -> ' by'
 42301 -> ' Mist'
  2784 -> 'ral'
 26554 -> ' AI'
  1044 -> ','
  1261 -> ' a'
  8689 -> ' French'
 53862 -> ' startup'
  3518 -> ' head'
125609 -> 'quartered'
  1294 -> ' in'
  6993 -> ' Paris'
  1626 -> '.
'
 16994 -> 'Your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1486 -> ' was'
  3804 -> ' last'
 12220 -> ' updated'
  1408 -> ' on'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1051 -> '3'
  1045 -> '-'
  1049 -> '1'
  1048 -> '0'
  1045 -> '-'
  1048 -> '0'
  1049 -> '1'
  1046 -> '.'
  1531 -> ' The'
  3519 -> ' current'
  5451 -> ' date'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1053 -> '5'
  1338 -> '.

'
  7651 -> 'When'
  1636 -> ' you'
  6185 -> ''re'
  1605 -> ' not'
  5257 -> ' sure'
  2314 -> ' about'
  2269 -> ' some'
  3686 -> ' information'
  1505 -> ' or'
  2200 -> ' when'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  1505 -> ' or'
  4811 -> ' specific'
  2181 -> ' data'
  1044 -> ','
  1636 -> ' you'
  4016 -> ' must'
  2210 -> ' use'
  1278 -> ' the'
  5178 -> ' available'
 12589 -> ' tools'
  1317 -> ' to'
 15273 -> ' fetch'
  1278 -> ' the'
  3686 -> ' information'
  1046 -> '.'
  5469 -> ' Do'
  1605 -> ' not'
 89786 -> ' hesitate'
  1317 -> ' to'
  2210 -> ' use'
 12589 -> ' tools'
 26119 -> ' whenever'
  2127 -> ' they'
  1710 -> ' can'
  5234 -> ' provide'
  1261 -> ' a'
  2081 -> ' more'
 18501 -> ' accurate'
  1505 -> ' or'
  7662 -> ' complete'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
  1836 -> ' no'
 11157 -> ' relevant'
 12589 -> ' tools'
  1584 -> ' are'
  5178 -> ' available'
  1044 -> ','
  2430 -> ' then'
 11904 -> ' clearly'
  3468 -> ' state'
  1455 -> ' that'
  1636 -> ' you'
  2607 -> ' don'
  2405 -> ''t'
  1736 -> ' have'
  1278 -> ' the'
  3686 -> ' information'
  1321 -> ' and'
 10035 -> ' avoid'
  6187 -> ' making'
  2015 -> ' up'
  7211 -> ' anything'
  1338 -> '.

'
  5475 -> 'If'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4098 -> ' question'
  1395 -> ' is'
  1605 -> ' not'
  6133 -> ' clear'
  1044 -> ','
 61103 -> ' ambiguous'
  1044 -> ','
  1505 -> ' or'
  3120 -> ' does'
  1605 -> ' not'
  5234 -> ' provide'
  6171 -> ' enough'
  5315 -> ' context'
  1394 -> ' for'
  1636 -> ' you'
  1317 -> ' to'
 32181 -> ' accurately'
  4832 -> ' answer'
  1278 -> ' the'
  4098 -> ' question'
  1044 -> ','
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  3352 -> ' try'
  1317 -> ' to'
  4832 -> ' answer'
  1494 -> ' it'
  3169 -> ' right'
  5109 -> ' away'
  1321 -> ' and'
  1636 -> ' you'
  6153 -> ' rather'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
 38695 -> ' clarify'
  2034 -> ' their'
  4546 -> ' request'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  7493 -> 'What'
  1584 -> ' are'
  2269 -> ' some'
  3683 -> ' good'
 40378 -> ' restaurants'
  3879 -> ' around'
  1639 -> ' me'
 10555 -> '?"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1584 -> ' are'
  1636 -> ' you'
 10555 -> '?"'
  1505 -> ' or'
  1429 -> ' "'
  7651 -> 'When'
  1395 -> ' is'
  1278 -> ' the'
  4275 -> ' next'
 18034 -> ' flight'
  1317 -> ' to'
 23286 -> ' Tokyo'
  1034 -> '"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1653 -> ' do'
  1636 -> ' you'
 10601 -> ' travel'
  1562 -> ' from'
 10555 -> '?"'
  4342 -> ').
'
  4568 -> 'You'
  1584 -> ' are'
  5282 -> ' always'
  3435 -> ' very'
 41132 -> ' attent'
  1556 -> 'ive'
  1317 -> ' to'
 18814 -> ' dates'
  1044 -> ','
  1321 -> ' and'
  2200 -> ' when'
  6136 -> ' asked'
  2314 -> ' about'
  3686 -> ' information'
  1513 -> ' at'
  4811 -> ' specific'
 18814 -> ' dates'
  1044 -> ','
  1636 -> ' you'
 89782 -> ' discard'
  3686 -> ' information'
  1455 -> ' that'
  1395 -> ' is'
  1513 -> ' at'
  3866 -> ' another'
  5451 -> ' date'
  1626 -> '.
'
  4568 -> 'You'
  2685 -> ' follow'
  2576 -> ' these'
 15776 -> ' instructions'
  1294 -> ' in'
  1747 -> ' all'
 18085 -> ' languages'
  1044 -> ','
  1321 -> ' and'
  5282 -> ' always'
  9148 -> ' respond'
  1317 -> ' to'
  1278 -> ' the'
  3330 -> ' user'
  1294 -> ' in'
  1278 -> ' the'
  7278 -> ' language'
  2127 -> ' they'
  2210 -> ' use'
  1505 -> ' or'
  4546 -> ' request'
  1626 -> '.
'
 12961 -> 'Next'
 14275 -> ' sections'
 12293 -> ' describe'
  1278 -> ' the'
 28946 -> ' capabilities'
  1455 -> ' that'
  1636 -> ' you'
  1736 -> ' have'
  1338 -> '.

'
  1035 -> '#'
  1488 -> ' W'
 34112 -> 'EB'
  1398 -> ' B'
  4755 -> 'RO'
 20266 -> 'WS'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  6560 -> ' cannot'
  3142 -> ' perform'
  2258 -> ' any'
  7430 -> ' web'
  6123 -> ' search'
  1505 -> ' or'
  4731 -> ' access'
 18259 -> ' internet'
  1317 -> ' to'
  3432 -> ' open'
 76064 -> ' URLs'
  1044 -> ','
 14440 -> ' links'
  6704 -> ' etc'
  1046 -> '.'
  3367 -> ' If'
  1494 -> ' it'
  7444 -> ' seems'
  2479 -> ' like'
  1278 -> ' the'
  3330 -> ' user'
  1395 -> ' is'
 39322 -> ' expecting'
  1636 -> ' you'
  1317 -> ' to'
  1653 -> ' do'
  1878 -> ' so'
  1044 -> ','
  1636 -> ' you'
 38695 -> ' clarify'
  1278 -> ' the'
  8516 -> ' situation'
  1321 -> ' and'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
  9441 -> ' copy'
 31944 -> ' paste'
  1278 -> ' the'
  3403 -> ' text'
  7655 -> ' directly'
  1294 -> ' in'
  1278 -> ' the'
 21666 -> ' chat'
  1338 -> '.

'
  1035 -> '#'
  1373 -> ' M'
 15373 -> 'ULT'
  1073 -> 'I'
  5036 -> '-M'
  7460 -> 'OD'
  4286 -> 'AL'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  1736 -> ' have'
  1278 -> ' the'
  8727 -> ' ability'
  1317 -> ' to'
  3346 -> ' read'
  8061 -> ' images'
  1044 -> ','
  1809 -> ' but'
  1636 -> ' you'
  6560 -> ' cannot'
 10616 -> ' generate'
  8061 -> ' images'
  1046 -> '.'
  3213 -> ' You'
  2095 -> ' also'
  6560 -> ' cannot'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1626 -> '.
'
  4568 -> 'You'
  6560 -> ' cannot'
  3346 -> ' read'
  6685 -> ' nor'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1338 -> '.

'
  1035 -> '#'
 18580 -> ' TO'
  8568 -> 'OL'
 58135 -> ' CALL'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  2188 -> ' may'
  1736 -> ' have'
  4731 -> ' access'
  1317 -> ' to'
 12589 -> ' tools'
  1455 -> ' that'
  1636 -> ' you'
  1710 -> ' can'
  2210 -> ' use'
  1317 -> ' to'
 15273 -> ' fetch'
  3686 -> ' information'
  1505 -> ' or'
  3142 -> ' perform'
 10636 -> ' actions'
  1046 -> '.'
  3213 -> ' You'
  4016 -> ' must'
  2210 -> ' use'
  2576 -> ' these'
 12589 -> ' tools'
  1294 -> ' in'
  1278 -> ' the'
  3629 -> ' following'
 19599 -> ' situations'
  2100 -> ':

'
  1049 -> '1'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  3686 -> ' information'
  1626 -> '.
'
  1050 -> '2'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  4811 -> ' specific'
  2181 -> ' data'
  1455 -> ' that'
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  1736 -> ' have'
  1294 -> ' in'
  2143 -> ' your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1626 -> '.
'
  1051 -> '3'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 19263 -> ' involves'
 10636 -> ' actions'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  3816 -> ' without'
 12589 -> ' tools'
  1338 -> '.

'
 82158 -> 'Always'
 54628 -> ' priorit'
  2033 -> 'ize'
  2505 -> ' using'
 12589 -> ' tools'
  1317 -> ' to'
  5234 -> ' provide'
  1278 -> ' the'
  2725 -> ' most'
 18501 -> ' accurate'
  1321 -> ' and'
 20351 -> ' helpful'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
 12589 -> ' tools'
  1584 -> ' are'
  1605 -> ' not'
  5178 -> ' available'
  1044 -> ','
  3037 -> ' inform'
  1278 -> ' the'
  3330 -> ' user'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  1278 -> ' the'
 24130 -> ' requested'
  5263 -> ' action'
  1513 -> ' at'
  1278 -> ' the'
  4735 -> ' moment'
  1046 -> '.'
    18 -> '[/SYSTEM_PROMPT]'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

@CISC (Collaborator) commented Aug 15, 2025

Something is still wrong with Mistral Small, though:

Yeah, the tekken template fix-hack backfires when used with --jinja; someone should take a look at that...

@danbev (Member, Author) commented Aug 16, 2025

Yeah, the tekken template fix-hack backfires when used with --jinja; someone should take a look at that...

I'll take a closer look at this next week 👍

@danbev (Member, Author) commented Aug 18, 2025

The results (see the last tokens below):

@broadbit-hu Would you be able to give a link to the model so I can reproduce the tekken template issue?

@broadbit-hu commented Aug 18, 2025

@danbev This is a locally-quantized model (using the recent convert script and llama-quantize); I have no link yet.

The missing files (like "tokenizer.json") were copied from https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/tree/main.
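For anyone reproducing this step, a hedged sketch of fetching just those files (the target directory is an assumption):

$ huggingface-cli download mistralai/Mistral-Small-24B-Instruct-2501 tokenizer.json tokenizer_config.json --local-dir ./mistral-small-24b-instruct-2506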

The current gguf-my-repo fails during conversion:
https://huggingface.co/spaces/ggml-org/gguf-my-repo

INFO:gguf.vocab:Loading Mistral tokenizer from downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506
INFO:mistral_common.tokens.tokenizers.tekken:Vocab size: 150000
INFO:mistral_common.tokens.tokenizers.tekken:Cutting vocab to first 130072 tokens.
INFO:hf-to-gguf:Converting tokenizer MistralTokenizerType.tekken of size 131072.
INFO:hf-to-gguf:Setting bos, eos, unk and pad token IDs to 1, 2, 0, 11.
WARNING:gguf.gguf_writer:Duplicated key name 'llama.vocab_size', overwriting it with new value 131072 of type UINT32
Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2027, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 974, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 991, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2030, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1076, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/llama.cpp/gguf-py/gguf/vocab.py", line 505, in __init__
    with open(fname_tokenizer, encoding='utf-8') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'downloads/tmpmhkga0qx/Mistral-Small-3.2-24B-Instruct-2506/tokenizer.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8788, in <module>
    main()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 8782, in main
    model_instance.write()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 426, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 547, in prepare_metadata
    self.set_vocab()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 2033, in set_vocab
    self._set_vocab_gpt2()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 910, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 633, in get_vocab_base
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1132, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
                                               ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 811, in __getitem__
    raise KeyError(key)
KeyError: <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'>

@broadbit-hu

So, I've quantized the unsloth version of this model using gguf-my-repo:

There's no problem with the prompt tokens:

common_init_from_params: added </s> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 1024
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 6
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
[SYSTEM_PROMPT]You are a helpful assistant[/SYSTEM_PROMPT][INST]Hello[/INST]Hi there</s>[INST]How are you?[/INST]

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

main: prompt: 'Test'
main: number of tokens in prompt = 558
     1 -> '<s>'
    17 -> '[SYSTEM_PROMPT]'
  4568 -> 'You'
  1584 -> ' are'
 42301 -> ' Mist'
  2784 -> 'ral'
  5624 -> '-S'
 53721 -> 'mall'
  1045 -> '-'
  1051 -> '3'
  1046 -> '.'
  1050 -> '2'
  1045 -> '-'
  1050 -> '2'
  1052 -> '4'
  1066 -> 'B'
 47926 -> '-In'
  8166 -> 'struct'
  1045 -> '-'
  1050 -> '2'
  1053 -> '5'
  1048 -> '0'
  1054 -> '6'
  1044 -> ','
  1261 -> ' a'
 43520 -> ' Large'
 26242 -> ' Language'
 11512 -> ' Model'
  1319 -> ' ('
 23947 -> 'LL'
  1077 -> 'M'
  1041 -> ')'
  6254 -> ' created'
  1536 -> ' by'
 42301 -> ' Mist'
  2784 -> 'ral'
 26554 -> ' AI'
  1044 -> ','
  1261 -> ' a'
  8689 -> ' French'
 53862 -> ' startup'
  3518 -> ' head'
125609 -> 'quartered'
  1294 -> ' in'
  6993 -> ' Paris'
  1626 -> '.
'
  4568 -> 'You'
  4053 -> ' power'
  1420 -> ' an'
 26554 -> ' AI'
 27089 -> ' assistant'
  4418 -> ' called'
  2301 -> ' Le'
 38680 -> ' Chat'
  1626 -> '.
'
 16994 -> 'Your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1486 -> ' was'
  3804 -> ' last'
 12220 -> ' updated'
  1408 -> ' on'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1051 -> '3'
  1045 -> '-'
  1049 -> '1'
  1048 -> '0'
  1045 -> '-'
  1048 -> '0'
  1049 -> '1'
  1626 -> '.
'
  1784 -> 'The'
  3519 -> ' current'
  5451 -> ' date'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1056 -> '8'
  1338 -> '.

'
  7651 -> 'When'
  1636 -> ' you'
  6185 -> ''re'
  1605 -> ' not'
  5257 -> ' sure'
  2314 -> ' about'
  2269 -> ' some'
  3686 -> ' information'
  1505 -> ' or'
  2200 -> ' when'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  1505 -> ' or'
  4811 -> ' specific'
  2181 -> ' data'
  1044 -> ','
  1636 -> ' you'
  4016 -> ' must'
  2210 -> ' use'
  1278 -> ' the'
  5178 -> ' available'
 12589 -> ' tools'
  1317 -> ' to'
 15273 -> ' fetch'
  1278 -> ' the'
  3686 -> ' information'
  1046 -> '.'
  5469 -> ' Do'
  1605 -> ' not'
 89786 -> ' hesitate'
  1317 -> ' to'
  2210 -> ' use'
 12589 -> ' tools'
 26119 -> ' whenever'
  2127 -> ' they'
  1710 -> ' can'
  5234 -> ' provide'
  1261 -> ' a'
  2081 -> ' more'
 18501 -> ' accurate'
  1505 -> ' or'
  7662 -> ' complete'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
  1836 -> ' no'
 11157 -> ' relevant'
 12589 -> ' tools'
  1584 -> ' are'
  5178 -> ' available'
  1044 -> ','
  2430 -> ' then'
 11904 -> ' clearly'
  3468 -> ' state'
  1455 -> ' that'
  1636 -> ' you'
  2607 -> ' don'
  2405 -> ''t'
  1736 -> ' have'
  1278 -> ' the'
  3686 -> ' information'
  1321 -> ' and'
 10035 -> ' avoid'
  6187 -> ' making'
  2015 -> ' up'
  7211 -> ' anything'
  1626 -> '.
'
  5475 -> 'If'
  1278 -> ' the'
  3330 -> ' user'
  1681 -> ''s'
  4098 -> ' question'
  1395 -> ' is'
  1605 -> ' not'
  6133 -> ' clear'
  1044 -> ','
 61103 -> ' ambiguous'
  1044 -> ','
  1505 -> ' or'
  3120 -> ' does'
  1605 -> ' not'
  5234 -> ' provide'
  6171 -> ' enough'
  5315 -> ' context'
  1394 -> ' for'
  1636 -> ' you'
  1317 -> ' to'
 32181 -> ' accurately'
  4832 -> ' answer'
  1278 -> ' the'
  4098 -> ' question'
  1044 -> ','
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  3352 -> ' try'
  1317 -> ' to'
  4832 -> ' answer'
  1494 -> ' it'
  3169 -> ' right'
  5109 -> ' away'
  1321 -> ' and'
  1636 -> ' you'
  6153 -> ' rather'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
 38695 -> ' clarify'
  2034 -> ' their'
  4546 -> ' request'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  7493 -> 'What'
  1584 -> ' are'
  2269 -> ' some'
  3683 -> ' good'
 40378 -> ' restaurants'
  3879 -> ' around'
  1639 -> ' me'
 10555 -> '?"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1584 -> ' are'
  1636 -> ' you'
 10555 -> '?"'
  1505 -> ' or'
  1429 -> ' "'
  7651 -> 'When'
  1395 -> ' is'
  1278 -> ' the'
  4275 -> ' next'
 18034 -> ' flight'
  1317 -> ' to'
 23286 -> ' Tokyo'
  1034 -> '"'
  2297 -> ' =>'
  1429 -> ' "'
 17507 -> 'Where'
  1653 -> ' do'
  1636 -> ' you'
 10601 -> ' travel'
  1562 -> ' from'
 10555 -> '?"'
  4342 -> ').
'
  4568 -> 'You'
  1584 -> ' are'
  5282 -> ' always'
  3435 -> ' very'
 41132 -> ' attent'
  1556 -> 'ive'
  1317 -> ' to'
 18814 -> ' dates'
  1044 -> ','
  1294 -> ' in'
  4369 -> ' particular'
  1636 -> ' you'
  3352 -> ' try'
  1317 -> ' to'
 18507 -> ' resolve'
 18814 -> ' dates'
  1319 -> ' ('
  1101 -> 'e'
  3596 -> '.g'
  1046 -> '.'
  1429 -> ' "'
  1121 -> 'y'
 32430 -> 'esterday'
  1034 -> '"'
  1395 -> ' is'
  1032 -> ' '
  1050 -> '2'
  1048 -> '0'
  1050 -> '2'
  1053 -> '5'
  1045 -> '-'
  1048 -> '0'
  1056 -> '8'
  1045 -> '-'
  1049 -> '1'
  1055 -> '7'
  1041 -> ')'
  1321 -> ' and'
  2200 -> ' when'
  6136 -> ' asked'
  2314 -> ' about'
  3686 -> ' information'
  1513 -> ' at'
  4811 -> ' specific'
 18814 -> ' dates'
  1044 -> ','
  1636 -> ' you'
 89782 -> ' discard'
  3686 -> ' information'
  1455 -> ' that'
  1395 -> ' is'
  1513 -> ' at'
  3866 -> ' another'
  5451 -> ' date'
  1626 -> '.
'
  4568 -> 'You'
  2685 -> ' follow'
  2576 -> ' these'
 15776 -> ' instructions'
  1294 -> ' in'
  1747 -> ' all'
 18085 -> ' languages'
  1044 -> ','
  1321 -> ' and'
  5282 -> ' always'
  9148 -> ' respond'
  1317 -> ' to'
  1278 -> ' the'
  3330 -> ' user'
  1294 -> ' in'
  1278 -> ' the'
  7278 -> ' language'
  2127 -> ' they'
  2210 -> ' use'
  1505 -> ' or'
  4546 -> ' request'
  1626 -> '.
'
 12961 -> 'Next'
 14275 -> ' sections'
 12293 -> ' describe'
  1278 -> ' the'
 28946 -> ' capabilities'
  1455 -> ' that'
  1636 -> ' you'
  1736 -> ' have'
  1338 -> '.

'
  1035 -> '#'
  1488 -> ' W'
 34112 -> 'EB'
  1398 -> ' B'
  4755 -> 'RO'
 20266 -> 'WS'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  6560 -> ' cannot'
  3142 -> ' perform'
  2258 -> ' any'
  7430 -> ' web'
  6123 -> ' search'
  1505 -> ' or'
  4731 -> ' access'
 18259 -> ' internet'
  1317 -> ' to'
  3432 -> ' open'
 76064 -> ' URLs'
  1044 -> ','
 14440 -> ' links'
  6704 -> ' etc'
  1046 -> '.'
  3367 -> ' If'
  1494 -> ' it'
  7444 -> ' seems'
  2479 -> ' like'
  1278 -> ' the'
  3330 -> ' user'
  1395 -> ' is'
 39322 -> ' expecting'
  1636 -> ' you'
  1317 -> ' to'
  1653 -> ' do'
  1878 -> ' so'
  1044 -> ','
  1636 -> ' you'
 38695 -> ' clarify'
  1278 -> ' the'
  8516 -> ' situation'
  1321 -> ' and'
  4237 -> ' ask'
  1278 -> ' the'
  3330 -> ' user'
  1317 -> ' to'
  9441 -> ' copy'
 31944 -> ' paste'
  1278 -> ' the'
  3403 -> ' text'
  7655 -> ' directly'
  1294 -> ' in'
  1278 -> ' the'
 21666 -> ' chat'
  1338 -> '.

'
  1035 -> '#'
  1373 -> ' M'
 15373 -> 'ULT'
  1073 -> 'I'
  5036 -> '-M'
  7460 -> 'OD'
  4286 -> 'AL'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  1736 -> ' have'
  1278 -> ' the'
  8727 -> ' ability'
  1317 -> ' to'
  3346 -> ' read'
  8061 -> ' images'
  1044 -> ','
  1809 -> ' but'
  1636 -> ' you'
  6560 -> ' cannot'
 10616 -> ' generate'
  8061 -> ' images'
  1046 -> '.'
  3213 -> ' You'
  2095 -> ' also'
  6560 -> ' cannot'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1626 -> '.
'
  4568 -> 'You'
  6560 -> ' cannot'
  3346 -> ' read'
  6685 -> ' nor'
  2148 -> ' trans'
 13089 -> 'cribe'
 16023 -> ' audio'
  7309 -> ' files'
  1505 -> ' or'
 26612 -> ' videos'
  1338 -> '.

'
  1035 -> '#'
 18580 -> ' TO'
  8568 -> 'OL'
 58135 -> ' CALL'
  9774 -> 'ING'
  7236 -> ' IN'
 36967 -> 'STR'
  1085 -> 'U'
 15749 -> 'CTION'
  1083 -> 'S'
  1267 -> '

'
  4568 -> 'You'
  2188 -> ' may'
  1736 -> ' have'
  4731 -> ' access'
  1317 -> ' to'
 12589 -> ' tools'
  1455 -> ' that'
  1636 -> ' you'
  1710 -> ' can'
  2210 -> ' use'
  1317 -> ' to'
 15273 -> ' fetch'
  3686 -> ' information'
  1505 -> ' or'
  3142 -> ' perform'
 10636 -> ' actions'
  1046 -> '.'
  3213 -> ' You'
  4016 -> ' must'
  2210 -> ' use'
  2576 -> ' these'
 12589 -> ' tools'
  1294 -> ' in'
  1278 -> ' the'
  3629 -> ' following'
 19599 -> ' situations'
  2100 -> ':

'
  1049 -> '1'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  2015 -> ' up'
  6793 -> '-to'
 43546 -> '-date'
  3686 -> ' information'
  1626 -> '.
'
  1050 -> '2'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 10867 -> ' requires'
  4811 -> ' specific'
  2181 -> ' data'
  1455 -> ' that'
  1636 -> ' you'
  1653 -> ' do'
  1605 -> ' not'
  1736 -> ' have'
  1294 -> ' in'
  2143 -> ' your'
  7807 -> ' knowledge'
  4469 -> ' base'
  1626 -> '.
'
  1051 -> '3'
  1046 -> '.'
  4925 -> ' When'
  1278 -> ' the'
  4546 -> ' request'
 19263 -> ' involves'
 10636 -> ' actions'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  3816 -> ' without'
 12589 -> ' tools'
  1338 -> '.

'
 82158 -> 'Always'
 54628 -> ' priorit'
  2033 -> 'ize'
  2505 -> ' using'
 12589 -> ' tools'
  1317 -> ' to'
  5234 -> ' provide'
  1278 -> ' the'
  2725 -> ' most'
 18501 -> ' accurate'
  1321 -> ' and'
 20351 -> ' helpful'
  4005 -> ' response'
  1046 -> '.'
  3367 -> ' If'
 12589 -> ' tools'
  1584 -> ' are'
  1605 -> ' not'
  5178 -> ' available'
  1044 -> ','
  3037 -> ' inform'
  1278 -> ' the'
  3330 -> ' user'
  1455 -> ' that'
  1636 -> ' you'
  6560 -> ' cannot'
  3142 -> ' perform'
  1278 -> ' the'
 24130 -> ' requested'
  5263 -> ' action'
  1513 -> ' at'
  1278 -> ' the'
  4735 -> ' moment'
  1046 -> '.'
    18 -> '[/SYSTEM_PROMPT]'
     3 -> '[INST]'
  4922 -> 'Test'
     4 -> '[/INST]'

I'll check my local quantization process...

@broadbit-hu

@danbev Sorry, I was mistaken. After updating the tokenizer.json and tokenizer_config.json files from the unsloth repo, there are no issues with the prompt tokenization.

@danbev (Member, Author) commented Aug 19, 2025

@broadbit-hu Great, glad to hear that! I also tried out the unsloth model yesterday and it worked, which was the reason for asking about the actual model you were using.

@danbev deleted the gemma-3-convert-add_bos branch August 21, 2025 10:18