Skip to content

Conversation

@matteoserva
Copy link
Contributor

Reopening #13099 because I broke my repo with a wrong git command and it automatically closed the PR.

@ngxson

  • I moved the check for GLM4 above so it gets higher priority.

Instead of changing BOS, I think what should be done is to set add_add_bos(False) to prevent tokenize function from adding BOS to the sentence

I am not sure if I can do that. add_bos_token is automatically set to false by the HF model. I am forced to set it to PAD token because otherwise it would break the jinja templating system ./llama-server --jinja.
The jinja code ignores the add_bos_token value and removes any BOS token it encounters.

This is the relevant code block.

if (string_starts_with(result, tmpl.bos_token())) {

If I don't set the BOS token it would break the /props endpoint

ORIGINAL:

GLM4-0414 models were using the wrong legacy template, leading to a missing `[gMASK]<sop>` preamble.
The old code was returning `LLM_CHAT_TEMPLATE_GLMEDGE`

As a workaround you needed to launch `llama-server` with `--chat-template chatglm4`

After the patch the gguf should be regenerated or edited manually.

@github-actions github-actions bot added the python python script changes label Apr 27, 2025
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["[gMASK]"])
special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can do that.

  • add_bos_token is already set to False by the HF model and conversion script.
  • The jinja templating system always removes the bos token, ignoring add_bos_token. So the prefix is removed and never added again.
  • If I remove the bos token altogether, the /props endpoint crashes because it expects it

This is the relevant jinja template code. reached when ./llama-server --jinja

if (string_starts_with(result, tmpl.bos_token())) {

Copy link
Collaborator

@ngxson ngxson Apr 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the BOS is not added, then what is the reason behind changing it from [gMASK] to <|endoftext|>? it shouldn't crash /props, right?

Nvm I see what you mean

@ngxson ngxson merged commit ced44be into ggml-org:master Apr 27, 2025
50 checks passed
@Mushoz
Copy link

Mushoz commented Apr 27, 2025

Do GGUFs have to be regenerated for these fixes to apply, or can I continue using the GGUF I already have?

@matteoserva
Copy link
Contributor Author

Do GGUFs have to be regenerated for these fixes to apply, or can I continue using the GGUF I already have?

The GGUF should be regenerated or edited manually.

pockers21 pushed a commit to pockers21/llama.cpp that referenced this pull request Apr 28, 2025
* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatgml4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check in the correct place with correct check
@Dampfinchen
Copy link

Hello! Sorry, noob here. What changes would I have to make to my existing GGUF? I've looked at the template and I'm just confused.

@matteoserva
Copy link
Contributor Author

Hello! Sorry, noob here. What changes would I have to make to my existing GGUF? I've looked at the template and I'm just confused.

you only have to replace the bos token kv parameter from [gMask] to <|endoftext|> in the GGUF, then use a updated llama.cpp

@Ph0rk0z
Copy link

Ph0rk0z commented Apr 30, 2025

kv parameter? Do you mean edit the bos token ID in the metadata? or somewhere else. And I assume this means I was testing the model with bos token listed twice, even in text completion?

I wish you could simply choose to add the bos or not in text completions from the front end. I'm under the impression that it's added anyway.

@matteoserva
Copy link
Contributor Author

kv parameter? Do you mean edit the bos token ID in the metadata? or somewhere else. And I assume this means I was testing the model with bos token listed twice, even in text completion?

I wish you could simply choose to add the bos or not in text completions from the front end. I'm under the impression that it's added anyway.

The command to enable/disable the bos token is --override-kv tokenizer.ggml.add_bos_token=bool:false.
In GLM the value add_bos_token is not set and it defaults to false.

The problem is that the jinja templating system ignores add_bos_token and breaks the prompt. The fix for a old GGUF is simply to change the bos token (tokenizer.ggml.bos_token_id) from [gMASK] to <|endoftext|> and run it with an updated llama.cpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python python script changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants