
Bug: Qwen2.5-Coder variants do not properly stop in FIM mode #9606

@tristandruyen


What happened?

Problem

When using the Qwen2.5-Coder variants in FIM mode, llama.cpp keeps generating tokens even after 151643 '<|endoftext|>' is emitted.
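(FIM mode here means llama-server's /infill endpoint. A minimal repro sketch below, assuming a local server on the default port; the prompt and parameters are just illustrative:)

```python
import requests

# Hitting a locally running llama-server that has the Qwen2.5-Coder GGUF
# loaded; host/port, prompt and n_predict are illustrative.
resp = requests.post(
    "http://localhost:8080/infill",
    json={
        "input_prefix": "def add(a, b):\n    ",
        "input_suffix": "\n\nprint(add(1, 2))\n",
        "n_predict": 128,
    },
)
print(resp.json()["content"])
# Expected: a short completion for the function body.
# Observed: output continues past 151643 '<|endoftext|>' until
# n_predict is exhausted.
```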

The cause seems to be that tokenizer_config.json only provides 151645 '<|im_end|>' as a stop token, which is the one used in instruct mode.

The official Hugging Face configuration is not entirely consistent here: config.json provides 151643 '<|endoftext|>' as the eos_token_id, while tokenizer_config.json provides 151645 '<|im_end|>' as the eos_token.
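This is easy to check directly against the upstream repo, e.g. with a quick huggingface_hub sketch (repo id as published on the Hub):

```python
import json
from huggingface_hub import hf_hub_download

repo = "Qwen/Qwen2.5-Coder-7B-Instruct"

with open(hf_hub_download(repo, "config.json")) as f:
    config = json.load(f)
with open(hf_hub_download(repo, "tokenizer_config.json")) as f:
    tok_config = json.load(f)

print(config["eos_token_id"])   # 151643 -> '<|endoftext|>'
print(tok_config["eos_token"])  # '<|im_end|>' (id 151645)
```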

But not stopping on 151645 '<|im_end|>' breaks instruct mode, so it is not as simple as just updating the eos_token_id; ideally generation should stop on both tokens.

I was able to fix this, while keeping instruct mode working, by manually providing the eot token to gguf_new_metadata.py.

This makes both instruct and FIM mode work.

Quickfix

```shell
python3 ./gguf-py/scripts/gguf_new_metadata.py --special-token-by-id eot 151643 ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.gguf ./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf
```
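To double-check the result, the relevant metadata keys can be read back with gguf-py's GGUFReader, something like this sketch (the parts/data indexing follows what gguf_dump.py does):

```python
from gguf import GGUFReader

reader = GGUFReader("./models/Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf")

for key in ("tokenizer.ggml.eos_token_id", "tokenizer.ggml.eot_token_id"):
    field = reader.fields.get(key)
    if field is None:
        print(f"{key}: <not set>")
    else:
        # scalar KV values live in parts[data[0]], same as in gguf_dump.py
        print(f"{key}: {field.parts[field.data[0]][0]}")

# With the fix applied, eot_token_id should come back as 151643.
```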

Proper Solution?

This should probably be fixed in the upstream configs, but it might be useful to provide a workaround in llama.cpp, especially as there are already lots of GGUFs out there without this fix.

Sadly I have no idea how tokenizer_config.json files work. I tried the diff below, but it had no effect when applying it with:

```shell
python3 gguf-py/scripts/gguf_new_metadata.py --chat-template-config updated_tokenizer_config.json ./Qwen2.5-Coder-7B-Instruct-Q6_K_L.gguf ./Qwen2.5-Coder-7B-Instruct-Q6_K_L.fixed.gguf
```

```diff
diff --git a/tokenizer_config.json b/tokenizer_config.json
index 07bfe06..993c314 100644
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -198,6 +198,7 @@
   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|im_end|>",
+  "eot_token": "<|endoftext|>",
   "errors": "replace",
   "model_max_length": 131072,
   "pad_token": "<|endoftext|>",

Name and Version

b3805

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

Labels

bug-unconfirmed, medium severity