deepseek r1 series debug log warning fix and chat template support #11994
src/llama-chat.cpp
Outdated
}
}
if (add_ass) {
ss << LU8("<|Assistant|><think>\n");
This will break a lot of downstream applications that expect the <think> token to be included in the response.
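To illustrate the concern, here is a minimal sketch of the kind of downstream parsing that could break. It is not llama.cpp code: `split_result` and `split_reasoning` are hypothetical names, and the assumption is that the client requires a literal `<think>` at the start of the response before it separates reasoning from the answer.

```cpp
#include <string>

// Hypothetical downstream helper (not part of llama.cpp).
struct split_result {
    bool        ok;        // false when the expected tags are missing
    std::string reasoning; // text between <think> and </think>
    std::string answer;    // text after </think>
};

// Split a model response into reasoning and answer, assuming the response
// itself starts with the opening "<think>" tag.
static split_result split_reasoning(const std::string & response) {
    const std::string open  = "<think>";
    const std::string close = "</think>";

    // The opening tag must be the very first thing in the response.
    if (response.compare(0, open.size(), open) != 0) {
        return { false, "", "" };
    }
    const std::string::size_type end = response.find(close, open.size());
    if (end == std::string::npos) {
        return { false, "", "" };
    }
    return {
        true,
        response.substr(open.size(), end - open.size()),
        response.substr(end + close.size()),
    };
}
```

If the template emits `<think>\n` as part of the prompt, the generated text would start inside the reasoning block, so a parser written like this would report failure even though the model output is otherwise unchanged.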
I am also confused about why the <think> token appears in the final generated prompt in its chat template:
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B/blob/main/tokenizer_config.json
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool 
%}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\\n'}}{% endif %}"
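For readers who do not want to trace the Jinja by hand, the non-tool path of the template above can be sketched in C++ roughly as follows. This is a simplified illustration, not the llama.cpp implementation: `chat_msg`, `strip_reasoning`, and `render_distill_prompt` are hypothetical names, the tool-call branches are omitted, and the BOS string is passed in as a parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical message type for this sketch.
struct chat_msg {
    std::string role;
    std::string content;
};

// Mirrors content.split('</think>')[-1]: keep only the text after the last
// "</think>", so earlier reasoning is not fed back into the prompt.
static std::string strip_reasoning(const std::string & content) {
    const std::string close = "</think>";
    const std::string::size_type pos = content.rfind(close);
    return pos == std::string::npos ? content : content.substr(pos + close.size());
}

// Simplified render of system/user/assistant messages (tool branches omitted).
static std::string render_distill_prompt(const std::vector<chat_msg> & msgs,
                                         const std::string & bos,
                                         bool add_generation_prompt) {
    // As in the template, the last system message wins.
    std::string system_prompt;
    for (const auto & m : msgs) {
        if (m.role == "system") {
            system_prompt = m.content;
        }
    }

    std::string out = bos + system_prompt;
    for (const auto & m : msgs) {
        if (m.role == "user") {
            out += "<|User|>" + m.content;
        } else if (m.role == "assistant") {
            out += "<|Assistant|>" + strip_reasoning(m.content) + "<|end▁of▁sentence|>";
        }
    }
    // The generation prompt already opens the reasoning block.
    if (add_generation_prompt) {
        out += "<|Assistant|><think>\n";
    }
    return out;
}
```

The last branch is the point under discussion: with `add_generation_prompt` set, the prompt ends in `<|Assistant|><think>\n`, so the opening tag is part of the prompt rather than the model's output.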
R1 uses the same template as V3, so this PR may not be needed at all.
Yes, my change is not accurate. I am using a distill model of DeepSeek R1 together with its own chat template, and that chat template is not the same as its base model's chat template. So I think I should change the tag to "deepseek-r1-distill-qwen".
959dfe5 to d491bd7
537e157 to b8ff6f4
…_ids - the tokenizer config may be incorrect","'<|end▁of▁sentence|>' is not marked as EOG","'<|EOT|>' is not marked as EOG"
I'm closing this because you cannot tell me why this PR is needed. Adding this changes nothing, while making the code more complicated. If you can prove that there is an improvement, I will reopen this.
Hi, I don't have the environment to test DeepSeek R1. As for the chat template, the DeepSeek R1 distill series always adds bos_token before the prompt, but DeepSeek R1 does not, so I made a commit to make this more accurate. Thanks again for the review.
No description provided.