Conversation

@aldehir (Collaborator) commented Aug 11, 2025

Problem: the current implementation generates <|channel|>analysis<|message|>... in the output and clients send it back verbatim. This causes an exception in the official gpt-oss jinja chat templates.

This PR parses out the reasoning and does one of the following:

  • reasoning_format = auto - send it in the reasoning_content field.
  • reasoning_format = none - wrap it in <think></think> and send it in the content field to avoid exceptions.

Additionally, it parses out any final channels. It does not yet support tool use; more comprehensive parsing is being worked on in #15181.
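
Roughly, the routing looks like this (simplified sketch only, not the actual chat.cpp code; the struct and function names below are made up for illustration):

    #include <string>

    // Sketch: route the parsed analysis/final channel text according to
    // reasoning_format, mirroring the behavior described above.
    enum class reasoning_format { AUTO, NONE };

    struct parsed_msg {
        std::string content;            // text from the final channel
        std::string reasoning_content;  // text from the analysis channel (auto mode)
    };

    parsed_msg route_channels(const std::string & analysis,
                              const std::string & final_text,
                              reasoning_format fmt) {
        parsed_msg out;
        if (fmt == reasoning_format::AUTO) {
            // auto: expose the reasoning separately via reasoning_content
            out.reasoning_content = analysis;
            out.content = final_text;
        } else {
            // none: wrap the reasoning in <think></think> inside content so the
            // jinja template never sees raw <|channel|> tags
            out.content = "<think>" + analysis + "</think>" + final_text;
        }
        return out;
    }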

@ggerganov @ngxson

Comment on lines -2342 to -2348
    // @ngxson : quick hack for gpt-oss, always render these tokens
    for (const auto & t : token_to_id) {
        if (t.first == "<|channel|>" || t.first == "<|message|>" || t.first == "<|start|>") {
            id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_USER_DEFINED;
        }
    }

Collaborator

Are we sure about removing this? It will prevent rendering these tokens without --special.

@aldehir (Collaborator, Author) Aug 11, 2025

I am admittedly new to the code base; however, for the web server it seems that placing those tokens in preserved_tokens is sufficient to make them render.
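
Something along these lines is what I mean (rough sketch only; the struct and field names here are illustrative and may not match the actual server-side layout):

    #include <string>
    #include <vector>

    // Sketch: register the harmony markers as preserved tokens so the web
    // server renders them as literal text instead of hiding them as specials.
    struct chat_params_sketch {
        std::vector<std::string> preserved_tokens;
    };

    void preserve_gpt_oss_tokens(chat_params_sketch & params) {
        params.preserved_tokens = {
            "<|channel|>",
            "<|message|>",
            "<|start|>",
        };
    }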

I tested it with llama-cli, and I see now that it does omit them there. I will revert it.

This comment was marked as outdated.

Collaborator

In addition to this, I think <|constrain|> and <|end|> should also be added to the condition.
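
For example, roughly this (just the removed hack quoted above with the two extra tokens added, untested):

    // Sketch: also render <|constrain|> and <|end|> as user-defined tokens
    for (const auto & t : token_to_id) {
        if (t.first == "<|channel|>" || t.first == "<|message|>" ||
            t.first == "<|start|>"   || t.first == "<|constrain|>" ||
            t.first == "<|end|>") {
            id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_USER_DEFINED;
        }
    }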

@ggerganov (Member)

After 6d75412, the llama-cli now crashes after the first message.

@ngxson (Collaborator) commented Aug 11, 2025

After 6d75412, the llama-cli now crashes after the first message.

I remember the new jinja template has a check that prevents certain tags from appearing in the input text. I'm not sure what the best way to fix this is; maybe we need a patch/hotfix for that.

@aldehir (Collaborator, Author) commented Aug 11, 2025

Yes, I am seeing that as well. It's the exception thrown from the jinja template:

You have passed a message containing <|channel|> tags in the content field. Instead of doing this, you should pass analysis messages...

I can revert the last commit as a temporary workaround. It seems to work with --special in f058384 as well.

@aldehir (Collaborator, Author) commented Aug 11, 2025

At least for now, I'm going to keep it as I originally had it since it is somewhat usable. I haven't explored the CLI enough to have any good input.

A comment by @aldehir has been minimized.

@ngxson (Collaborator) commented Aug 11, 2025

IIRC chat.cpp also allows patching the jinja template; see an example in common_chat_params_init_deepseek_r1.

We can temporarily patch the jinja version of harmony so that it doesn't throw an error, and later spend more time on a proper fix.
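
Roughly something like this (sketch only; the marker string and removal logic are placeholders, and a real patch would target the actual block in the harmony template):

    #include <string>

    // Hypothetical sketch: neutralize the part of the harmony jinja template
    // that throws when <|channel|> shows up in a content field, similar in
    // spirit to the template patching done for deepseek-r1.
    static std::string patch_harmony_template(std::string tmpl_src) {
        const std::string needle = "raise_exception(";  // placeholder marker
        size_t pos;
        while ((pos = tmpl_src.find(needle)) != std::string::npos) {
            size_t end = tmpl_src.find(")", pos);
            if (end == std::string::npos) {
                break;
            }
            // replace the call with an empty string literal so rendering continues
            tmpl_src.replace(pos, end + 1 - pos, "\"\"");
        }
        return tmpl_src;
    }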

@ngxson (Collaborator) commented Aug 11, 2025

Also, for visibility: I don't think we need to replace the reasoning tags with <think></think> as introduced in this PR. With the migration of the new frontend to Svelte, we will eventually support the reasoning_content field, which will be much cleaner. We plan to release this version this weekend, so let's hold off on this PR a bit.

In the meantime, having tool call support (on your other PR) is a very good feature.

@ggerganov (Member)

@ngxson I would like to have the WebUI chat experience quickly fixed to the state that was working before the jinja template updates. Do you have a suggestion for a fix? Or should I revert the GGUF models back to the old template?

@ngxson (Collaborator) commented Aug 11, 2025

@ggerganov Yes, I can try to patch out the exception in the jinja template.
