model : gpt-oss simulate think tags when reasoning #15230
Conversation
// @ngxson : quick hack for gpt-oss, always render these tokens
for (const auto & t : token_to_id) {
    if (t.first == "<|channel|>" || t.first == "<|message|>" || t.first == "<|start|>") {
        id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_USER_DEFINED;
    }
}
Are we sure about removing this? It will prevent rendering these tokens without `--special`.
I am admittedly new to the code base, but for the web server it seems that placing those tokens in `preserved_tokens` is sufficient to make them render.
I tested it with llama-cli and I see now that it does omit them there. I will revert it.
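For reference, a minimal sketch of what that could look like, assuming the gpt-oss format handler populates a `common_chat_params` instance the same way the other chat formats in common/chat.cpp do (the function name here is hypothetical):

```cpp
#include "chat.h" // common_chat_params from the llama.cpp common library

// Illustrative only: register the gpt-oss control tokens as preserved tokens
// so the server-side parsing keeps them intact instead of filtering them out.
// The surrounding handler and the `data` parameter are assumptions modeled on
// how other chat formats populate common_chat_params in common/chat.cpp.
static void register_gpt_oss_preserved_tokens(common_chat_params & data) {
    data.preserved_tokens = {
        "<|channel|>",
        "<|message|>",
        "<|start|>",
    };
}
```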
In addition to this, I think `<|constrain|>` and `<|end|>` should also be added to the condition.
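For reference, the extended condition might look like this (a sketch based on the quoted hack above):

```cpp
// Sketch: the quoted hack above, extended with <|constrain|> and <|end|>.
for (const auto & t : token_to_id) {
    if (t.first == "<|channel|>" || t.first == "<|message|>" || t.first == "<|start|>" ||
        t.first == "<|constrain|>" || t.first == "<|end|>") {
        id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_USER_DEFINED;
    }
}
```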
After 6d75412, the …
I remembered that the new jinja template has a check that prevents certain tags from appearing in the input text. Not sure what the best way to fix it is; maybe we need a patch/hotfix for that.
Yes, I am seeing that as well. It's the exception thrown from the jinja template. I can revert the last commit as a temporary workaround. It seems to work with …
This reverts commit 6d75412.
At least for now, I'm going to keep it as I originally had it since it is somewhat usable. I haven't explored the CLI enough to have any good input.
IIRC we can temporarily patch the jinja version of harmony so that it doesn't throw an error. Then later on we can spend more time on a proper fix.
Also for visibility: I think we probably don't need to replace the reasoning tags with `<think></think>`. In the meantime, having tool call support (on your other PR) is a very good feature.
@ngxson I would like to have the WebUI chat experience fixed quickly, back to the state that was working before the jinja template updates. Do you have a suggestion for a fix? I could also revert the GGUF models back to the old template?
@ggerganov Yes, I can try to patch out the exception in the jinja template.
Problem: the current implementation generates `<|channel|>analysis<|message|>...` in the output and clients send it back verbatim. This causes an exception in the official gpt-oss jinja chat templates.

This PR parses out the reasoning and does one of the following:

- `reasoning_format = auto` - send it in the `reasoning_content` field.
- `reasoning_format = none` - wrap it in `<think></think>`, sent to the `content` field to avoid exceptions.

Additionally, it parses out any `final` channels. It does not yet support tool use. More comprehensive parsing is being worked on in #15181.

@ggerganov @ngxson
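Roughly, the behavior described above could be sketched like this. This is an illustrative standalone function, not the actual server code: the struct, function name, and the plain string scanning are assumptions made to keep the example self-contained.

```cpp
#include <string>

// Illustrative result of splitting a gpt-oss completion into reasoning and content.
struct parsed_output {
    std::string reasoning; // analysis channel -> reasoning_content (auto)
    std::string content;   // final channel    -> content
};

// Sketch: extract the analysis channel, then either report it separately
// (reasoning_format = auto) or wrap it in <think></think> and prepend it to
// the content (reasoning_format = none), so clients never echo raw
// <|channel|>analysis<|message|> tags back into the chat template.
static parsed_output parse_gpt_oss(const std::string & text, bool reasoning_format_auto) {
    const std::string analysis_tag = "<|channel|>analysis<|message|>";
    const std::string final_tag    = "<|channel|>final<|message|>";

    std::string reasoning;
    std::string content = text;

    const size_t a = text.find(analysis_tag);
    if (a != std::string::npos) {
        const size_t beg = a + analysis_tag.size();
        size_t end = text.find("<|end|>", beg); // analysis channel ends at <|end|>
        if (end == std::string::npos) {
            end = text.size();
        }
        reasoning = text.substr(beg, end - beg);
    }

    const size_t f = text.find(final_tag);
    if (f != std::string::npos) {
        content = text.substr(f + final_tag.size());
    } else if (a != std::string::npos) {
        content.clear(); // only an analysis channel was produced
    }

    parsed_output out;
    if (reasoning_format_auto) {
        out.reasoning = reasoning;
        out.content   = content;
    } else {
        out.content = "<think>" + reasoning + "</think>" + content;
    }
    return out;
}
```

The key point is only the routing: with `auto` the extracted reasoning is reported separately, while with `none` it is folded into the visible content inside `<think></think>` so existing clients that only read (and resend) the content field keep working.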