GPT-OSS: parse commentary tool calls; handle glued 'json'; add unit tests (#15102) #15158
base: master
Conversation
…x; add unit tests; tidy comments
In non-tool calls, this seems to cause the final channel to be merged into the analysis channel? Unless I'm misunderstanding something. For instance, a response of:
@@ -1319,9 +1319,55 @@ static common_chat_params common_chat_params_init_gpt_oss(const common_chat_temp
    return data;
}
static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
    // TODO @ngxson : this won't work with --special enabled, we should fix that
    builder.try_parse_reasoning("<|channel|>analysis<|message|>", "<|start|>assistant<|channel|>final<|message|>");
Removing try_parse_reasoning would lose the ability to capture reasoning into the reasoning_content field of the response. For gpt-oss, reasoning is the message from the analysis channel.
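For context, here is a minimal standalone sketch (plain std::string handling, not the llama.cpp parser API) of the split being described: text after the analysis header becomes reasoning, and text after the final header becomes assistant content. The tag strings are the same ones passed to try_parse_reasoning above; split_channels is a hypothetical helper used only for illustration.

```cpp
#include <string>

// Illustrative only: mirrors what try_parse_reasoning is meant to achieve for
// GPT-OSS. Everything between the analysis header and the final header is
// reasoning; everything after the final header is assistant content.
static void split_channels(const std::string & raw, std::string & reasoning, std::string & content) {
    const std::string analysis_tag = "<|channel|>analysis<|message|>";
    const std::string final_tag    = "<|start|>assistant<|channel|>final<|message|>";
    const auto a = raw.find(analysis_tag);
    const auto f = raw.find(final_tag);
    if (a != std::string::npos) {
        const auto begin = a + analysis_tag.size();
        const auto end   = (f != std::string::npos && f > begin) ? f : raw.size();
        reasoning = raw.substr(begin, end - begin);
    }
    if (f != std::string::npos) {
        content = raw.substr(f + final_tag.size());
    }
}
```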
    }
    builder.add_content(builder.consume_rest());
    return;
}
This condition block should be kept as is. When builder.syntax().parse_tool_calls is false, we just consume_rest without trying to parse tools. Move the tool processing after this block.
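For clarity, here is a structural sketch of the ordering being suggested, reusing only identifiers that already appear in the diffs above (not a compilable drop-in on its own): keep the parse_tool_calls guard first, and move the commentary/final channel handling after it.

```cpp
static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
    // Reasoning from the analysis channel still goes to reasoning_content.
    builder.try_parse_reasoning("<|channel|>analysis<|message|>",
                                "<|start|>assistant<|channel|>final<|message|>");

    if (!builder.syntax().parse_tool_calls) {
        // Tool parsing disabled: whatever remains is plain assistant content.
        builder.add_content(builder.consume_rest());
        return;
    }

    // ... commentary tool-call processing (regex over channel headers) goes here ...
}
```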
I'm one of the people from #15102; this PR seems to fix the crash and work a bit better with the tool calling, but it doesn't fully work. I made some additional changes which seem to enable full tool usage and a proper reasoning_content/content split, but I'm not 100% sure the fixes I made are correct, I'm very new to cpp development :) These are the changes I ended up doing:

diff --git a/common/chat.cpp b/common/chat.cpp
index 2e17f3e6..5a9b8041 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -1322,8 +1322,21 @@ static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
static const common_regex next_block(
"<\\|start\\|>assistant<\\|channel\\|>(?:commentary\\s+to=functions\\.([^\\s<\\|]+)(?:\\s+json)?|final)<\\|message\\|>");
+ auto handle_prelude = [&](const std::string & prelude) {
+ const std::string analysis_tag = "<|channel|>analysis<|message|>";
+ auto pos = prelude.find(analysis_tag);
+ if (pos != std::string::npos) {
+ auto reasoning = prelude.substr(pos + analysis_tag.size());
+ auto stripped = string_strip(reasoning);
+ if (!stripped.empty()) {
+ builder.add_reasoning_content(stripped);
+ }
+ }
+ };
+
if (!builder.syntax().parse_tool_calls) {
- if (auto res = builder.try_find_regex(next_block)) {
+ if (auto res = builder.try_find_regex(next_block, std::string::npos, /* add_prelude_to_content = */ false)) {
+ handle_prelude(res->prelude);
while (true) {
std::string fname = builder.str(res->groups[1]);
const std::string header = builder.str(res->groups[0]);
@@ -1336,24 +1349,29 @@ static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
if (!builder.try_consume_json_with_dumped_args({{}})) {
break;
}
- res = builder.try_find_regex(next_block);
+ res = builder.try_find_regex(next_block, std::string::npos, /* add_prelude_to_content = */ false);
if (!res) break;
+ handle_prelude(res->prelude);
continue;
}
builder.add_content(builder.consume_rest());
return;
}
}
- builder.add_content(builder.consume_rest());
+ // No more headers; treat the rest as potential analysis
+ auto rest = builder.consume_rest();
+ handle_prelude(rest);
return;
}
while (true) {
- auto res = builder.try_find_regex(next_block);
+ auto res = builder.try_find_regex(next_block, std::string::npos, /* add_prelude_to_content = */ false);
if (!res) {
- builder.add_content(builder.consume_rest());
+ auto rest = builder.consume_rest();
+ handle_prelude(rest);
return;
}
+ handle_prelude(res->prelude);
std::string fname = builder.str(res->groups[1]);
const std::string header = builder.str(res->groups[0]);
if (fname.size() > 4 && header.find(" json<|message|>") == std::string::npos) {

If I run this PR as-is, I get the following output:
While with the diff from above applied on top of the code from this PR, I get:
Notice the last message's reasoning_content vs content: it's now properly putting content coming from the final channel into content. Otherwise, this PR does fix the crash from #15102.

Commit is available here in case someone wants to integrate the changes or otherwise continue validating that tool calling and the whole channel splitting works correctly: https://github.com/victorb/llama.cpp/commits/fix/gpt-oss-tool-calls-parser/

(As a side note, even without my patch, a bunch of unit tests related to parsing inputs of other non-GPT-OSS models seem to start failing with the changes from this PR; not sure how or why.)
This is what I saw too, and what my patch above is trying to address. It seems to be working fine for me now (until I find a new issue, I suppose...). It would be great if you could try applying that patch on top of the changes from this PR (or use the branch linked above).
Yes, this seems much better. The biggest problem I have now is just getting the 20B to properly call a tool, but when it does, it seems to work correctly after your changes.
@victorb 👍👍👍
Getting behavior like this:

Actual (malformed):
There is another PR as well, with the same fixes but done in a slightly different way, in case people want to help out testing that too: #15181
Can confirm this has behavior that is closer to being correct (or is correct). Using this PR, plus the latest unsloth chat template, plus a custom proxy, I have gotten these models to work with Claude Code :)
Upon further testing, it's much better, but inconsistent. It's hard to determine whether the issue is with the template, the PR, or my proxy.
Whoops, yeah, I commented on the wrong one. But it was a me issue. This works great with Qwen Code CLI.
Summary
Parse GPT-OSS commentary tool calls emitted as <|start|>assistant<|channel|>commentary to=functions.NAME json<|message|>{...}. Handle the case with no space before json by trimming the "json" suffix from the function name.
Fixes
Fixes #15102: when tools are included in the request and GPT-OSS emits commentary headers (including "glued json" cases), ensure structured tool_calls are returned and final content is parsed correctly.
Behavior before
Commentary tool-call headers were not handled, sometimes leading to runtime errors or empty messages (e.g., when "json" was glued to the function name).
Behavior after
Commentary tool calls are parsed into structured tool_calls, and final channel content is captured as assistant content. "Glued json" is tolerated.
Scope and impact
Implementation details
The commentary header is matched with a regex; when "json" is glued onto the function name (no space before it), the "json" suffix is stripped from the function name. Final channel text is captured as assistant content. A standalone sketch of the trimming idea follows below.
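The trimming idea can be illustrated with a small standalone sketch. strip_glued_json is a hypothetical helper used only for illustration, not the PR's exact code; the condition mirrors the `fname.size() > 4 && header.find(" json<|message|>") == std::string::npos` check in the diff above.

```cpp
#include <string>

// If the model glued "json" onto the function name (e.g. "get_weatherjson")
// and the header carries no separate " json<|message|>" marker, strip the suffix.
static std::string strip_glued_json(const std::string & fname, const std::string & header) {
    const std::string suffix = "json";
    if (fname.size() > suffix.size() &&
        header.find(" json<|message|>") == std::string::npos &&
        fname.compare(fname.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return fname.substr(0, fname.size() - suffix.size());
    }
    return fname;
}
```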
Testing
Added unit tests covering standard and glued "json" headers. The tests are parser-only (no model eval). No changes to llama-perplexity or llama-bench expected.
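As a rough idea of what the glued vs. standard cases look like, here is a minimal standalone check of the strip_glued_json sketch above (hypothetical helper; the PR's real unit tests live in the project's test suite and are not reproduced here).

```cpp
#include <cassert>
#include <string>

// Assumes the strip_glued_json sketch from the Implementation details section.
int main() {
    // Glued: "json" is fused to the function name and no " json<|message|>" marker exists.
    assert(strip_glued_json("get_weatherjson",
        "<|start|>assistant<|channel|>commentary to=functions.get_weatherjson<|message|>") == "get_weather");
    // Standard: the header carries " json<|message|>", so the name is left untouched.
    assert(strip_glued_json("get_weather",
        "<|start|>assistant<|channel|>commentary to=functions.get_weather json<|message|>") == "get_weather");
    return 0;
}
```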
Compatibility and risks
Parser-only change (consistent with existing parsing behavior). Final channel text is still returned as assistant content.
Checklist
Unit tests added: yes. Changes are parser-only.

Module
common/chat.cpp