Commit 4c64211

chat : Avoid partial reasoning tags in response content
If a model uses a multi-part reasoning tag we can end up with part of the tag in the message content when using streaming mode. E.g.

    $ curl -N http://localhost:8080/v1/chat/completions -d '{
        "model": "hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl",
        "messages": [
            {"role": "user", "content": "Hello, how are you?"}
        ],
        "stream": true
    }' -H "Content-Type: application/json"
    data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
    data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<|channel|>"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
    data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"analysis"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
    data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"The"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
    data: {"choices":[{"finish_reason":null,"index":0,"delta":{}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}
    ...

This happens because the chat parser can't make a full match on the first parts of the reasoning tag. So, modify try_consume_literal() to speculatively consume a partially matching string in case the parser is constructed with partial set to true.

Signed-off-by: Piotr Stankiewicz <[email protected]>
1 parent 1d72c84 commit 4c64211

2 files changed, 15 additions and 0 deletions

common/chat-parser.cpp

Lines changed: 7 additions & 0 deletions
@@ -95,6 +95,13 @@ bool common_chat_msg_parser::try_consume_literal(const std::string & literal) {
     auto pos = pos_;
     for (auto i = 0u; i < literal.size(); ++i) {
         if (pos >= input_.size()) {
+            if (is_partial() && i > 0) {
+                // For partial message, whose suffix matches the literal, report
+                // that it can be consumed. We need more content to be able to
+                // tell otherwise.
+                pos_ = pos;
+                return true;
+            }
             return false;
         }
         if (input_[pos] != literal[i]) {

tests/test-chat-parser.cpp

Lines changed: 8 additions & 0 deletions
@@ -150,6 +150,14 @@ static void test_regex() {
         common_chat_msg_parser builder("Hello,", is_partial, {});
         assert_equals(false, builder.try_consume_literal("Oh"));
     }
+    {
+        common_chat_msg_parser builder("<some>", false, {});
+        assert_equals(false, builder.try_consume_literal("<some><suffix>"));
+    }
+    {
+        common_chat_msg_parser builder("<some>", true, {});
+        assert_equals(true, builder.try_consume_literal("<some><suffix>"));
+    }
 }
 
 const std::vector<std::string> barely_healable_jsons = {
