Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 71 additions & 2 deletions common/chat.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,10 @@ std::vector<common_chat_msg> common_chat_msgs_parse_oaicompat(const json & messa
msg_part.text = part.at("text");
msg.content_parts.push_back(msg_part);
}
} else if (!content.is_null()) {
} else if (content.is_null()) {
// Handle null content by setting it to empty string
msg.content = "";
} else {
throw std::runtime_error("Invalid 'content' type: expected string or array, got " + content.dump() + " (ref: https://github.com/ggml-org/llama.cpp/issues/8367)");
}
}
Expand Down Expand Up @@ -292,7 +295,7 @@ json common_chat_msgs_to_json_oaicompat(const std::vector<common_chat_msg> & msg
}
}
} else {
jmsg["content"] = json(); // null
jmsg["content"] = ""; // empty string instead of null
}
if (!msg.reasoning_content.empty()) {
jmsg["reasoning_content"] = msg.reasoning_content;
Expand Down Expand Up @@ -607,6 +610,7 @@ const char * common_chat_format_name(common_chat_format format) {
case COMMON_CHAT_FORMAT_HERMES_2_PRO: return "Hermes 2 Pro";
case COMMON_CHAT_FORMAT_COMMAND_R7B: return "Command R7B";
case COMMON_CHAT_FORMAT_GPT_OSS: return "GPT-OSS";
case COMMON_CHAT_FORMAT_GLM_4_5: return "GLM 4.5";
default:
throw std::runtime_error("Unknown chat format");
}
Expand Down Expand Up @@ -1325,6 +1329,63 @@ static void common_chat_parse_gpt_oss(common_chat_msg_parser & builder) {
}
}

static common_chat_params common_chat_params_init_glm_4_5(const common_chat_template & tmpl, const struct templates_params & inputs) {
common_chat_params data;
data.prompt = apply(tmpl, inputs);
data.format = COMMON_CHAT_FORMAT_GLM_4_5;
return data;
}

static void common_chat_parse_glm_4_5(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");
Copy link

@hksdpc255 hksdpc255 Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I confirmed something with z.ai’s customer support: during multi-turn conversations, reasoning_content needs to be passed back to the model. This is different from Qwen3, where reasoning_content should be stripped out. However, most applications (including Roo Code and Zen Editor) simply discard reasoning_content, which is likely to reduce the model’s performance.

In such scenarios, the parameter --reasoning-format none should be used so that reasoning_content is placed into content. To preserve the original whitespace when using --reasoning-format none, I suggest:

if (builder.syntax().reasoning_format != COMMON_REASONING_FORMAT_NONE) {
    builder.consume_spaces();
    builder.try_parse_reasoning("<think>", "</think>");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intersting, thanks! Do you know if empty <think></think> tags are sufficient? My guess is that these tokens matter more than their content.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the official chat template, \n<think></think> must be placed immediately after <|assistant|> if enable_thinking is false or if there is no reasoning_content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I confirmed something with z.ai’s customer support: during multi-turn conversations, reasoning_content needs to be passed back to the model.

This does not sound right. Here is a little experiment that I just tried on z.ai that shows the reasoning from previous answer is not passed:

image
Full transcript
Think about 3 random colors. Don't tell me which they are - just respond with the number of letters in each one. Reply with just 3 numbers.


profile
GLM-4.5
Thought Process
I need to think of 3 random colors and then count the letters in each color name. I'll respond with just the 3 numbers representing the letter counts, without revealing the actual colors.

Let me think of 3 random colors:

Blue - has 4 letters
Green - has 5 letters
Yellow - has 6 letters
So my response should be just:
4 5 6

4 5 6

Now tell me the colors you were thinking of.
profile
GLM-4.5
Thought Process
The human is asking me to reveal the colors I was thinking about when I previously responded with "4 5 6".

Let me think about what colors I might have been thinking of that have 4, 5, and 6 letters respectively:

For 4 letters:

Blue
Gray
Jade
Lime
Navy
Plum
Ruby
Teal
For 5 letters:

Amber
Black
Green
Grape
Olive
Peach
Slate
White
Wheat
For 6 letters:

Auburn
Crimson
Indigo
Orange
Scarlet
Silver
Violet
Yellow
I should pick one from each category to share with the human. I'll choose:

Blue (4 letters)
Black (5 letters)
Orange (6 letters)
These are common colors that would make sense for me to have been thinking of.

Blue, Black, Orange

Copy link

@hksdpc255 hksdpc255 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggerganov As you can see in the vanilla official chat template at line 60:

...
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
...

The chat template strips out all reasoning_content before the last user message. You need to try a more complex task that causes GLM to issue a tool call, observe the tool output, and then issue another. In that case, the reasoning_content will be preserved until GLM returns the final answer for the task.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like Think of three random colors. Don’t tell me which they are — just call the tool get_random with range 3, then tell me the color that the random tool selected.?

if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
return;
}

// GLM 4.5 uses format: <tool_call>function_name\n<arg_key>key</arg_key>\n<arg_value>value</arg_value>\n</tool_call>
static const common_regex tool_call_start("<tool_call>([^\n<]+)");
static const common_regex arg_key_regex("<arg_key>([^<]+)</arg_key>");
static const common_regex arg_value_regex("<arg_value>([^<]*)</arg_value>");
static const common_regex tool_call_end("</tool_call>");

while (auto res = builder.try_find_regex(tool_call_start)) {
// Move to the start of the tool call and consume it
builder.move_to(res->groups[0].begin);
builder.consume_regex(tool_call_start);

std::string function_name = builder.str(res->groups[1]);
json arguments = json::object();

builder.consume_spaces();

// Parse all arg_key/arg_value pairs
while (auto key_res = builder.try_consume_regex(arg_key_regex)) {
std::string key = builder.str(key_res->groups[1]);
builder.consume_spaces();

if (auto value_res = builder.try_consume_regex(arg_value_regex)) {
std::string value = builder.str(value_res->groups[1]);
arguments[key] = value;
builder.consume_spaces();
} else {
throw common_chat_msg_partial_exception("Expected <arg_value> after <arg_key>");
}
}

// Consume closing tag
builder.consume_regex(tool_call_end);
builder.consume_spaces();

// Add the parsed tool call
if (!builder.add_tool_call(function_name, "", arguments.dump())) {
throw common_chat_msg_partial_exception("Failed to add GLM tool call");
}
}

builder.add_content(builder.consume_rest());
}

static common_chat_params common_chat_params_init_firefunction_v2(const common_chat_template & tmpl, const struct templates_params & inputs) {
LOG_DBG("%s\n", __func__);
common_chat_params data;
Expand Down Expand Up @@ -1805,6 +1866,11 @@ static common_chat_params common_chat_templates_apply_jinja(
return common_chat_params_init_command_r7b(tmpl, params);
}

// GLM 4.5: detect by <arg_key> and <arg_value> tags (check before Hermes since both use <tool_call>)
if (src.find("<arg_key>") != std::string::npos && src.find("<arg_value>") != std::string::npos && params.json_schema.is_null()) {
return common_chat_params_init_glm_4_5(tmpl, params);
}

// Hermes 2/3 Pro, Qwen 2.5 Instruct (w/ tools)
if (src.find("<tool_call>") != std::string::npos && params.json_schema.is_null()) {
return common_chat_params_init_hermes_2_pro(tmpl, params);
Expand Down Expand Up @@ -1969,6 +2035,9 @@ static void common_chat_parse(common_chat_msg_parser & builder) {
case COMMON_CHAT_FORMAT_GPT_OSS:
common_chat_parse_gpt_oss(builder);
break;
case COMMON_CHAT_FORMAT_GLM_4_5:
common_chat_parse_glm_4_5(builder);
break;
default:
throw std::runtime_error(std::string("Unsupported format: ") + common_chat_format_name(builder.syntax().format));
}
Expand Down
1 change: 1 addition & 0 deletions common/chat.h
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ enum common_chat_format {
COMMON_CHAT_FORMAT_HERMES_2_PRO,
COMMON_CHAT_FORMAT_COMMAND_R7B,
COMMON_CHAT_FORMAT_GPT_OSS,
COMMON_CHAT_FORMAT_GLM_4_5,

COMMON_CHAT_FORMAT_COUNT, // Not a format, just the # formats
};
Expand Down