@hksdpc255 commented Sep 9, 2025

This PR introduces an enhanced implementation of tool calling for GLM-4.5, building upon the existing contributions by @dhandhalyabhavik and @susmitds (see PR #15186).

Key improvements include:

  1. Grammar-constrained tool-call outputs
    The model's tool-call messages are now strictly enforced by a formal grammar, ensuring that generated calls are always well-formed and reliably parsed (see the GBNF sketch after this list).

  2. Streaming support for tool-call parsing
    I have added streaming support to the parser so it handles tool-call messages incrementally as they are generated, enabling more responsive, real-time interactions during inference.
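
For a concrete sense of the constraint, here is a hand-written GBNF sketch (GBNF is llama.cpp's grammar format). It is illustrative only; the actual grammar is generated per request from the tools' JSON schemas, so the real rules are tighter than this:

root ::= "<tool_call>" name "\n" arg+ "</tool_call>"
name ::= [a-zA-Z_] [a-zA-Z0-9_.-]*
arg  ::= "<arg_key>" text "</arg_key>" "\n" "<arg_value>" text "</arg_value>" "\n"
text ::= [^<]*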

Use this Jinja template while testing:

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- macro mask_user_text(content) -%}
{%- set str_user_text = content|string -%}
{%- set str_user_text = str_user_text.replace('<think>', '<|start_of_thought|>') -%}
{%- set str_user_text = str_user_text.replace('</think>', '<|end_of_thought|>') -%}
{%- set str_user_text = str_user_text.replace('<tool_call>', '<|tool_calls_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</tool_call>', '<|tool_call_end|>') -%}
{%- set str_user_text = str_user_text.replace('<arg_key>', '<|tool_call_argument_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</arg_key>', '<|tool_call_argument_end|>') -%}
{%- set str_user_text = str_user_text.replace('<arg_value>', '<|tool_call_value_begin|>') -%}
{%- set str_user_text = str_user_text.replace('</arg_value>', '<|tool_call_value_end|>') -%}
{{- str_user_text -}}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1, last_user_has_nothink=false) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {%- set user_content = visible_text(m.content) -%}
        {%- set ns.last_user_index = loop.index0 -%}
        {%- set _clean = user_content | trim -%}
        {%- set ns.last_user_has_nothink = _clean.endswith('/nothink') -%}
    {%- endif %}
{%- endfor %}
{%- if ns.last_user_has_nothink -%}
    {%- set enable_thinking = false -%}
{%- endif -%}
{# ===========================
   JSON helpers
   =========================== #}
{% macro unescape_json_string(u_str) -%}
{%- set une = namespace(buf='', s='', slen=0, consume=0, esc='') -%}
{%- set une.s = u_str | trim -%}
{%- if une.s.startswith('"') and une.s.endswith('"') and une.s|length >= 2 -%}
{%- set une.slen = une.s|length - 1 -%}
{# hex map for manual hex -> int conversion #}
{%- set hexmap = {'0':0,'1':1,'2':2,'3':3,'4':4,'5':5,'6':6,'7':7,'8':8,'9':9,'a':10,'b':11,'c':12,'d':13,'e':14,'f':15,'A':10,'B':11,'C':12,'D':13,'E':14,'F':15} -%}
{%- for ich in range(1, une.slen) -%}
{%- if ich >= une.consume -%}
    {%- set uch = une.s[ich:ich+1] -%}
    {%- if uch != '\\' -%}
        {%- set une.buf = une.buf + uch -%}
    {%- else -%}
        {# found backslash, look ahead #}
        {%- set jch = ich + 1 -%}
        {%- if jch >= une.slen -%}
          {%- set une.buf = une.buf + '\ufffd' -%} {# lonely backslash -> replacement #}
        {%- else -%}
          {%- set une.esc = une.s[jch:jch+1] -%}
          {%- if une.esc in ['"', '\\', '/', 'b', 'f', 'n', 'r', 't'] -%}
            {%- if une.esc == '"' -%}{%- set outch = '"' -%}{%- elif une.esc == '\\' -%}{%- set outch = '\\' -%}
            {%- elif une.esc == '/' -%}{%- set outch = '/' -%}{%- elif une.esc == 'b' -%}{%- set outch = '\b' -%}
            {%- elif une.esc == 'f' -%}{%- set outch = '\f' -%}{%- elif une.esc == 'n' -%}{%- set outch = '\n' -%}
            {%- elif une.esc == 'r' -%}{%- set outch = '\r' -%}{%- elif une.esc == 't' -%}{%- set outch = '\t' -%}
            {%- endif -%}
            {%- set une.buf = une.buf + outch -%}
            {%- set une.consume = jch + 1 -%}  {# next loop ich will skip until >= une.consume #}
          {%- elif une.esc == 'u' -%}
            {# attempt to read up to 4 hex digits starting at jch+1 #}
            {%- set kch = jch + 1 -%}
            {%- set hexpart = '' -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {%- if kch < une.slen and hexpart|length < 4 and (une.s[kch:kch+1] in hexmap) -%}
              {%- set hexpart = hexpart + une.s[kch:kch+1] -%}
              {%- set kch = kch + 1 -%}
            {%- endif -%}
            {# hex escape not supported: minja does not support '%c' % var, '\\uXXXX'|format, or var|chr #}
            {%- if hexpart == '' -%}
              {# no hex digits -> replacement #}
              {%- set une.buf = une.buf + '\ufffd' -%}
              {%- set une.consume = jch + 1 -%}
            {%- else -%}
              {%- set une.consume = kch -%}
              {%- set une.buf = une.buf + '\ufffd' -%}
            {%- endif -%}
          {%- else -%}
            {# unknown escape: be lenient -> drop backslash, keep next char #}
            {%- set une.buf = une.buf + une.esc -%}
            {%- set une.consume = jch + 1 -%}
          {%- endif -%}
        {%- endif -%}
    {%- endif -%}
{%- endif -%}
{%- endfor -%}
{{ une.buf }}
{%- else -%}
{{ u_str }}
{%- endif -%}
{%- endmacro %}
{% macro emit_json_kv_from_object(json_str) -%}
{%- set ss = json_str | trim -%}
{%- set sslen = ss | length -%}
{%- set inner = ss[1:sslen-1] -%}
{# split top-level members on commas, respecting strings/nesting #}
{%- set ns = namespace(buf='', parts='', in_str=false, esc=false, depth_obj=0, depth_arr=0) -%}
{%- for ch in inner -%}
    {%- if ns.in_str -%}
        {%- if ns.esc -%}
            {%- set ns.esc = false -%}
        {%- elif ch == '\\' -%}
            {%- set ns.esc = true -%}
        {%- elif ch == '"' -%}
            {%- set ns.in_str = false -%}
        {%- endif -%}
        {%- set ns.buf = ns.buf + ch -%}
    {%- else -%}
        {%- if ch == '"' -%}
            {%- set ns.in_str = true -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '{' -%}
            {%- set ns.depth_obj = ns.depth_obj + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '}' -%}
            {%- set ns.depth_obj = ns.depth_obj - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '[' -%}
            {%- set ns.depth_arr = ns.depth_arr + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ']' -%}
            {%- set ns.depth_arr = ns.depth_arr - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ',' and ns.depth_obj == 0 and ns.depth_arr == 0 -%}
            {%- set ns.parts = ns.parts + ns.buf + '\x1F' -%}
            {%- set ns.buf = '' -%}
        {%- else -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- set ns.parts = ns.parts + ns.buf -%}
{# split each member on the first top-level colon into key/value #}
{%- for pair in ns.parts.split('\x1F') if pair | trim -%}
    {%- set p = pair | trim -%}
    {%- set st = namespace(buf='', in_str=false, esc=false, depth_obj=0, depth_arr=0, seen_colon=false, k='', v='') -%}
    {%- for ch in p -%}
        {%- if st.in_str -%}
            {%- if st.esc -%}
                {%- set st.esc = false -%}
            {%- elif ch == '\\' -%}
                {%- set st.esc = true -%}
            {%- elif ch == '"' -%}
                {%- set st.in_str = false -%}
            {%- endif -%}
            {%- set st.buf = st.buf + ch -%}
        {%- else -%}
            {%- if ch == '"' -%}
                {%- set st.in_str = true -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '{' -%}
                {%- set st.depth_obj = st.depth_obj + 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '}' -%}
                {%- set st.depth_obj = st.depth_obj - 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == '[' -%}
                {%- set st.depth_arr = st.depth_arr + 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == ']' -%}
                {%- set st.depth_arr = st.depth_arr - 1 -%}
                {%- set st.buf = st.buf + ch -%}
            {%- elif ch == ':' and not st.seen_colon and st.depth_obj == 0 and st.depth_arr == 0 -%}
                {%- set st.k = st.buf | trim -%}
                {%- set st.buf = '' -%}
                {%- set st.seen_colon = true -%}
            {%- else -%}
                {%- set st.buf = st.buf + ch -%}
            {%- endif -%}
        {%- endif -%}
    {%- endfor -%}
    {%- set st.v = st.buf | trim -%}
    {# dequote key if it's a JSON string #}
    {%- set key = st.k | trim -%}
<arg_key>{{ unescape_json_string(key) }}</arg_key>
{% set val_str = st.v | trim | safe %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length > 1 else unescape_json_string(val_str) }}</arg_value>{{- '\n' -}}
{%- endfor -%}
{%- endmacro %}
{% macro emit_json_items_from_array(_json_str) -%}
{%- set json_str = _json_str | trim -%}
{%- set json_str_len = json_str | length -%}
{%- if "[" in json_str[1:json_str_len] -%}
{%- set inner = json_str[1:json_str_len-1] -%}
{%- set ns = namespace(buf='', parts='', in_str=false, esc=false, depth_obj=0, depth_arr=0) -%}
{%- for ch in inner -%}
    {%- if ns.in_str -%}
        {%- if ns.esc -%}
            {%- set ns.esc = false -%}
        {%- elif ch == '\\' -%}
            {%- set ns.esc = true -%}
        {%- elif ch == '"' -%}
            {%- set ns.in_str = false -%}
        {%- endif -%}
        {%- set ns.buf = ns.buf + ch -%}
    {%- else -%}
        {%- if ch == '"' -%}
            {%- set ns.in_str = true -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '{' -%}
            {%- set ns.depth_obj = ns.depth_obj + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '}' -%}
            {%- set ns.depth_obj = ns.depth_obj - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == '[' -%}
            {%- set ns.depth_arr = ns.depth_arr + 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ']' -%}
            {%- set ns.depth_arr = ns.depth_arr - 1 -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- elif ch == ',' and ns.depth_obj == 0 and ns.depth_arr == 0 -%}
            {%- set ns.parts = ns.parts + ns.buf + '\x1F' -%}
            {%- set ns.buf = '' -%}
        {%- else -%}
            {%- set ns.buf = ns.buf + ch -%}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}
{%- set ns.parts = ns.parts + ns.buf -%}

{%- set idx = 0 -%}
{%- for item in ns.parts.split('\x1F') if item | trim -%}
<arg_key>{{ idx }}</arg_key>
{% set val_str = item | trim | safe %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length > 1 else unescape_json_string(val_str) }}</arg_value>
{%- set idx = idx + 1 -%}
{%- endfor -%}
{%- else -%}
{{ emit_json_kv_from_object(json_str[1:json_str_len-1]) }}
{%- endif -%}
{%- endmacro %}
{% macro emit_json_from_string(s) -%}
    {%- set t = s | trim -%}
    {%- if t.startswith('{') and t.endswith('}') -%}
        {{ emit_json_kv_from_object(t) }}
    {%- elif t.startswith('[') and t.endswith(']') -%}
        {{ emit_json_items_from_array(t) }}
    {%- else -%}
        {{ s }}
    {%- endif -%}
{%- endmacro %}
{% macro emit_args(args) -%}
{%- if args is string -%}
{{ emit_json_from_string(args) }}
{%- elif args is mapping -%}
{%- for k, v in args | items -%}
<arg_key>{{ k }}</arg_key>
{# strings are emitted raw unless they embed the closing tag; everything else is JSON-encoded #}
{%- if v is string -%}
{% set val_str = v %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length < 2 else val_str|string|tojson }}</arg_value>
{%- else -%}
{% set val_str = v | tojson %}
<arg_value>{{ val_str }}</arg_value>
{%- endif -%}

{%- endfor -%}
{%- elif args is iterable and args is not string -%}
{# native list case (some runtimes pass lists, not JSON strings) #}
{%- for v in args -%}
<arg_key>{{ loop.index0 }}</arg_key>
{% set val_str = v | tojson if v is mapping or (v is iterable and v is not string) else v %}
<arg_value>{{ val_str if val_str.split("</arg_value>")|length < 2 else val_str|string|tojson }}</arg_value>
{%- endfor -%}
{%- else -%}
{{ args | tojson }}
{%- endif -%}
{%- endmacro %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ mask_user_text(user_content) }}
{%- set _uc = user_content | trim -%}
{%- set last_user_index_safe = ns.last_user_index | default(-1, true) -%}
{%- if loop.index0 == last_user_index_safe
      and (enable_thinking is defined and not enable_thinking)
      and not _uc.endswith('/nothink') -%}
{{ '\n/nothink' }}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string and 0 != m.reasoning_content.strip()|length %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set think_parts = content.split('</think>') %}
        {%- if content.endswith('</think>') %}
            {%- set before_end_think = think_parts %}
            {%- set after_end_think = '' %}
        {%- else %}
            {%- set think_parts_len = think_parts|length %}
            {%- set before_end_think = think_parts[:think_parts_len - 1] %}
            {%- set after_end_think = think_parts[think_parts_len - 1] %}
        {%- endif %}
        {% set nsreasoning = namespace(content='') %}
        {%- for before_end_think_part in before_end_think %}
            {%- set think_start_parts = before_end_think_part.split('<think>') %}
            {%- set think_start_parts_len = think_start_parts|length %}
            {%- set nsreasoning.content = nsreasoning.content + think_start_parts[think_start_parts_len - 1].lstrip('\n') %}
        {%- endfor %}
        {%- set reasoning_content = nsreasoning.content %}
        {%- set content = after_end_think.lstrip('\n') %}
    {%- endif %}
{%- endif %}
{%- set last_user_index_safe = ns.last_user_index | default(-1, true) -%}
{%- if last_user_index_safe >= 0 and loop.index0 > last_user_index_safe and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- set f = tc.function if tc.function else tc -%}
{{ '\n<tool_call>' + f.name }}
{{ emit_args(f.arguments) }}
{{- '</tool_call>' }}
{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{%- if m.content != '' -%}
{{-  emit_args(m.content) }}
{%- endif %}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
{{- '<tool_response>' }}
{{ tr.output if tr.output is defined else tr }}
{{- '</tool_response>' }}{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}

Although not yet implemented, I'm planning the following improvements:

  1. Patch the Jinja template in common_chat_params_init_glm_4_5 to make it compatible with the original Unsloth GGUF chat template, and potentially even with the official chat template.

  2. Add dedicated unit tests for grammar enforcement and streaming parsing.

Testing and feedback are welcome.

Suggested commit message after squash commits:

common: add GLM-4.5 tool calling support

- Add COMMON_CHAT_FORMAT_GLM_4_5 format enum
- Add template detection based on <arg_key> and <arg_value> tags
- Fix null content handling in message parsing and serialization
- Ensure GLM-4.5 detection runs before Hermes to avoid misidentification
- Implement GLM-4.5 tool call parser

Co-authored-by: Bhavik Dhandhalya <[email protected]>
Co-authored-by: Susmit Das <[email protected]>

@hksdpc255 (Author)

Use --reasoning-format none if your OpenAI-compatible client does not support sending reasoning_content back to the server.
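
For example (model and template file names below are placeholders for your own setup):

llama-server -m GLM-4.5-Air-Q4_K_M.gguf --jinja \
  --chat-template-file glm-4.5.jinja --reasoning-format none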

@sbrnaderi

Got a runtime error:

<arg_value># Do this and that: <10
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Partial parse: Expected </arg_value> after <arg_value>

Looks like it is happening because of the "<10" characters in the generated text during function-call parsing. Probably it is trying to parse <10 as the beginning of an XML tag?

@hksdpc255 (Author)

> Got a runtime error:
>
> <arg_value># Do this and that: <10
> Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
> Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
> Partial parse: Expected </arg_value> after <arg_value>
>
> Looks like it is happening because of the "<10" characters in the generated text during function-call parsing. Probably it is trying to parse <10 as the beginning of an XML tag?

@sbrnaderi From the log you provided, there isn’t anything unexpected. The JSON parse error occurs because I first try to parse arg_value as JSON; if that fails, it is parsed as a raw string. The failure log cannot be suppressed due to the design of llama.cpp.
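
The fallback looks roughly like this (a minimal sketch assuming nlohmann::json; the PR's actual code goes through llama.cpp's partial-JSON machinery):

#include <string>
#include <nlohmann/json.hpp>
using json = nlohmann::ordered_json;

// Try to parse an <arg_value> body as JSON; if that fails,
// fall back to treating the body as a raw string value.
static json parse_arg_value(const std::string & body) {
    if (json::accept(body)) {
        return json::parse(body); // numbers, objects, arrays, ...
    }
    return json(body);            // raw string fallback
}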

@sbrnaderi

@hksdpc255 so, you are trying to parse the XML format from the GLM model into JSON, but I think what goes wrong here is that the "<10" part of the text is recognised as an XML tag. No?


@hksdpc255 (Author)

@sbrnaderi Would you be able to share more logs or your prompt? The current log you shared doesn’t seem to have any problem, and additional details would help me figure out what’s going wrong.

@hksdpc255 (Author)

@sbrnaderi I believe your issue is fixed by the latest commit.

@sbrnaderi

@hksdpc255 thanks, I will try your new commit.

@ai-christianson

I'm running this PR with the supplied chat template and it is working 👍

@MikeLP commented Sep 15, 2025

Also checked this PR and everything works perfectly with the provided Jinja template.

@DKingAlpha commented Sep 30, 2025

Parsing JSON in <arg_value> is pretty much broken on the current branch. The original patch crashes when a streamed response ends with <arg_value> and no trailing content.

This patch fixes the crash:

--- a/common/json-partial.cpp	2025-10-01 03:17:14.681184368 +0800
+++ b/common/json-partial.cpp	2025-10-01 03:15:35.623175731 +0800
@@ -183,7 +183,7 @@
                 } else if (can_parse(str + "\"" + closing)) {
                     // Was inside an object value string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\"" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
                     // Was inside an object value string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"" + closing;
                 } else {
@@ -202,7 +202,7 @@
                 } else if (can_parse(str + "\"" + closing)) {
                     // Was inside an array value string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\"" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"" + closing)) {
                     // Was inside an array value string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"" + closing;
                 } else if (!was_maybe_number() && can_parse(str + ", 1" + closing)) {
@@ -227,7 +227,7 @@
                 } else if (can_parse(str + "\": 1" + closing)) {
                     // Was inside an object key string
                     str += (out.healing_marker.json_dump_marker = magic_seed) + "\": 1" + closing;
-                } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\": 1" + closing)) {
+                } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\": 1" + closing)) {
                     // Was inside an object key string after an escape
                     str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\": 1" + closing;
                 } else {
@@ -253,7 +253,7 @@
             if (can_parse(str + "\"")) {
                 // Was inside an string
                 str += (out.healing_marker.json_dump_marker = magic_seed) + "\"";
-            } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
+            } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
                 // Was inside an string after an escape
                 str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"";
             } else {

Besides, thanks for your PR. It's special because it works with complex JSON schemas, given a quick hack like this:

--- a/common/json-schema-to-grammar.cpp	2025-10-01 00:22:00.744098340 +0800
+++ b/common/json-schema-to-grammar.cpp	2025-10-01 00:19:48.692716944 +0800
@@ -944,6 +944,9 @@
             return _add_rule(rule_name, out.str());
         } else if (schema.empty() || schema_type == "object") {
             return _add_rule(rule_name, _add_primitive("object", PRIMITIVE_RULES.at("object")));
+        } else if (schema_type.is_null() && schema.contains("not") && schema["not"].is_object() && schema["not"].empty()) {
+            // librechat returns not:{}, which does nothing.
+            return "";
         } else {
             if (!schema_type.is_string() || PRIMITIVE_RULES.find(schema_type.get<std::string>()) == PRIMITIVE_RULES.end()) {
                 _errors.push_back("Unrecognized schema: " + schema.dump());

LibreChat passes a scrambled schema that includes {"not":{}}; this patch ignores it.

@hksdpc255 (Author) commented Oct 1, 2025

@DKingAlpha Thanks for pointing that out! It seems my compiler adds some extra padding to the string object, which ends up masking the crash from the string index underflow.

@hksdpc255 (Author)

@DKingAlpha I took a deeper look at your patch and had a question. It seems to modify some sections of common/json-partial.cpp that my PR hadn't touched. Did the original code, without my PR, crash for you?

@DKingAlpha commented Oct 1, 2025

> It seems to modify some sections of common/json-partial.cpp that my PR hadn't touched. Did the original code, without my PR, crash for you?

No.

I am using clang-20, if that helps to reproduce.

Either this function (try_consume_json) is designed to run only on non-empty strings, which means you need to change your code, or it's a bug in that part that was simply never triggered before. I lean toward the latter.

@hksdpc255 (Author) commented Oct 1, 2025

> > Did the original code, without my PR, crash for you?
>
> No.
>
> I am using clang-20, if that helps to reproduce.
>
> Either this function (try_consume_json) is designed to run only on non-empty strings, which means you need to change your code, or it's a bug in that part that was simply never triggered before. I lean toward the latter.

@DKingAlpha Would it still crash if you only patched the sections that my PR actually changed?

@@ -253,7 +253,7 @@
             if (can_parse(str + "\"")) {
                 // Was inside an string
                 str += (out.healing_marker.json_dump_marker = magic_seed) + "\"";
-            } else if (str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
+            } else if (!str.empty() && str[str.length() - 1] == '\\' && can_parse(str + "\\\"")) {
                 // Was inside an string after an escape
                 str += (out.healing_marker.json_dump_marker = "\\" + magic_seed) + "\"";
             } else {

@DKingAlpha

> @DKingAlpha Would it still crash if you only patched the sections that my PR actually changed?

Line 253 is exactly the location that crashed on my side, but I patched all the other .length() - 1 sites together since they all look fishy, so I really can't say.

I mean, even without running into the crash, a manual review alone suggests the string should be checked for emptiness before that access.

@hksdpc255 (Author) commented Oct 1, 2025

@DKingAlpha I believe

if (!healing_marker.empty() && !err_loc.stack.empty()) {

ensures that str is not empty. I'm considering changing my code from

if (!healing_marker.empty() && err_loc.stack.empty())

to

if (err_loc.position != 0 && !healing_marker.empty() && err_loc.stack.empty())

What do you think about this change?

@DKingAlpha

Sorry for the delay, was on vacation.

// Minimal repro. json_error_locator is the SAX handler defined in
// llama.cpp's common/json-partial.cpp; build this in that context.
#include <cstdio>
#include <string>
#include <nlohmann/json.hpp>
using json = nlohmann::ordered_json;

int main() {
    std::string str = "";
    auto it  = str.begin();
    auto end = str.end();

    json_error_locator err_loc;
    json::sax_parse(it, end, &err_loc);

    printf("found error: %d\n", (int) err_loc.found_error); // 1
    printf("position: %zu\n",   err_loc.position);          // 0
    printf("stack size: %zu\n", err_loc.stack.size());      // 0

    return 0;
}

if (err_loc.position != 0 && !healing_marker.empty() && err_loc.stack.empty()) should filter out the empty string, but filtering there affects a larger block, and I have no idea what that block does.

@DKingAlpha

You know what, I guess we need to ping @ochafik.

Please take a look if that's possible.

@bart2 commented Oct 5, 2025

@hksdpc255, thanks for working on this.

How can I test whether tool calling is working correctly? I tried opencode's "/init" command with GLM 4.5 running on a llama.cpp build that includes this PR, and all I get is this:

First, let me check what files are in the root directory and look for any existing AGENTS.md file. I'll analyze this codebase to create an AGENTS.md file for agentic coding agents. Let me start by exploring the repository
     structure and understanding the project setup. <tool_call>list <arg_key>path</arg_key> <arg_value>... redacted ...</arg_value> </tool_call>

opencode attempts to issue a tool call using XML syntax, but the model does not respond to it.

Here are my llama.cpp arguments:

llama.cpp/build/bin/llama-server \
  --jinja \
  --chat-template-file ~/glm-4.5-tools-template.jinja \
  --model ~/GLM-4.5-UD-IQ2_M/GLM-4.5-UD-IQ2_M-00001-of-00003.gguf \
  --ctx-size 60000 \
  --temp 0.6 \
  --top_p 0.95 \
  --min_p 0.01 \
  -ot exps=CPU \
  --n-gpu-layers 40 \
  --parallel 1 \
  --threads 56 \
  --host 0.0.0.0 \
  --port 8080 \
  --threads-batch 112

Here are llama.cpp logs from that opencode attempt:

Grammar trigger pattern full: `(?:<think>[\s\S]*?</think>\s*)?\s*((?:<tool_call>|<function|(?:```(?:json|xml)?
\s*)?(?:<function_call>|<tools>|<xml><json>|<response>)?\s*\{\s*"name"\s*:\s*"(?:bash|edit|webfetch|glob|grep|list|read|write|todowrite|todoread|task)"))[\s\S]*`
INFO [   launch_slot_with_task] slot is processing task | tid="129685772312576" timestamp=1759649520 id_slot=0 id_task=2
INFO [            update_slots] kv cache rm [p0, end) | tid="129685772312576" timestamp=1759649520 id_slot=0 id_task=2 p0=4
INFO [            update_slots] kv cache rm [p0, end) | tid="129685772312576" timestamp=1759649555 id_slot=0 id_task=2 p0=2052
INFO [            update_slots] kv cache rm [p0, end) | tid="129685772312576" timestamp=1759649592 id_slot=0 id_task=2 p0=4100
INFO [            update_slots] kv cache rm [p0, end) | tid="129685772312576" timestamp=1759649629 id_slot=0 id_task=2 p0=6148
INFO [           print_timings] prompt eval time     =  144964.09 ms /  7966 tokens (   18.20 ms per token,    54.95 tokens per second) | tid="129685772312576" timestamp=1759649690 id_slot=0 id_task=2 t_prompt_processing=144964.09 n_prompt_tokens_processed=7966 t_token=18.197852121516444 n_tokens_second=54.95154006761261
INFO [           print_timings] generation eval time =   24989.64 ms /   114 runs   (  219.21 ms per token,     4.56 tokens per second) | tid="129685772312576" timestamp=1759649690 id_slot=0 id_task=2 t_token_generation=24989.643 n_decoded=114 t_token=219.2073947368421 n_tokens_second=4.561889899747667
INFO [           print_timings]           total time =  169953.73 ms | tid="129685772312576" timestamp=1759649690 id_slot=0 id_task=2 t_prompt_processing=144964.09 t_token_generation=24989.643 t_total=169953.733
INFO [            update_slots] slot released | tid="129685772312576" timestamp=1759649690 id_slot=0 id_task=2 n_ctx=60000 n_past=8083 n_system_tokens=0 n_cache_tokens=8083 truncated=false
INFO [            update_slots] all slots are idle | tid="129685772312576" timestamp=1759649690
INFO [      log_server_request] request | tid="129582629756928" timestamp=1759649690 remote_addr="192.168.1.156" remote_port=54235 status=200 method="POST" path="/v1/chat/completions" params={}
INFO [            update_slots] all slots are idle | tid="129685772312576" timestamp=1759649690

~/glm-4.5-tools-template.jinja contains the Jinja template shared in the PR description.

Any ideas what might be missing here?

UPDATE: opencode tool calling with GLM 4.5 worked when I used the chat template supplied by the model provider at https://huggingface.co/zai-org/GLM-4.5/blob/main/chat_template.jinja

@hksdpc255 (Author)

The chat template should only affect the request phase: llama.cpp uses the template to format your input prompt.

If there is an issue with the Jinja template, my guess is that a \n is missing somewhere in my template, and GLM may have learned not to generate a newline before <tool_call>.

In my patch, the tool parser only recognizes <tool_call> when it is preceded by a \n.
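
A minimal sketch of that newline gating (illustrative only, not the PR's exact parser code):

#include <string>

// Treat "<tool_call>" as a tool-call opener only when it directly
// follows a newline; the tag embedded mid-sentence is left as text.
static bool at_tool_call_open(const std::string & text, size_t pos) {
    static const std::string tag = "<tool_call>";
    return text.compare(pos, tag.size(), tag) == 0
        && pos > 0 && text[pos - 1] == '\n';
}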

@aaronnewsome

This PR and the chat template in the first message were the only solution I've been able to find that let Opencode/Crush work with GLM-4.5-Air and llama.cpp (although diff edits failed at an astonishing rate, other calls seem to work well enough). Trust me, I've tried everything, and none of the chat templates I found worked as well as this one for both Crush and Opencode. I had applied the patch to the llama.cpp source back when I found the PR; it built fine and things seemed OK.

Now that I've switched to GLM 4.6, it requires a newer llama.cpp, but this patch no longer builds with the server. Cline still works fine with Unsloth's built-in chat template, but Opencode can't do any tool calls at all.

Any recommendations on how to get the patch working with the latest llama.cpp release, or any advice on how to get Opencode working with GLM 4.6 tool calls?

@hksdpc255 (Author)

@aaronnewsome This PR has already merged the GLM 4.6 support patch from bartowski1182. Can you share any further details about your GLM 4.6 tool-call issue?

@aaronnewsome

uname:

Linux hawk 6.8.0-85-generic #85-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 18 15:26:59 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi:

Thu Oct  9 16:01:52 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:02:00.0 Off |                  Off |
| 30%   33C    P8              5W /  300W |   76337MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:04:00.0 Off |                  Off |
| 30%   32C    P8             15W /  300W |   75837MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          136857      C   llama-server                          76322MiB |
|    1   N/A  N/A          136857      C   llama-server                          75818MiB |
+-----------------------------------------------------------------------------------------+

docker run command:

docker run -d --restart unless-stopped --runtime nvidia --gpus all -p 8080:8080 --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -e NCCL_P2P_DISABLE=1 -e NCCL_IB_DISABLE=1 -e NCCL_DEBUG=INFO -v /home/anewsome/.ollama:/root/.ollama --name llama-cpp --hostname llama-cpp-hawk -e VERSION=b6715 -e MODEL_REPO= -e LLAMA_SET_ROWS=0 registry.swift.local/llama-cpp:b6715 /root/.ollama/models/GLM-4.6-UD-IQ2_XXS/start-llama

start-llama - llama-server command (in container):

llama-server \
  --model /root/.ollama/models/GLM-4.6-UD-IQ2_XXS/GLM-4.6-UD-IQ2_XXS-00001-of-00003.gguf \
  --alias GLM-4.6-UD-IQ2_XXS \
  --threads -1 \
  --ctx-size 202752 \
  --n-gpu-layers 99 \
  --temp 1.0 \
  --min-p 0.0 \
  --top-p 0.95 \
  --top-k 40 \
  --repeat-penalty 1.05 \
  --context-shift \
  --host 0.0.0.0 \
  --reasoning-format auto \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --jinja \

opencode:

write a git ignore file with common patterns for a python flask project
anewsome (11:18 AM)

I'll create a .gitignore file with common patterns for a Python Flask project. <tool_call>write <arg_key>filePath</arg_key>
<arg_value>.gitignore</arg_value> <arg_key>content</arg_key> <arg_value># Byte-compiled / optimized / DLL files pycache/ *.py[cod] *$py.class
... more stuff for the git ignore file ...

Notice that the tool-call XML appears inline in the model's chat response; it continues all the way to the end of the file it is supposed to be creating and closes with another </tool_call>, but nothing is ever actually written.

I've also tried the chat template at the top of this thread (which I did have some success with on 4.5-Air, after applying the patch) and the chat template from the z.ai Hugging Face page for GLM 4.6; neither allows tool calls to work with opencode or crush.

opencode config:

 "provider": {
    "hawk": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://hawk.swift.local:8080/v1",
        "includeUsage": true,
        "timeout": 1000
      },
      "models": {
        "GLM-4.6-UD-IQ2_XXS": {
          "name": "glm-4.6",
          "tool_call": true,
          "reasoning": true,
          "cost": {
            "input": 0.10,
            "output": 1.20
          },
          "options": {
            "num_ctx": 120000,
            "temperature": 1.0,
            "top_p": 0.95,
            "top_k": 40
          }
        }
      }
    }
 }

@hksdpc255 (Author)

@aaronnewsome Does it work for you now?

@aaronnewsome

I rebuilt the container using hksdpc255:master and tool calls no longer appear inline in the chat. Opencode/Crush diff edits are still horrible, but the model usually figures out another way to edit the file, such as bash tool edits or sed commands. Now I have GLM 4.6 working as well as GLM 4.5 was. Even with diff edits failing 100% of the time, it is still useful for planning or analyzing a codebase, and it works pretty well when writing new files. The inability to edit files efficiently is a real problem, though.

Off topic: I tried claude code router (ccr) with claude code, and edits were pretty reliable; actually, most tool calls were pretty reliable. But holy wow is it slow.

Now my main issue is that the 4.6 IQ2_XXS is too big to run on the dual Pro 6000s without flash attention enabled, and enabling flash attention causes the entire system to reset when the GPUs are cooking up responses. So I'll be going back to GLM 4.5, where the Q4 fits with 128K context and flash attention off.

I appreciate your work on creating this branch with much better tool calling than the main branch. Since it works so much better for tool calling (at least for me), I wonder why these fixes haven't made it into the main project.

@hksdpc255 (Author)

@aaronnewsome

> Opencode/Crush diff edits are still horrible, but the model usually figures out another way to edit the file, such as bash tool edits or sed commands.

The open-source GLM model may not be well-tuned for the diff format used by Opencode. You could try the official Opencode API provided by z.ai. It might yield better results than a locally quantized deployment.

If the official API performs well, there might also be something wrong with my patch. You can paste the failing diff edits from your log — that would help me figure out what’s going on.

> I wonder why these fixes haven't made it into the main project.

I think this should be reviewed by a maintainer before being merged into master. Unfortunately, it seems that no maintainer has had time to review it yet.

@aaronnewsome

I don't have any detailed logs from the llama.cpp crash that opencode causes, but here's the tail of the container log from the latest crash today. This is with Unsloth's Q4 GLM-4.5-Air and the hksdpc255:master llama.cpp:

srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21156 | selected slot by lcs similarity, lcs_len = 24546, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21405 | processing task
slot update_slots: id  0 | task 21405 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 24575
slot update_slots: id  0 | task 21405 | n_past = 24546, memory_seq_rm [24546, end)
slot update_slots: id  0 | task 21405 | prompt processing progress, n_past = 24575, n_tokens = 29, progress = 0.001180
slot update_slots: id  0 | task 21405 | prompt done, n_past = 24575, n_tokens = 29
slot      release: id  0 | task 21405 | stop processing: n_past = 24643, truncated = 0
slot print_timing: id  0 | task 21405 | 
prompt eval time =     200.19 ms /    29 tokens (    6.90 ms per token,   144.86 tokens per second)
       eval time =    2013.96 ms /    69 tokens (   29.19 ms per token,    34.26 tokens per second)
      total time =    2214.15 ms /    98 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21405 | selected slot by lcs similarity, lcs_len = 24643, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21475 | processing task
slot update_slots: id  0 | task 21475 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 25164
slot update_slots: id  0 | task 21475 | n_past = 24643, memory_seq_rm [24643, end)
slot update_slots: id  0 | task 21475 | prompt processing progress, n_past = 25164, n_tokens = 521, progress = 0.020704
slot update_slots: id  0 | task 21475 | prompt done, n_past = 25164, n_tokens = 521
slot      release: id  0 | task 21475 | stop processing: n_past = 25220, truncated = 0
slot print_timing: id  0 | task 21475 | 
prompt eval time =    1958.69 ms /   521 tokens (    3.76 ms per token,   265.99 tokens per second)
       eval time =    1674.96 ms /    57 tokens (   29.39 ms per token,    34.03 tokens per second)
      total time =    3633.65 ms /   578 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /v1/chat/completions 172.20.3.42 200
srv  params_from_: Chat format: GLM 4.5
slot get_availabl: id  0 | task 21475 | selected slot by lcs similarity, lcs_len = 25220, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id  0 | task 21533 | processing task
slot update_slots: id  0 | task 21533 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 25510
slot update_slots: id  0 | task 21533 | n_past = 25220, memory_seq_rm [25220, end)
slot update_slots: id  0 | task 21533 | prompt processing progress, n_past = 25510, n_tokens = 290, progress = 0.011368
slot update_slots: id  0 | task 21533 | prompt done, n_past = 25510, n_tokens = 290
slot      release: id  0 | task 21533 | stop processing: n_past = 25981, truncated = 0
slot print_timing: id  0 | task 21533 | 
prompt eval time =    1271.89 ms /   290 tokens (    4.39 ms per token,   228.01 tokens per second)
       eval time =   14239.01 ms /   472 tokens (   30.17 ms per token,    33.15 tokens per second)
      total time =   15510.89 ms /   762 tokens
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: now finding less tool calls!
/root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/start-llama: line 25:   123 Aborted                 (core dumped) llama-server --model /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf --alias GLM-4.5-Air-UD-Q4_K_XL --threads -1 --ctx-size 131072 --n-gpu-layers 49 --temp 1.0 --min-p 0.0 --top-p 0.95 --top-k 40 --repeat-penalty 1.05 --context-shift --host 0.0.0.0 --reasoning-format none --flash-attn off -hf ggml-org/gemma-3-12b-it-GGUF --no-mmproj-offload --jinja --chat-template-file /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/glm-45.jinja

Not every failed diff edit causes the server to crash, but when the server does crash, this is the most common error I see: Invalid diff: now finding less tool calls!

@hksdpc255 (Author) commented Oct 12, 2025

@aaronnewsome You can run llama-server with -lv 1 to enable verbose logging. This will output detailed information to stdout, which might help diagnose the issue.

@matbrez commented Oct 12, 2025

opencode edits are failing because its edit tool does not return any content, and the template then replaces that empty result with SYSTEM ERROR: This tool is not working due to internal error, try another tool or give a direct response.

I replaced

{%- if m.content == '' -%}
{{- 'SYSTEM ERROR: This tool is not working due to internal error, try another tool or give a direct response.' }}
{%- else -%}
{{-  emit_args(m.content) }}
{%- endif %}

with

{%- if m.content != '' -%}
{{-  emit_args(m.content) }}
{%- endif %}

and so far I haven't noticed any issues.

@aaronnewsome

Here's what I see with -lv 1 before the crash:

que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 213
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 214, front = 0
slot update_slots: id  0 | task 137 | slot decode token, n_ctx = 131072, n_past = 61190, n_cache_tokens = 61190, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
data stream, to_send: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"}"}}]}}],"created":1760321159,"id":"chatcmpl-MJPawRpSZfxOmAILiSbqfJMTR9rBoGCG","model":"glm-4.5-air","system_fingerprint":"b6735-bbb592d5","object":"chat.completion.chunk"}

srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
Parsing input with format GLM 4.5: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON: <<<>>>
Failed to parse up to error: [json.exception.parse_error.101] parse error at line 1, column 2: syntax error while parsing object key - unexpected end of input; expected string literal: <<<{>>>
srv          send: sending result for task id = 137
srv          send: task id = 137 pushed to result queue
slot process_toke: id  0 | task 137 | stopped by EOS
slot process_toke: id  0 | task 137 | n_decoded = 77, n_remaining = -1, next token: 151338 '<|observation|>'
slot      release: id  0 | task 137 | stop processing: n_past = 61190, truncated = 0
slot print_timing: id  0 | task 137 | 
prompt eval time =     395.94 ms /    46 tokens (    8.61 ms per token,   116.18 tokens per second)
       eval time =    3957.22 ms /    77 tokens (   51.39 ms per token,    19.46 tokens per second)
      total time =    4353.17 ms /   123 tokens
srv  update_chat_: Parsing chat message: 
<think></think>
<tool_call>edit
<arg_key>file_path</arg_key>
<arg_value>/home/anewsome/Documents/git/podscribe-rewrite/podscribe-react/src/pages/Dashboard/Dashboard.tsx</arg_value>
<arg_key>old_string</arg_key>
<arg_value><WebSocketStatus /></arg_value>
<arg_key>new_string</arg_key>
<arg_value>{/* <WebSocketStatus /> */}</arg_value>
<arg_key>expected_replacements</arg_key>
<arg_value>1</arg_value>
</tool_call>
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: now finding less tool calls!
/root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/start-llama: line 26:   121 Aborted                 (core dumped) llama-server --model /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf --alias GLM-4.5-Air-UD-Q4_K_XL -lv 1 --threads -1 --ctx-size 131072 --n-gpu-layers 49 --temp 1.0 --min-p 0.0 --top-p 0.95 --top-k 40 --repeat-penalty 1.05 --context-shift --host 0.0.0.0 --reasoning-format none --flash-attn off -hf ggml-org/gemma-3-12b-it-GGUF --no-mmproj-offload --jinja --chat-template-file /root/.ollama/models/GLM-4.5-Air-GGUF-UD-Q4_K_XL/glm-45.jinja

@hksdpc255 (Author)

@matbrez Thanks for the suggestion! I’ve applied your changes and they look good.

@hksdpc255 (Author)

@aaronnewsome Could you provide more logs, enough to include three occurrences of srv update_chat_: Parsing chat message:?
