Tool Calling in GLM-4.5 - Anyone get it to work? #753
Replies: 5 comments 7 replies
-
@jordandevai Man, GLM-4.5-Air is a roughly 110B model, let's face it. They claim its performance is somehow better than DeepSeek R1 0528, but I highly doubt that's the case, so I'm very reluctant to even bother debugging it. Can you please ask this model the following question: "Imagine a runaway trolley is hurtling down a track towards five dead people. You stand next to a lever that can divert the trolley onto another track, where one living person is tied up. Do you pull the lever?" Will it answer correctly? If so, let's debug it. If not, why bother?
-
Are you saying DeepSeek R1 performs better? A lot of coders prefer GLM from what I've seen (anecdotally, of course). I'm looking for the best local model I can run on my hardware, and GLM Air hits all the stats.
-
ggml-org/llama.cpp#15186 hasn't been merged in.
-
It looks like the grammar trigger pattern is wrong (testing GLM-4.6).
The actual response from the LLM with a tool call is:
The LLM doesn't respond with [ |
-
I've been running GLM 4.5 Air (106B A12B?) on a patched llama.cpp, and tool calling works fine. There are very few good coding models in that size range, but GLM 4.5 Air works as well as anything I've tried. I set it up about 2 months ago, so the details have probably changed now, but here's what I did:
template.jinja (old, newer versions exist):
[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>
For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{%- set user_content = visible_text(m.content) -%}
{%- if not ("tool_response" in user_content) %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
{%- if not user_content.endswith("/nothink") -%}
{{- '/nothink' -}}
{%- endif -%}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set think_parts = content.split('</think>') %}
{%- if think_parts|length > 1 %}
{%- set before_end_think = think_parts[0] %}
{%- set after_end_think = think_parts[1] %}
{%- set think_start_parts = before_end_think.split('<think>') %}
{%- if think_start_parts|length > 1 %}
{%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
{%- endif %}
{%- set content = after_end_think.lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{ '\n<tool_call>' + tc.name }}
{% set _args = tc.arguments %}
{% for k, v in _args.items() %}
<arg_key>{{ k }}</arg_key>
<arg_value>{{ v | tojson if v is not string else v }}</arg_value>
{% endfor %}
</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- m.content }}
{{- '\n</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}
llama-server arguments (probably not the fastest):
Results

This is the strongest local coding model I've gotten running on a combination of a 3090 and a Ryzen 9 9900X with 64GB of RAM. Tool calling is nearly 100% reliable using Cline. In general, it's a better coder than Qwen3 30B A3B and GPT OSS 20B.

Obviously, it's not Sonnet 4.5: it misses the point just often enough to be annoying, and I run out of context window before it can finish a complete feature end-to-end. But it's a useful model, even if it pushes my system to the absolute limit.

My main use case is "first drafts of code that would be tedious to implement by hand". I assume that I'll be reading all the code closely and refactoring it a bit, but I'd do that anyway for production code. Mostly I ask it to write Rust and typed Python, plus an occasional bit of TypeScript.

Obviously, my system won't run a full-sized DeepSeek. I've heard rumors of people getting a really tiny quant of a full-sized GLM 4.5 (355B A32B) running with a 3090 and a ton of system RAM, but I've never tried. I also want to take a look at GPT OSS 120B at some point, which is the natural competitor for GLM 4.5 Air.
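For anyone who ends up handling the raw model output themselves (for example in a small proxy), here is a minimal Python sketch of parsing the `<tool_call>` / `<arg_key>` / `<arg_value>` format that the template above asks the model to emit. The helper names `parse_value` and `parse_tool_calls` are made up for illustration and are not part of llama.cpp or ik_llama.cpp. It also assumes the model follows the format exactly, which may not hold in practice.

```python
import json
import re

# One <tool_call>NAME ... </tool_call> block, as described in the template above.
TOOL_CALL_RE = re.compile(r"<tool_call>([^<\n]+)(.*?)</tool_call>", re.DOTALL)
# One <arg_key>/<arg_value> pair inside a tool call.
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_value(raw):
    """Decode an argument value.

    The template writes strings as-is and everything else as JSON, so we
    try JSON first and fall back to the raw string. Assumption: a string
    that happens to look like JSON (e.g. "42") will be decoded as JSON.
    """
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def parse_tool_calls(text):
    """Return a list of {"name": ..., "arguments": {...}} dicts found in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        name = match.group(1).strip()
        args = {k.strip(): parse_value(v) for k, v in ARG_RE.findall(match.group(2))}
        calls.append({"name": name, "arguments": args})
    return calls


if __name__ == "__main__":
    # Hypothetical model output, made up for this example.
    sample = (
        "<think></think>\n"
        "<tool_call>get_weather\n"
        "<arg_key>city</arg_key>\n"
        "<arg_value>Berlin</arg_value>\n"
        "<arg_key>days</arg_key>\n"
        "<arg_value>3</arg_value>\n"
        "</tool_call>"
    )
    print(parse_tool_calls(sample))
```

Trying JSON first mirrors the template's `tojson if v is not string else v` rule for argument values. The cost is that a string argument which is itself valid JSON cannot be told apart from a non-string value without consulting the tool's schema.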
-
Hey all - love the project and the speed of IK_LLama over Llama.cpp! However, I spent the weekend trying to get tool calling to work with GLM-4.5-Air, to no avail.
It does things like this:
And then breaks Continue.dev. I'm running the latest build of IK_Llama:
Anyone else overcome this? Do I need to build a proxy? I reviewed several PRs that said they fixed the GLM Tool calls. Here is my continue dev config if it helps: