Commit 5d43fcd

Add GLM 4.5 and GLM 4.5 Air benchmark results (#806)
* Add GLM 4.5 and GLM 4.5 Air
* update timestamp
1 parent bd451ea commit 5d43fcd

1 file changed: +3 -1 lines changed

docs/kagi/ai/llm-benchmark.md

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,7 @@ Introducing the Kagi LLM Benchmarking Project, which evaluates major large langu
 
 The Kagi Reasoning Benchmark is an **unpolluted reasoning benchmark** to assess large language models (LLMs) through diverse, challenging tasks. Unlike standard benchmarks, the tasks in this benchmark are unpublished, not found in training data, or "gamed" in fine-tuning. The task set changes over time (mostly getting more difficult) to better represent the current state of the art.
 
-Last update: **July 22nd, 2025**
+Last update: **July 28th, 2025**
 
 Tasks: **100**
 
@@ -33,6 +33,7 @@ Input Tokens for all tasks: **10859**
 | grok-3-mini `[CoT-high]` | 62.58 | 1.51 | 0.28 | 12k | 9.77 | xai |
 | qwen-3-235b-a22b `[CoT]` | 61.03 | 8.44 | 0.06 | 9k | 3.34 | Nebius |
 | grok-3-mini `[CoT-low]` | 60.76 | 1.04 | 0.02 | 4.9k | 8.16 | kagi (all) |
+| z-ai/glm-4.5 `[CoT]` | 60.34 | 8.39 | 0.19 | 15k | 10.89 | openrouter |
 | claude-4-opus [no-think] | 60.21 | 1.27 | 1.06 | 9.1k | 24.66 | kagi (ultimate) |
 | perplexity/sonar-pro `[CoT]` | 55.21 | 0.46 | 0.11 | 12k | 66.52 | perplexity |
 | chatgpt-4o | 54.80 | - | 0.57 | 18k | - | kagi (ultimate) |
@@ -45,6 +46,7 @@ Input Tokens for all tasks: **10859**
 | thedrummer/anubis-pro-105b-v1 | 48.96 | 7.96 | 0.02 | 14k | 3.10 | openrouter |
 | llama-4-maverick | 48.93 | - | 0.03 | 20k | - | kagi (all) |
 | gpt-4-1-mini | 48.80 | - | 0.05 | 23k | - | kagi (all) |
+| z-ai/glm-4.5-air `[CoT]` | 48.79 | 4.52 | 0.37 | 16k | 24.72 | openrouter |
 | grok-3 | 48.40 | - | 0.70 | 16k | - | kagi (ultimate) |
 | kimi-k2 | 47.81 | 12.97 | 1.58 | 80k | 41.19 | kagi (ultimate) |
 | mistral-medium | 47.20 | 0.68 | 0.05 | 12k | 53.31 | kagi (all) |
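
Both new rows appear to be slotted in so that the second column stays in descending order. The table header falls outside the hunks shown, so treating that column as the ranking score is an assumption, and `parse_row` and `is_sorted_desc` are hypothetical helper names rather than part of the repository. A minimal sketch of how a reviewer might check the ordering of the rows visible in this diff:

```python
def parse_row(line: str) -> tuple[str, float]:
    """Return (model, second-column value) from one markdown table row."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    # Assumption: the second column is the score the table is sorted by.
    return cells[0], float(cells[1])


def is_sorted_desc(rows: list[str]) -> bool:
    """True if the second-column values never increase from row to row."""
    scores = [parse_row(row)[1] for row in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Rows copied from the two table hunks, with the added rows included.
hunk_33 = [
    "| grok-3-mini `[CoT-high]` | 62.58 | 1.51 | 0.28 | 12k | 9.77 | xai |",
    "| qwen-3-235b-a22b `[CoT]` | 61.03 | 8.44 | 0.06 | 9k | 3.34 | Nebius |",
    "| grok-3-mini `[CoT-low]` | 60.76 | 1.04 | 0.02 | 4.9k | 8.16 | kagi (all) |",
    "| z-ai/glm-4.5 `[CoT]` | 60.34 | 8.39 | 0.19 | 15k | 10.89 | openrouter |",
    "| claude-4-opus [no-think] | 60.21 | 1.27 | 1.06 | 9.1k | 24.66 | kagi (ultimate) |",
    "| perplexity/sonar-pro `[CoT]` | 55.21 | 0.46 | 0.11 | 12k | 66.52 | perplexity |",
    "| chatgpt-4o | 54.80 | - | 0.57 | 18k | - | kagi (ultimate) |",
]
hunk_45 = [
    "| thedrummer/anubis-pro-105b-v1 | 48.96 | 7.96 | 0.02 | 14k | 3.10 | openrouter |",
    "| llama-4-maverick | 48.93 | - | 0.03 | 20k | - | kagi (all) |",
    "| gpt-4-1-mini | 48.80 | - | 0.05 | 23k | - | kagi (all) |",
    "| z-ai/glm-4.5-air `[CoT]` | 48.79 | 4.52 | 0.37 | 16k | 24.72 | openrouter |",
    "| grok-3 | 48.40 | - | 0.70 | 16k | - | kagi (ultimate) |",
    "| kimi-k2 | 47.81 | 12.97 | 1.58 | 80k | 41.19 | kagi (ultimate) |",
    "| mistral-medium | 47.20 | 0.68 | 0.05 | 12k | 53.31 | kagi (all) |",
]

for name, rows in (("hunk at line 33", hunk_33), ("hunk at line 46", hunk_45)):
    print(name, "sorted descending:", is_sorted_desc(rows))
```

The check only covers the rows visible in the diff context; verifying the whole table would mean running the same helper over every row of `docs/kagi/ai/llm-benchmark.md`.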
