Commit 7417ac9

Update llm-benchmark.md
1 parent 935e2c5 commit 7417ac9

1 file changed

+1
-1
lines changed

docs/kagi/ai/llm-benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ Please see notes below the table if you see results you find surprising, or get
 
 </div>
 
-**Notes on chain of thought:** Models that use chain of thought do drastically better in this benchmark. Some models, like **kimi-k2** perform worse with our instruction following prompts (ex: "answer in only one word") seem to shut down reasoning than usual. We also test more comprehensively on non-english/chinese languages, which seems to punish some models (Qwen3-32B).
+**Notes on chain of thought:** Models that use chain of thought do drastically better in this benchmark. Some models, like **kimi-k2** perform worse with our instruction following prompts (ex: "answer in only one word") seem to shut down reasoning. We also test more comprehensively on non-english/chinese languages, which seems to punish some models (Qwen3-32B).
 
 **Model Costs:** Costs in the reasoning benchmark are mostly from the models' output tokens. **This table's cost column is not representative for input token heavy tasks like web search or retrieval.**
 