fix: add GLM-4.6 thinking token support for OpenAI-compatible endpoints #8548
Conversation
- Add detection for GLM-4.6 model variants
- Include thinking parameter { type: "enabled" } in requests for GLM-4.6
- Parse thinking tokens using XmlMatcher for <think> tags
- Handle reasoning_content in streaming responses
- Add comprehensive tests for GLM-4.6 functionality
Fixes #8547
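
To make the first two bullets concrete, here is a minimal standalone sketch of how the thinking parameter could be attached when building an OpenAI-compatible request body. This is a simplified illustration, not the exact code in `base-openai-compatible-provider.ts`; the `buildRequestBody` helper and `ChatMessage` type are hypothetical.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Detect GLM-4.6 model variants (case-insensitive, dotted or hyphenated).
function isGLM46Model(modelId: string): boolean {
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
}

// Hypothetical helper: build the request body, adding the thinking flag
// only for GLM-4.6 so other models are unaffected.
function buildRequestBody(modelId: string, messages: ChatMessage[]): Record<string, unknown> {
	const body: Record<string, unknown> = { model: modelId, messages, stream: true }
	if (isGLM46Model(modelId)) {
		// GLM-4.6 only emits thinking tokens when this parameter is present.
		body.thinking = { type: "enabled" }
	}
	return body
}
```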
```typescript
protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}
```
The `isGLM46Model` check has a redundant condition (`lowerModel === "glm-4.6"`) since `includes("glm-4.6")` already covers it. Consider removing the extra check.
```diff
- return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
```
Self-review mode: evaluating my own changes with all the empathy of a linter in a cold data center.
```typescript
}

// Handle reasoning_content if present (for models that support it directly)
if (delta && "reasoning_content" in delta && delta.reasoning_content) {
```
P2: Potential double-emission of reasoning tokens. For GLM-4.6 you already parse via XmlMatcher; if the provider also populates reasoning_content in the same stream, this emits the same reasoning twice. Gate this branch when GLM-4.6 is active or dedupe.
```diff
- if (delta && "reasoning_content" in delta && delta.reasoning_content) {
+ // Handle reasoning_content if present (avoid double-emitting when GLM '<think>' is parsed)
+ if (!isGLM46 && delta && "reasoning_content" in delta && delta.reasoning_content) {
+ 	yield {
+ 		type: "reasoning",
+ 		text: (delta.reasoning_content as string | undefined) || "",
+ 	}
+ }
```
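
To make the failure mode concrete: if a provider both wraps reasoning in `<think>` tags and mirrors it into `reasoning_content`, an ungated handler would yield each reasoning token twice. Below is a standalone sketch of the gating; the `Delta` shape and `emitReasoning` helper are simplified illustrations, not the PR's actual code (the real handler uses the OpenAI SDK's streaming chunk types).

```typescript
// Hypothetical, simplified delta shape for illustration only.
type Delta = { content?: string; reasoning_content?: string }

function* emitReasoning(delta: Delta, isGLM46: boolean): Generator<{ type: "reasoning"; text: string }> {
	// When the GLM-4.6 <think> path is active, the XML parsing owns reasoning
	// emission, so this branch is skipped to avoid emitting the same tokens twice.
	if (!isGLM46 && delta.reasoning_content) {
		yield { type: "reasoning", text: delta.reasoning_content }
	}
}

// A GLM-4.6 delta that mirrors reasoning into both channels:
const delta: Delta = { content: "<think>step 1</think>", reasoning_content: "step 1" }
console.log([...emitReasoning(delta, true)].length) // 0 — the <think> parser handles it
console.log([...emitReasoning(delta, false)].length) // 1 — reasoning_content path emits
```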
```typescript
protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}
```
P3: Minor simplification. The final equality check is redundant because .includes("glm-4.6") already covers it. A concise version improves readability.
```diff
- return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ protected isGLM46Model(modelId: string): boolean {
+ 	const lowerModel = modelId.toLowerCase()
+ 	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
+ }
```
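
As a quick sanity check, here is how the simplified matcher behaves on a few illustrative model IDs (the IDs below are examples for demonstration, not an exhaustive list from the PR; the method is shown as a plain function for a runnable snippet):

```typescript
function isGLM46Model(modelId: string): boolean {
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
}

console.log(isGLM46Model("GLM-4.6")) // true — case-insensitive match
console.log(isGLM46Model("z-ai/glm-4-6")) // true — hyphenated variant inside a longer ID
console.log(isGLM46Model("glm-4.5")) // false — other GLM versions are not affected
```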
```typescript
import OpenAI from "openai"
import { Anthropic } from "@anthropic-ai/sdk"

import type { ModelInfo } from "@roo-code/types"
```
P3: Unused import `ModelInfo` can trigger `no-unused-vars` in stricter configs.
```diff
- import type { ModelInfo } from "@roo-code/types"
```
Hi, guys. I tried this PR with ik_llama.cpp running Ubergarm's GLM-4.6 model locally with Unsloth's chat template (--jinja --chat-template-file), but I still don't get reasoning in Roo Code. Reasoning works fine in Roo Code with DeepSeek v3.1 running locally, and I also get GLM-4.6 reasoning in the ik_llama.cpp built-in web UI and in the Continue VS Code extension. Do I need to do something else to make it work with this PR? I'm using the OpenAI-compatible endpoint in Roo Code with the "glm-4.6" alias in ik_llama.cpp.
This PR attempts to address Issue #8547 by enabling thinking tokens for GLM-4.6 when using OpenAI-compatible custom endpoints.
Problem
GLM-4.6 was not generating thinking tokens when used through OpenAI-compatible custom endpoints because the required `"thinking": {"type": "enabled"}` parameter was missing from requests.
Solution
- Detect GLM-4.6 model variants
- Add the `thinking: { type: "enabled" }` parameter to requests for GLM-4.6
- Parse thinking tokens using `XmlMatcher` for `<think>` tags
- Handle the `reasoning_content` field in streaming responses
Testing
- Added tests in `base-openai-compatible-provider.spec.ts` covering model detection, the thinking parameter, and token parsing
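
For readers unfamiliar with the `<think>` convention, the sketch below shows the kind of tag-splitting the PR delegates to `XmlMatcher`. It is an illustration only, operating on a fully buffered string; the real `XmlMatcher` works incrementally on streamed chunks and handles tags split across chunk boundaries.

```typescript
type StreamEvent = { type: "reasoning" | "text"; text: string }

// Illustrative only: split a buffered completion into reasoning vs. text events.
function splitThinkTags(content: string): StreamEvent[] {
	const events: StreamEvent[] = []
	const pattern = /<think>([\s\S]*?)<\/think>/g
	let lastIndex = 0
	for (const match of content.matchAll(pattern)) {
		const index = match.index ?? 0
		if (index > lastIndex) {
			events.push({ type: "text", text: content.slice(lastIndex, index) })
		}
		events.push({ type: "reasoning", text: match[1] })
		lastIndex = index + match[0].length
	}
	if (lastIndex < content.length) {
		events.push({ type: "text", text: content.slice(lastIndex) })
	}
	return events
}

console.log(splitThinkTags("<think>plan the fix</think>Here is the answer."))
// [ { type: "reasoning", text: "plan the fix" }, { type: "text", text: "Here is the answer." } ]
```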
Impact
This change only affects GLM-4.6 models when used through OpenAI-compatible endpoints. Other models and providers remain unaffected.
Fixes #8547
Feedback and guidance are welcome!
Important
Adds GLM-4.6 thinking token support for OpenAI-compatible endpoints by detecting models, adding parameters, and parsing tokens.
- Detects GLM-4.6 model variants (`glm-4.6`, `GLM-4.6`, `glm-4-6`, `GLM-4-6`) in `base-openai-compatible-provider.ts`.
- Adds the `thinking: { type: "enabled" }` parameter for GLM-4.6 models in `createStream()`.
- Parses thinking tokens using `XmlMatcher` for `<think>` tags in `createMessage()`.
- Handles the `reasoning_content` field in streaming responses in `createMessage()`.
- Adds tests in `base-openai-compatible-provider.spec.ts` for model detection, thinking parameter addition, and token parsing of both `<think>` tags and `reasoning_content` in responses.