
Conversation

@roomote (bot) commented Oct 7, 2025

This PR attempts to address Issue #8547 by enabling thinking tokens for GLM-4.6 when using OpenAI-compatible custom endpoints.

Problem

GLM-4.6 was not generating thinking tokens when used through OpenAI-compatible custom endpoints because the required "thinking": {"type": "enabled"} parameter was missing from requests.
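
For reference, the missing field is just an extra property on an otherwise ordinary chat completion request. A minimal illustration (the model name, message, and stream flag are placeholders; only the thinking value comes from the issue, and it is presumably sent as a top-level field):

const body = {
	model: "glm-4.6", // placeholder alias; depends on your deployment
	messages: [{ role: "user", content: "Why is the sky blue?" }],
	stream: true,
	thinking: { type: "enabled" }, // the field GLM-4.6 needs in order to emit thinking tokens
}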

Solution

  • Added detection for GLM-4.6 model variants (glm-4.6, GLM-4.6, glm-4-6, GLM-4-6)
  • Included the thinking parameter in requests when GLM-4.6 is detected (see the sketch after this list)
  • Parsed thinking tokens using XmlMatcher for <think> tags
  • Handled the reasoning_content field in streaming responses
  • Added comprehensive tests to verify the functionality
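
As referenced above, here is a minimal sketch of how the detection and the request-side change could fit together. It is not the PR's exact code: the function and parameter plumbing are simplified, and the thinking field is attached past the OpenAI SDK's request types because it is not a standard parameter.

import OpenAI from "openai"

// Simplified stand-in for the provider's model check.
function isGLM46Model(modelId: string): boolean {
	const lowerModel = modelId.toLowerCase()
	// Covers glm-4.6, GLM-4.6, glm-4-6, GLM-4-6
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
}

async function createStreamSketch(
	client: OpenAI,
	modelId: string,
	messages: OpenAI.Chat.ChatCompletionMessageParam[],
) {
	const params: OpenAI.Chat.ChatCompletionCreateParamsStreaming = {
		model: modelId,
		messages,
		stream: true,
	}

	if (isGLM46Model(modelId)) {
		// "thinking" is not in the SDK's request types, so attach it as an extra body field.
		(params as unknown as Record<string, unknown>).thinking = { type: "enabled" }
	}

	return client.chat.completions.create(params)
}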

Testing

  • All existing tests pass ✅
  • New tests added for GLM-4.6 functionality ✅
  • Tested model detection, parameter addition, and token parsing (an illustrative detection test follows this list)
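
As noted above, an illustrative shape for the detection tests. It assumes a vitest-style runner (suggested by the *.spec.ts naming used in this PR) and inlines a stand-in helper so the snippet is self-contained; it is not the PR's actual test code.

import { describe, it, expect } from "vitest"

// Stand-in for the PR's helper so this sketch runs on its own.
function isGLM46Model(modelId: string): boolean {
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
}

describe("GLM-4.6 detection (illustrative)", () => {
	it.each(["glm-4.6", "GLM-4.6", "glm-4-6", "GLM-4-6"])("detects %s", (modelId) => {
		expect(isGLM46Model(modelId)).toBe(true)
	})

	it("does not match unrelated models", () => {
		expect(isGLM46Model("gpt-4o-mini")).toBe(false)
	})
})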

Impact

This change only affects GLM-4.6 models when used through OpenAI-compatible endpoints. Other models and providers remain unaffected.

Fixes #8547

Feedback and guidance are welcome!


Important

Adds GLM-4.6 thinking token support for OpenAI-compatible endpoints by detecting models, adding parameters, and parsing tokens.

  • Behavior:
    • Detects GLM-4.6 model variants (glm-4.6, GLM-4.6, glm-4-6, GLM-4-6) in base-openai-compatible-provider.ts.
    • Adds thinking: { type: "enabled" } parameter for GLM-4.6 models in createStream().
    • Parses thinking tokens using XmlMatcher for <think> tags in createMessage().
    • Handles the reasoning_content field in streaming responses in createMessage() (a streaming sketch follows this list).
  • Testing:
    • Adds tests in base-openai-compatible-provider.spec.ts for model detection, thinking parameter addition, and token parsing.
    • Verifies handling of reasoning_content in responses.
  • Impact:
    • Affects only GLM-4.6 models when used through OpenAI-compatible endpoints.
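
As referenced in the Behavior bullets, a minimal sketch of the streaming side. It is a simplified stand-in, not the PR's code: the real implementation uses XmlMatcher, which copes with <think> tags split across chunks, while this version assumes a tag arrives whole inside a single delta. The names StreamChunk, splitThinkTags, and handleDelta are invented for illustration.

type StreamChunk = { type: "reasoning" | "text"; text: string }

// Split "<think>...</think>" out of a content delta into a reasoning chunk.
function* splitThinkTags(content: string): Generator<StreamChunk> {
	const match = content.match(/<think>([\s\S]*?)<\/think>/)
	if (!match) {
		yield { type: "text", text: content }
		return
	}
	if (match[1]) yield { type: "reasoning", text: match[1] }
	const rest = content.replace(match[0], "")
	if (rest) yield { type: "text", text: rest }
}

// Route a streaming delta: inline <think> tags and, when the server sends it,
// the dedicated reasoning_content field both become "reasoning" chunks.
function* handleDelta(delta: { content?: string | null; reasoning_content?: string }): Generator<StreamChunk> {
	if (delta.content) {
		yield* splitThinkTags(delta.content)
	}
	if (delta.reasoning_content) {
		yield { type: "reasoning", text: delta.reasoning_content }
	}
}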

This description was created by Ellipsis for aada7cc.

- Add detection for GLM-4.6 model variants
- Include thinking parameter { type: "enabled" } in requests for GLM-4.6
- Parse thinking tokens using XmlMatcher for <think> tags
- Handle reasoning_content in streaming responses
- Add comprehensive tests for GLM-4.6 functionality

Fixes #8547
@roomote (bot) requested review from cte, jr, and mrubens as code owners October 7, 2025 11:24
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) and bug (Something isn't working) labels Oct 7, 2025
protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}

The isGLM46Model check has a redundant condition (lowerModel === 'glm-4.6') since includes('glm-4.6') already covers it. Consider removing the extra check.

Suggested change
- return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")

@hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Oct 7, 2025

@roomote (bot, PR author) left a comment

Self-review mode: evaluating my own changes with all the empathy of a linter in a cold data center.

}

// Handle reasoning_content if present (for models that support it directly)
if (delta && "reasoning_content" in delta && delta.reasoning_content) {
	yield { type: "reasoning", text: (delta.reasoning_content as string | undefined) || "" }
}

P2: Potential double-emission of reasoning tokens. For GLM-4.6 you already parse via XmlMatcher; if the provider also populates reasoning_content in the same stream, this emits the same reasoning twice. Gate this branch when GLM-4.6 is active or dedupe.

Suggested change
- if (delta && "reasoning_content" in delta && delta.reasoning_content) {
+ // Handle reasoning_content if present (avoid double-emitting when GLM '<think>' is parsed)
+ if (!isGLM46 && delta && "reasoning_content" in delta && delta.reasoning_content) {
  	yield {
  		type: "reasoning",
  		text: (delta.reasoning_content as string | undefined) || "",
  	}
  }

protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}

P3: Minor simplification. The final equality check is redundant because .includes("glm-4.6") already covers it. A concise version improves readability.

Suggested change
  protected isGLM46Model(modelId: string): boolean {
  	const lowerModel = modelId.toLowerCase()
- 	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ 	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
  }

import OpenAI from "openai"
import { Anthropic } from "@anthropic-ai/sdk"

import type { ModelInfo } from "@roo-code/types"

P3: Unused import ModelInfo can trigger no-unused-vars in stricter configs.

Suggested change
- import type { ModelInfo } from "@roo-code/types"

@ChicoPinto70 commented Oct 10, 2025

Hi, guys. I tried this PR with ik_llama.cpp running Ubergarm's GLM-4.6 model locally with Unsloth's chat template (--jinja --chat-template-file), but I still don't get reasoning in Roo Code. Reasoning works fine in Roo Code with DeepSeek v3.1 running locally, and I also get GLM-4.6 reasoning in the ik_llama.cpp built-in web UI and in the Continue VS Code extension. Do I need to do something else to make it work with this PR?

I'm using the OpenAI-compatible endpoint in Roo Code with a "glm-4.6" alias in ik_llama.cpp.



Development

Successfully merging this pull request may close these issues.

[BUG] GLM-4.6 not generating thinking tokens when using OpenAI-compatible custom endpoint
