fix: improve GLM-4.6 thinking token support for better compatibility #8643
Description
This PR improves GLM-4.6 thinking token support to ensure compatibility with various OpenAI-compatible endpoints, particularly addressing the issue reported with ik_llama.cpp.
Problem
As reported in #8547 and by @ChicoPinto70, the previous implementation in PR #8548 did not work with all OpenAI-compatible endpoints, specifically ik_llama.cpp. The issue was that:
- The `thinking: { type: "enabled" }` parameter was always added for GLM-4.6 models

Solution
This implementation takes a more conservative and compatible approach:
Key Changes:
1. Optional thinking parameter: The `thinking` parameter is now disabled by default and only added when explicitly enabled via configuration (`openAiEnableThinkingParameter: true`)
2. Multiple parsing strategies: The implementation now handles thinking tokens through three different methods to ensure maximum compatibility:
   - XML tag (`<think>...</think>`) parsing using XmlMatcher
   - `reasoning_content` field in the response delta (as used by some implementations)
   - Optional `thinking` parameter for endpoints that support it
3. Comprehensive testing: Added extensive tests covering all scenarios
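The two stream-side strategies above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the real implementation uses the project's XmlMatcher, while this sketch inlines a simplified tag splitter and assumes `<think>`/`</think>` tags arrive unbroken within a delta.

```typescript
// Hypothetical delta/chunk shapes; the provider's real types may differ.
type Delta = { content?: string; reasoning_content?: string };
type Chunk = { type: "reasoning" | "text"; text: string };

// Normalize one streamed delta into reasoning/text chunks.
function parseDelta(delta: Delta, state: { thinking: boolean }): Chunk[] {
  const chunks: Chunk[] = [];
  // Strategy: explicit reasoning_content field (used by some servers)
  if (delta.reasoning_content) {
    chunks.push({ type: "reasoning", text: delta.reasoning_content });
  }
  // Strategy: inline <think>...</think> tags in regular content
  let text = delta.content ?? "";
  while (text.length > 0) {
    if (state.thinking) {
      const end = text.indexOf("</think>");
      if (end === -1) {
        chunks.push({ type: "reasoning", text });
        text = "";
      } else {
        if (end > 0) chunks.push({ type: "reasoning", text: text.slice(0, end) });
        state.thinking = false;
        text = text.slice(end + "</think>".length);
      }
    } else {
      const start = text.indexOf("<think>");
      if (start === -1) {
        chunks.push({ type: "text", text });
        text = "";
      } else {
        if (start > 0) chunks.push({ type: "text", text: text.slice(0, start) });
        state.thinking = true;
        text = text.slice(start + "<think>".length);
      }
    }
  }
  return chunks;
}
```

Because `state` persists across calls, a `<think>` block split over several deltas still classifies correctly: the matcher stays in reasoning mode until it sees the closing tag.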
Benefits
Testing
All tests pass ✅.
Related Issues
Fixes #8547
Addresses feedback from PR #8548
For @kavehsfv
This implementation ensures GLM-4.6 thinking tokens work across different OpenAI-compatible endpoints. Once merged, it will be included in the next release cycle. The fix is backward-compatible and should work with your setup.
cc: @ChicoPinto70 - This should now work with your ik_llama.cpp setup. The thinking parameter is disabled by default for maximum compatibility.
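The default-off behavior described above can be illustrated as follows. Only the option name `openAiEnableThinkingParameter` comes from this PR; the surrounding options shape and field names are assumptions for the sketch.

```typescript
// Hypothetical provider-options shape; only openAiEnableThinkingParameter
// is taken from this PR, the rest is illustrative.
interface OpenAiCompatibleOptions {
  apiBaseUrl: string;
  modelId: string;
  openAiEnableThinkingParameter?: boolean;
}

// Default: the flag is omitted, so no thinking parameter is sent.
const conservative: OpenAiCompatibleOptions = {
  apiBaseUrl: "http://localhost:8080/v1",
  modelId: "glm-4.6",
};

// Opt in explicitly for endpoints that support the thinking parameter.
const optedIn: OpenAiCompatibleOptions = {
  ...conservative,
  openAiEnableThinkingParameter: true,
};
```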
Important
Improves GLM-4.6 thinking token support by making the thinking parameter optional and adding multiple parsing strategies for better compatibility.
- `thinking` parameter is now optional for GLM-4.6 models, enabled via `openAiEnableThinkingParameter`.
- Handles thinking tokens via XML `<think>` tags, `reasoning_content`, and the optional `thinking` parameter.
- Tests added in `base-openai-compatible-provider.spec.ts` for default behavior, explicit enabling, XML tag parsing, `reasoning_content` handling, and mixed formats.
- Updated `createStream()` and `createMessage()` in `base-openai-compatible-provider.ts` to handle new logic for thinking tokens.
- Added `isGLM46Model()` and `shouldAddThinkingParameter()` to determine model type and parameter inclusion.
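The two helpers named in the summary might look roughly like this. The function names `isGLM46Model()` and `shouldAddThinkingParameter()` come from this PR, but their exact signatures and the model-ID matching rule are assumptions here, not the merged code.

```typescript
// Hypothetical shapes; the real signatures in
// base-openai-compatible-provider.ts may differ.
interface ProviderOptions {
  openAiEnableThinkingParameter?: boolean;
}

// Assumed matching rule: any model ID containing "glm-4.6" (case-insensitive).
function isGLM46Model(modelId: string): boolean {
  return /glm-4\.6/i.test(modelId);
}

// Opt-in guard: before this PR the parameter was always added for GLM-4.6;
// now it requires the explicit configuration flag.
function shouldAddThinkingParameter(modelId: string, options: ProviderOptions): boolean {
  return isGLM46Model(modelId) && options.openAiEnableThinkingParameter === true;
}

// Example request-body assembly using the guard.
function buildBody(modelId: string, options: ProviderOptions): Record<string, unknown> {
  const body: Record<string, unknown> = { model: modelId, stream: true };
  if (shouldAddThinkingParameter(modelId, options)) {
    body.thinking = { type: "enabled" };
  }
  return body;
}
```

With this guard, an endpoint like ik_llama.cpp that does not accept the extra field never sees it unless the user turns the flag on.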