fix: adjust token clamping threshold from 20% to 80% for GLM-4.5 compatibility #6807
Conversation
fix: adjust token clamping threshold from 20% to 80% for GLM-4.5 compatibility

The previous 20% clamping threshold was too restrictive for models like GLM-4.5 that have legitimate high output token requirements (98,304 tokens out of 131,072 context window = 75%). This change only applies clamping when maxTokens exceeds 80% of the context window, preventing models from using the entire context for output while still allowing models with high output requirements to function properly.

Fixes #6806
Reviewing my own code because apparently I trust no one, not even myself.
```ts
// Only apply clamping if maxTokens is more than 80% of context window
if (model.maxTokens > model.contextWindow * 0.8) {
	// Clamp to 80% to leave room for input
	return Math.floor(model.contextWindow * 0.8)
```
Is using Math.floor() here intentional? With a context window of 131,072, this gives 104,857 tokens instead of 104,858. While minor, would Math.ceil() or Math.round() better maximize available tokens for edge cases?
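For reference, the arithmetic behind that question as a small illustrative sketch (the context-window value is the GLM-4.5 figure from this PR; the snippet is not part of the diff):

```ts
const contextWindow = 131_072

console.log(contextWindow * 0.8)             // 104857.6
console.log(Math.floor(contextWindow * 0.8)) // 104857 (current behavior: rounds down)
console.log(Math.ceil(contextWindow * 0.8))  // 104858 (keeps one extra output token)
console.log(Math.round(contextWindow * 0.8)) // 104858 (same as ceil for this value)
```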
```diff
 if (model.maxTokens) {
-	return Math.min(model.maxTokens, model.contextWindow * 0.2)
+	// Only apply clamping if maxTokens is more than 80% of context window
+	if (model.maxTokens > model.contextWindow * 0.8) {
```
Consider extracting this magic number to a named constant like MAX_OUTPUT_TOKEN_RATIO = 0.8 for better maintainability. This would make it easier to adjust in the future and clearer about the intent.
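A minimal sketch of what that extraction could look like (the constant name follows the reviewer's suggestion; this is not the committed code):

```ts
// Sketch only: fraction of the context window that output tokens may occupy before clamping.
const MAX_OUTPUT_TOKEN_RATIO = 0.8

// ...then inside getModelMaxOutputTokens:
if (model.maxTokens > model.contextWindow * MAX_OUTPUT_TOKEN_RATIO) {
	// Clamp to the ratio to leave room for input
	return Math.floor(model.contextWindow * MAX_OUTPUT_TOKEN_RATIO)
}
```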
```diff
 })
-expect(result).toBe(20_000) // Should use model.maxTokens since it's exactly at 20%
+expect(result).toBe(80_000) // Should use model.maxTokens since it's at 80%
 })
```
Would it be helpful to add a specific test case for the GLM-4.5 scenario that triggered this issue? Something like:
```ts
})

test("should handle GLM-4.5 model with 98,304 tokens out of 131,072 context window", () => {
	const model: ModelInfo = {
		contextWindow: 131_072,
		supportsPromptCache: false,
		maxTokens: 98_304, // 75% of context window
	}
	const settings: ProviderSettings = {
		apiProvider: "openrouter",
	}
	const result = getModelMaxOutputTokens({
		modelId: "z.al/glm-4.5",
		model,
		settings,
		format: "openrouter",
	})
	expect(result).toBe(98_304) // Should use model.maxTokens since 75% < 80%
})
```
```diff
-// If model has explicit maxTokens, clamp it to 20% of the context window
+// If model has explicit maxTokens, only clamp it if it exceeds 80% of the context window
+// This prevents models from using the entire context for output while still allowing
```
The comment is good, but could we be more explicit about why 80% was chosen? Perhaps mention that this leaves approximately 20% for input tokens and system prompts, which is typically sufficient for most use cases?
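One possible wording for that comment, sketched here rather than quoted from the PR:

```ts
// If model has explicit maxTokens, only clamp it if it exceeds 80% of the context window.
// The 80% threshold leaves roughly 20% of the window for input tokens and the system prompt,
// which is typically sufficient, while models that advertise high output limits
// (e.g. GLM-4.5 at 75% of its window) are left unclamped.
```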
This PR fixes issue #6806 where GLM-4.5 models on OpenRouter were failing with token limit errors after upgrading to v3.25.8.
Problem
The commit c52fdc4 introduced a 20% clamping threshold for model max tokens relative to the context window. This was too restrictive for models like GLM-4.5 that legitimately require high output token counts (98,304 tokens out of 131,072 context window = 75%).
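To make the numbers concrete, an illustrative sketch (values taken from the GLM-4.5 report above; the variable names are not from the codebase):

```ts
const contextWindow = 131_072
const glmMaxTokens = 98_304 // 98_304 / 131_072 = 0.75 of the window

const oldCap = contextWindow * 0.2       // 26_214.4 → GLM-4.5 was clamped far below its advertised limit
const newThreshold = contextWindow * 0.8 // 104_857.6 → 98_304 stays under this, so it is no longer clamped
```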
Solution
Adjusted the clamping threshold from 20% to 80% of the context window. This prevents models from using the entire context window for output while still allowing models with legitimately high output requirements, such as GLM-4.5, to function properly.
Changes
- Updated the `getModelMaxOutputTokens` function in `src/shared/api.ts` to use the 80% threshold

Testing
- `src/shared/__tests__/api.spec.ts` - 21 tests passing
- `src/api/providers/__tests__/openrouter.spec.ts` - 12 tests passing
- `src/api/transform/__tests__/model-params.spec.ts` - 45 tests passing

Fixes #6806
Important
Adjusts token clamping threshold from 20% to 80% for GLM-4.5 compatibility, updating `getModelMaxOutputTokens` and related tests.

- `getModelMaxOutputTokens` in `api.ts` now clamps only when `maxTokens` > 80% of the context window.
- `api.spec.ts`, `openrouter.spec.ts`, and `model-params.spec.ts` updated to reflect the new 80% threshold.