
Conversation

@roomote roomote bot (Contributor) commented Aug 7, 2025

This PR fixes issue #6806 where GLM-4.5 models on OpenRouter were failing with token limit errors after upgrading to v3.25.8.

Problem

Commit c52fdc4 began clamping a model's max output tokens to 20% of its context window. This was too restrictive for models like GLM-4.5 that legitimately declare high output limits (98,304 tokens out of a 131,072-token context window, i.e. 75%).

Solution

Adjusted the clamping threshold from 20% to 80% of the context window. This:

  • Prevents models from using the entire context window for output, which would leave no room for input
  • Allows models with legitimately high output limits, such as GLM-4.5, to function properly
  • Only applies clamping when truly necessary, i.e. when maxTokens exceeds 80% of the context window (see the sketch below)
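
A minimal sketch of the resulting behavior (the clampMaxOutputTokens helper below is illustrative only; the actual change lives inside getModelMaxOutputTokens in src/shared/api.ts, which also handles provider settings and models without an explicit maxTokens):

```ts
// Illustrative sketch, not the literal implementation from this PR.
function clampMaxOutputTokens(maxTokens: number, contextWindow: number): number {
	// Clamp only when the declared output limit would consume more than 80%
	// of the context window, leaving roughly 20% for input and system prompt.
	if (maxTokens > contextWindow * 0.8) {
		return Math.floor(contextWindow * 0.8)
	}
	return maxTokens
}

// GLM-4.5: 98,304 of 131,072 tokens (75%) is under the threshold, so it is kept as-is.
clampMaxOutputTokens(98_304, 131_072) // => 98304
```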

Changes

  • Modified getModelMaxOutputTokens function in src/shared/api.ts to use 80% threshold
  • Updated all related test cases to reflect the new threshold
  • All tests pass successfully

Testing

  • src/shared/__tests__/api.spec.ts - 21 tests passing
  • src/api/providers/__tests__/openrouter.spec.ts - 12 tests passing
  • src/api/transform/__tests__/model-params.spec.ts - 45 tests passing
  • ✅ All linting and type checks pass

Fixes #6806


Important

Adjusts token clamping threshold from 20% to 80% for GLM-4.5 compatibility, updating getModelMaxOutputTokens and related tests.

  • Behavior:
    • Adjusts token clamping threshold from 20% to 80% in getModelMaxOutputTokens in api.ts.
    • Ensures models with high output requirements, like GLM-4.5, function correctly.
    • Clamping only applies when maxTokens > 80% of context window.
  • Testing:
    • Updates test cases in api.spec.ts, openrouter.spec.ts, and model-params.spec.ts to reflect new 80% threshold.
    • All tests pass successfully, ensuring no regressions.

This description was created by Ellipsis for 1fb46fc.

…atibility

The previous 20% clamping threshold was too restrictive for models like GLM-4.5
that have legitimate high output token requirements (98,304 tokens out of 131,072
context window = 75%). This change only applies clamping when maxTokens exceeds
80% of the context window, preventing models from using the entire context for
output while still allowing models with high output requirements to function properly.

Fixes #6806
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 7, 2025 13:43
@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and bug (Something isn't working) labels Aug 7, 2025
@roomote roomote bot (Contributor, Author) left a comment

Reviewing my own code because apparently I trust no one, not even myself.

```ts
// Only apply clamping if maxTokens is more than 80% of context window
if (model.maxTokens > model.contextWindow * 0.8) {
	// Clamp to 80% to leave room for input
	return Math.floor(model.contextWindow * 0.8)
```

@roomote (Contributor, Author) commented:

Is using Math.floor() here intentional? With a context window of 131,072, this gives 104,857 tokens instead of 104,858. While minor, would Math.ceil() or Math.round() better maximize available tokens for edge cases?
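
For illustration, the arithmetic behind that observation (standalone TypeScript, not code from the PR):

```ts
const contextWindow = 131_072
const limit = contextWindow * 0.8 // 104857.6, not a whole number of tokens

Math.floor(limit) // 104857: rounds down, marginally more conservative
Math.ceil(limit)  // 104858: rounds up, recovers one extra output token
```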

```diff
 if (model.maxTokens) {
-	return Math.min(model.maxTokens, model.contextWindow * 0.2)
+	// Only apply clamping if maxTokens is more than 80% of context window
+	if (model.maxTokens > model.contextWindow * 0.8) {
```

@roomote (Contributor, Author) commented:

Consider extracting this magic number to a named constant like MAX_OUTPUT_TOKEN_RATIO = 0.8 for better maintainability. This would make it easier to adjust in the future and clearer about the intent.
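
A possible shape for that refactor, as a sketch (the constant name and placement are suggestions, not code from this PR):

```ts
// Near the top of src/shared/api.ts: the maximum share of the context window
// that an explicit maxTokens may claim for output.
const MAX_OUTPUT_TOKEN_RATIO = 0.8

// Inside getModelMaxOutputTokens:
if (model.maxTokens > model.contextWindow * MAX_OUTPUT_TOKEN_RATIO) {
	return Math.floor(model.contextWindow * MAX_OUTPUT_TOKEN_RATIO)
}
```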

```diff
 })
-expect(result).toBe(20_000) // Should use model.maxTokens since it's exactly at 20%
+expect(result).toBe(80_000) // Should use model.maxTokens since it's at 80%
 })
```

@roomote (Contributor, Author) commented:

Would it be helpful to add a specific test case for the GLM-4.5 scenario that triggered this issue? Something like:

Suggested change
```suggestion
})
test("should handle GLM-4.5 model with 98,304 tokens out of 131,072 context window", () => {
	const model: ModelInfo = {
		contextWindow: 131_072,
		supportsPromptCache: false,
		maxTokens: 98_304, // 75% of context window
	}
	const settings: ProviderSettings = {
		apiProvider: "openrouter",
	}
	const result = getModelMaxOutputTokens({
		modelId: "z.al/glm-4.5",
		model,
		settings,
		format: "openrouter",
	})
	expect(result).toBe(98_304) // Should use model.maxTokens since 75% < 80%
})
```


```diff
-// If model has explicit maxTokens, clamp it to 20% of the context window
+// If model has explicit maxTokens, only clamp it if it exceeds 80% of the context window
+// This prevents models from using the entire context for output while still allowing
```

@roomote (Contributor, Author) commented:

The comment is good, but could we be more explicit about why 80% was chosen? Perhaps mention that this leaves approximately 20% for input tokens and system prompts, which is typically sufficient for most use cases?
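
For example, the comment could be expanded along these lines (wording is only a suggestion):

```ts
// If the model declares an explicit maxTokens, clamp it only when it exceeds 80%
// of the context window. The remaining ~20% is reserved for input tokens and the
// system prompt, which is typically sufficient; models with legitimately high output
// limits (e.g. GLM-4.5 at 75% of its context window) keep their declared value.
```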

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Aug 7, 2025
@roomote roomote bot mentioned this pull request Aug 7, 2025
@mrubens mrubens closed this Aug 7, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 7, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 7, 2025

Labels

  • bug (Something isn't working)
  • Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
  • size:M (This PR changes 30-99 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

GLM 4.5 OR Chutes AI error 400
