fix: adjust token clamping threshold from 20% to 80% for GLM-4.5 compatibility #6807
```diff
@@ -90,9 +90,16 @@ export const getModelMaxOutputTokens = ({
 		return ANTHROPIC_DEFAULT_MAX_TOKENS
 	}
 
-	// If model has explicit maxTokens, clamp it to 20% of the context window
+	// If model has explicit maxTokens, only clamp it if it exceeds 80% of the context window
+	// This prevents models from using the entire context for output while still allowing
```
|
Contributor (Author): The comment is good, but could we be more explicit about why 80% was chosen? Perhaps mention that this leaves approximately 20% of the context window for input tokens and system prompts, which is typically sufficient for most use cases.
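One possible wording for that expanded comment (illustrative only, not taken from the PR):

```ts
// If the model declares an explicit maxTokens, clamp it only when it exceeds
// 80% of the context window. Reserving the remaining ~20% for input tokens
// and system prompts is typically sufficient, while still letting models with
// legitimately high output requirements (like GLM-4.5) function normally.
```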
```diff
+	// models with legitimately high output requirements (like GLM-4.5) to function
 	if (model.maxTokens) {
-		return Math.min(model.maxTokens, model.contextWindow * 0.2)
+		// Only apply clamping if maxTokens is more than 80% of context window
+		if (model.maxTokens > model.contextWindow * 0.8) {
```
|
Contributor (Author): Consider extracting this magic number (0.8) to a named constant so the threshold is defined in one place.
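A minimal sketch of that refactor; the constant name here is hypothetical, not from the PR:

```ts
// Hypothetical name — any descriptive constant would do.
const MAX_OUTPUT_TO_CONTEXT_RATIO = 0.8

if (model.maxTokens) {
	// Clamp only when the declared maxTokens would crowd out the input budget.
	if (model.maxTokens > model.contextWindow * MAX_OUTPUT_TO_CONTEXT_RATIO) {
		return Math.floor(model.contextWindow * MAX_OUTPUT_TO_CONTEXT_RATIO)
	}
	return model.maxTokens
}
```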
```diff
+			// Clamp to 80% to leave room for input
+			return Math.floor(model.contextWindow * 0.8)
```
|
Contributor (Author): Is using `Math.floor` the right approach here?
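For context, `contextWindow * 0.8` is fractional whenever the window size is not a multiple of 5, so some integer conversion is needed before the value can serve as a token count; a quick illustration:

```ts
const contextWindow = 131_072 // e.g. a 128K-token window

console.log(contextWindow * 0.8)             // 104857.6 — not a valid token count
console.log(Math.floor(contextWindow * 0.8)) // 104857 — rounded down to an integer
```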
```diff
+		}
 		return model.maxTokens
 	}
 
 	// For non-Anthropic formats without explicit maxTokens, return undefined
```
Would it be helpful to add a specific test case for the GLM-4.5 scenario that triggered this issue? Something like:
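A sketch of what such a test might look like. The import path, test framework, call shape, and GLM-4.5 figures below are all assumptions for illustration, not taken from the repository:

```ts
// Assumed import path and a simplified call shape — the real function
// takes additional parameters beyond `model`.
import { getModelMaxOutputTokens } from "../index"

describe("getModelMaxOutputTokens", () => {
	it("does not clamp a GLM-4.5-style model whose maxTokens is within 80% of the context window", () => {
		// Hypothetical GLM-4.5 figures: 128K context, maxTokens at 75% of the window.
		const model = { contextWindow: 131_072, maxTokens: 98_304 }

		// 98_304 <= 131_072 * 0.8 (104_857.6), so maxTokens passes through unchanged.
		// Under the old 20% clamp this would have been cut to 26_214.
		expect(getModelMaxOutputTokens({ model })).toBe(98_304)
	})

	it("clamps maxTokens that exceeds 80% of the context window", () => {
		const model = { contextWindow: 100_000, maxTokens: 95_000 }

		// 95_000 > 80_000, so the result is floored to 80% of the window.
		expect(getModelMaxOutputTokens({ model })).toBe(80_000)
	})
})
```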