fix: prevent OpenRouter context overflow by capping max_completion_tokens #7807
Conversation
fix: prevent OpenRouter context overflow by capping max_completion_tokens

- Fix issue where OpenRouter models like moonshotai/kimi-k2 fail with context overflow
- Cap max_completion_tokens to 20% of the context window when it equals the full context window
- Refine GPT-5 model detection to prevent false positives with OpenRouter models
- Add comprehensive test coverage for edge cases

Fixes #5658
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
	safeMaxTokens = maxTokens
} else {
	// Fall back to 20% of context window for safety
	safeMaxTokens = Math.ceil(model.context_length * 0.2)
Would it be better to define the 0.2 ratio as a named constant, for better maintainability? This magic number appears in multiple places, and having a single source of truth would make future adjustments easier.
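For illustration, a minimal sketch of what that could look like (the constant and helper names are assumptions, not from the PR):

```typescript
// Hypothetical sketch: single source of truth for the 20% fallback ratio.
// SAFE_COMPLETION_RATIO and capCompletionTokens are assumed names, not PR code.
const SAFE_COMPLETION_RATIO = 0.2

function capCompletionTokens(contextLength: number): number {
	// Every call site that currently hard-codes 0.2 would go through this helper.
	return Math.ceil(contextLength * SAFE_COMPLETION_RATIO)
}
```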
// Calculate safe max output tokens
// If maxTokens from OpenRouter equals or exceeds the context window, use 20% of context window instead
// This prevents the "max_tokens equals context window" issue that causes API failures
The comment could be more specific about why 20% was chosen. Consider: "20% leaves sufficient room for input tokens while maximizing output capacity, preventing API failures due to context overflow"
	safeMaxTokens = maxTokens
} else {
	// Fall back to 20% of context window for safety
	safeMaxTokens = Math.ceil(model.context_length * 0.2)
Should we add a defensive check for `model.context_length` being 0 or undefined? While unlikely, it could otherwise produce a NaN or zero value for the token cap:
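A minimal sketch of such a guard; the helper name and the 8192-token fallback are illustrative assumptions, not part of the PR:

```typescript
// Hypothetical guard against a missing or non-positive context length.
// The 8192-token fallback is an assumed conservative default.
function safeContextLength(contextLength?: number): number {
	if (typeof contextLength !== "number" || !Number.isFinite(contextLength) || contextLength <= 0) {
		return 8192
	}
	return contextLength
}

// e.g. safeMaxTokens = Math.ceil(safeContextLength(model.context_length) * 0.2)
```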
const isGpt5Model = modelId.toLowerCase().includes("gpt-5")
// Make sure we don't incorrectly identify OpenRouter models as GPT-5
// OpenRouter models typically have format "provider/model" but native OpenAI models can be "openai/gpt-5"
const isGpt5Model =
The GPT-5 detection logic is getting complex. Would a helper function improve readability?
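For example, the check could be factored out along these lines (the helper name and exact rules are a sketch, not the PR's code):

```typescript
// Hypothetical helper: only treat an id as a native GPT-5 model when it is
// either a bare OpenAI id or an "openai/..." id, mirroring the comment above.
function isNativeGpt5Model(modelId: string): boolean {
	const id = modelId.toLowerCase()
	if (id.includes("/") && !id.startsWith("openai/")) {
		return false // other "provider/model" ids are OpenRouter-style, not native GPT-5
	}
	return id.includes("gpt-5")
}
```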
	// Should fall back to 20% of context window
	expect(result.maxTokens).toBe(Math.ceil(100000 * 0.2)) // 20000
	expect(result.contextWindow).toBe(100000)
})
Great test coverage! Consider adding an edge case test for very small context windows (e.g., 100 tokens) to ensure Math.ceil doesn't cause unexpected behavior with tiny values.
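A sketch of the suggested edge case, mirroring the assertions above; how `result` would be produced (fixture shape, parse call) is assumed rather than taken from the PR:

```typescript
// Hypothetical assertions for a tiny 100-token context window.
// Should still fall back to 20% of the context window, rounded up by Math.ceil.
expect(result.maxTokens).toBe(Math.ceil(100 * 0.2)) // 20
expect(result.contextWindow).toBe(100)
```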
This PR attempts to address Issue #5658 by fixing the context overflow issue with OpenRouter models like moonshotai/kimi-k2.
Problem
OpenRouter models were failing with context overflow errors.

The issue occurred because `max_completion_tokens` was being set to the full context window (131072), leaving no room for input tokens.

Solution

- Updated `parseOpenRouterModel()` to cap `max_completion_tokens` to 20% of the context window when it equals or exceeds the full context window
- Updated `getModelMaxOutputTokens()` to prevent OpenRouter models from being incorrectly identified as native GPT-5 models

Changes
- `src/api/providers/fetchers/openrouter.ts`: Safe max token calculation
- `src/shared/api.ts`: Refined GPT-5 model detection logic
- `src/api/providers/fetchers/__tests__/openrouter.spec.ts`: Added context overflow test cases
- `src/shared/__tests__/api.spec.ts`: Added GPT-5 detection edge case tests

Testing
For models like kimi-k2 with 131k context window, this fix caps output tokens to ~26k (20%), leaving ~105k for input tokens and preventing the overflow.
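A quick standalone check of those figures (not code from the PR):

```typescript
// Arithmetic behind the ~26k output / ~105k input split for a 131,072-token window.
const contextWindow = 131072
const cappedOutput = Math.ceil(contextWindow * 0.2) // 26215 ≈ 26k
const remainingForInput = contextWindow - cappedOutput // 104857 ≈ 105k
console.log(cappedOutput, remainingForInput)
```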
Feedback and guidance are welcome!
Important
Fixes context overflow in OpenRouter models by capping `max_completion_tokens` and refines GPT-5 detection logic.

- Caps `max_completion_tokens` to 20% of the context window in `parseOpenRouterModel()` when it equals or exceeds the full context window.
- Refines `getModelMaxOutputTokens()` to prevent OpenRouter models from being misidentified as native GPT-5 models.
- Adds tests in `openrouter.spec.ts` for context overflow scenarios and reasonable token usage.
- Adds tests in `api.spec.ts` for GPT-5 detection and token capping logic.
- `openrouter.ts`: Implements safe max token calculation.
- `api.ts`: Refines GPT-5 model detection logic.

This description was created by
for 56f619b. You can customize this summary. It will automatically update as commits are pushed.