
Conversation


@roo-code-preview roo-code-preview bot commented Sep 9, 2025

This PR attempts to address Issue #5658 by fixing context overflow errors with OpenRouter models like moonshotai/kimi-k2.

Problem

OpenRouter models were failing with context overflow errors like:

400 This endpoint's maximum context length is 131072 tokens. However, you requested about 149189 tokens (18117 of text input, 131072 in the output).

The issue occurred because max_completion_tokens was being set to the full context window (131072), leaving no room for input tokens.

Solution

  1. OpenRouter Model Parser Fix: Modified parseOpenRouterModel() to cap max_completion_tokens to 20% of context window when it equals or exceeds the full context window
  2. GPT-5 Detection Refinement: Updated getModelMaxOutputTokens() to prevent OpenRouter models from being incorrectly identified as native GPT-5 models
  3. Comprehensive Test Coverage: Added 6 new test cases covering edge cases and regression prevention

Changes

  • src/api/providers/fetchers/openrouter.ts: Safe max token calculation
  • src/shared/api.ts: Refined GPT-5 model detection logic
  • src/api/providers/fetchers/__tests__/openrouter.spec.ts: Added context overflow test cases
  • src/shared/__tests__/api.spec.ts: Added GPT-5 detection edge case tests

Testing

  • ✅ All existing tests pass
  • ✅ New tests cover the specific kimi-k2 context overflow scenario
  • ✅ Edge cases for null/undefined max_completion_tokens handled
  • ✅ GPT-5 model detection works correctly for both native and OpenRouter models

For models like kimi-k2 with a 131k context window, this fix caps output tokens at ~26k (20%), leaving ~105k for input tokens and preventing the overflow.
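As a rough illustration of that calculation, here is a minimal sketch of the capping rule. The type and function names are hypothetical, not the exact code in parseOpenRouterModel(); field names beyond context_length and max_completion_tokens (which appear in the review threads below) are assumptions.

```typescript
// Sketch only: assumes the OpenRouter model record exposes context_length and
// top_provider.max_completion_tokens, as in the excerpts quoted in this PR.
interface OpenRouterModelRecord {
	context_length: number
	top_provider?: { max_completion_tokens?: number | null }
}

function safeMaxOutputTokens(model: OpenRouterModelRecord): number {
	const advertised = model.top_provider?.max_completion_tokens

	// Trust the advertised limit only when it leaves room for input tokens.
	if (typeof advertised === "number" && advertised > 0 && advertised < model.context_length) {
		return advertised
	}

	// Otherwise cap at 20% of the context window:
	// 131072 * 0.2 = 26214.4 -> 26215 output tokens, ~105k left for input.
	return Math.ceil(model.context_length * 0.2)
}
```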

Feedback and guidance are welcome!


Important

Fixes context overflow in OpenRouter models by capping max_completion_tokens and refines GPT-5 detection logic.

  • Behavior:
    • Caps max_completion_tokens to 20% of context window in parseOpenRouterModel() when it equals or exceeds the full context window.
    • Updates getModelMaxOutputTokens() to prevent OpenRouter models from being misidentified as native GPT-5 models.
  • Testing:
    • Adds test cases in openrouter.spec.ts for context overflow scenarios and reasonable token usage.
    • Adds test cases in api.spec.ts for GPT-5 detection and token capping logic.
  • Files:
    • openrouter.ts: Implements safe max token calculation.
    • api.ts: Refines GPT-5 model detection logic.

This description was created by Ellipsis for 56f619b.

…kens

- Fix issue where OpenRouter models like moonshotai/kimi-k2 fail with context overflow
- Cap max_completion_tokens to 20% of context window when it equals full context window
- Refine GPT-5 model detection to prevent false positives with OpenRouter models
- Add comprehensive test coverage for edge cases

Fixes #5658

@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.

safeMaxTokens = maxTokens
} else {
// Fall back to 20% of context window for safety
safeMaxTokens = Math.ceil(model.context_length * 0.2)

Would it be better to define the 0.2 ratio as a named constant for better maintainability? This magic number appears in multiple places, and a single source of truth would make future adjustments easier.
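A hedged sketch of what that could look like, in the same fragment style as the excerpt above (the constant name is invented here, not from the PR):

```typescript
// Single source of truth for the fallback ratio applied when the advertised
// max_completion_tokens would consume the entire context window.
const SAFE_COMPLETION_TOKEN_RATIO = 0.2

// ...later, wherever the cap is applied:
safeMaxTokens = Math.ceil(model.context_length * SAFE_COMPLETION_TOKEN_RATIO)
```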


// Calculate safe max output tokens
// If maxTokens from OpenRouter equals or exceeds the context window, use 20% of context window instead
// This prevents the "max_tokens equals context window" issue that causes API failures

The comment could be more specific about why 20% was chosen. Consider: "20% leaves sufficient room for input tokens while maximizing output capacity, preventing API failures due to context overflow"

safeMaxTokens = maxTokens
} else {
// Fall back to 20% of context window for safety
safeMaxTokens = Math.ceil(model.context_length * 0.2)

Should we add a defensive check for model.context_length being 0 or undefined? While unlikely, it could produce a zero-token cap or NaN:
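For example, a defensive variant might look like this (FALLBACK_CONTEXT_WINDOW is a hypothetical constant, not something in the PR; the fragment slots into the same spot as the excerpt above):

```typescript
// Guard against a missing or zero context_length before applying the ratio;
// undefined * 0.2 would yield NaN and 0 would yield a zero-token cap.
const FALLBACK_CONTEXT_WINDOW = 8192

const contextWindow =
	typeof model.context_length === "number" && model.context_length > 0
		? model.context_length
		: FALLBACK_CONTEXT_WINDOW

safeMaxTokens = Math.ceil(contextWindow * 0.2)
```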

const isGpt5Model = modelId.toLowerCase().includes("gpt-5")
// Make sure we don't incorrectly identify OpenRouter models as GPT-5
// OpenRouter models typically have format "provider/model" but native OpenAI models can be "openai/gpt-5"
const isGpt5Model =

The GPT-5 detection logic is getting complex. Would a helper function improve readability?
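One possible shape for such a helper (the name, signature, and exact conditions are illustrative, not the merged code):

```typescript
// True only for native OpenAI GPT-5 model IDs; OpenRouter-routed variants
// (e.g. "openai/gpt-5" requested through the OpenRouter provider) are excluded
// so they keep the OpenRouter-specific token limits.
function isNativeGpt5Model(modelId: string, provider: string | undefined): boolean {
	return modelId.toLowerCase().includes("gpt-5") && provider !== "openrouter"
}
```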

// Should fall back to 20% of context window
expect(result.maxTokens).toBe(Math.ceil(100000 * 0.2)) // 20000
expect(result.contextWindow).toBe(100000)
})

Great test coverage! Consider adding an edge case test for very small context windows (e.g., 100 tokens) to ensure Math.ceil doesn't cause unexpected behavior with tiny values.
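A sketch of such a test, assuming the argument shape mirrors the existing fixtures in openrouter.spec.ts (the fixture name here is hypothetical):

```typescript
it("caps output tokens for very small context windows", () => {
	// Hypothetical 100-token context window; spread an existing fixture and
	// override only the fields relevant to this edge case.
	const result = parseOpenRouterModel({
		...tinyModelFixture,
		context_length: 100,
		top_provider: { max_completion_tokens: 100 },
	})

	// Math.ceil(100 * 0.2) = 20, so even tiny windows keep room for input.
	expect(result.maxTokens).toBe(20)
	expect(result.contextWindow).toBe(100)
})
```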

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Sep 9, 2025
@daniel-lxs daniel-lxs closed this Sep 9, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 9, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Sep 9, 2025

Labels

  • bug (Something isn't working)
  • Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
  • size:L (This PR changes 100-499 lines, ignoring generated files.)
