fix: prevent OpenRouter context overflow by capping max_completion_tokens #7807
Conversation
fix: prevent OpenRouter context overflow by capping max_completion_tokens

- Fix issue where OpenRouter models like moonshotai/kimi-k2 fail with context overflow
- Cap max_completion_tokens to 20% of the context window when it equals the full context window
- Refine GPT-5 model detection to prevent false positives with OpenRouter models
- Add comprehensive test coverage for edge cases

Fixes #5658
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
	safeMaxTokens = maxTokens
} else {
	// Fall back to 20% of context window for safety
	safeMaxTokens = Math.ceil(model.context_length * 0.2)
Would it be better to define the 0.2 ratio as a named constant, for better maintainability? This magic number appears in multiple places, and having a single source of truth would make future adjustments easier.
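For illustration, a minimal sketch of what that could look like (the constant and helper names are assumptions, not from the PR):

```typescript
// Hypothetical sketch: single source of truth for the 20% fallback ratio.
// SAFE_COMPLETION_RATIO and capCompletionTokens are assumed names, not PR code.
const SAFE_COMPLETION_RATIO = 0.2

function capCompletionTokens(contextLength: number): number {
	// Every call site that currently hard-codes 0.2 would go through this helper.
	return Math.ceil(contextLength * SAFE_COMPLETION_RATIO)
}
```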
// Calculate safe max output tokens
// If maxTokens from OpenRouter equals or exceeds the context window, use 20% of context window instead
// This prevents the "max_tokens equals context window" issue that causes API failures
The comment could be more specific about why 20% was chosen. Consider: "20% leaves sufficient room for input tokens while maximizing output capacity, preventing API failures due to context overflow"
	safeMaxTokens = maxTokens
} else {
	// Fall back to 20% of context window for safety
	safeMaxTokens = Math.ceil(model.context_length * 0.2)
Should we add a defensive check for `model.context_length` being 0 or undefined? While unlikely, it could otherwise produce a NaN or zero value for the token cap:
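A minimal sketch of such a guard; the helper name and the 8192-token fallback are illustrative assumptions, not part of the PR:

```typescript
// Hypothetical guard against a missing or non-positive context length.
// The 8192-token fallback is an assumed conservative default.
function safeContextLength(contextLength?: number): number {
	if (typeof contextLength !== "number" || !Number.isFinite(contextLength) || contextLength <= 0) {
		return 8192
	}
	return contextLength
}

// e.g. safeMaxTokens = Math.ceil(safeContextLength(model.context_length) * 0.2)
```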
const isGpt5Model = modelId.toLowerCase().includes("gpt-5")
// Make sure we don't incorrectly identify OpenRouter models as GPT-5
// OpenRouter models typically have format "provider/model" but native OpenAI models can be "openai/gpt-5"
const isGpt5Model =
The GPT-5 detection logic is getting complex. Would a helper function improve readability?
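For example, the check could be factored out along these lines (the helper name and exact rules are a sketch, not the PR's code):

```typescript
// Hypothetical helper: only treat an id as a native GPT-5 model when it is
// either a bare OpenAI id or an "openai/..." id, mirroring the comment above.
function isNativeGpt5Model(modelId: string): boolean {
	const id = modelId.toLowerCase()
	if (id.includes("/") && !id.startsWith("openai/")) {
		return false // other "provider/model" ids are OpenRouter-style, not native GPT-5
	}
	return id.includes("gpt-5")
}
```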
	// Should fall back to 20% of context window
	expect(result.maxTokens).toBe(Math.ceil(100000 * 0.2)) // 20000
	expect(result.contextWindow).toBe(100000)
})
Great test coverage! Consider adding an edge case test for very small context windows (e.g., 100 tokens) to ensure Math.ceil doesn't cause unexpected behavior with tiny values.
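A sketch of the suggested edge case, mirroring the assertions above; how `result` would be produced (fixture shape, parse call) is assumed rather than taken from the PR:

```typescript
// Hypothetical assertions for a tiny 100-token context window.
// Should still fall back to 20% of the context window, rounded up by Math.ceil.
expect(result.maxTokens).toBe(Math.ceil(100 * 0.2)) // 20
expect(result.contextWindow).toBe(100)
```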
This PR attempts to address Issue #5658 by fixing the context overflow issue with OpenRouter models like moonshotai/kimi-k2.
Problem
OpenRouter models were failing with context overflow errors.

The issue occurred because `max_completion_tokens` was being set to the full context window (131072), leaving no room for input tokens.

Solution

- Updated `parseOpenRouterModel()` to cap `max_completion_tokens` to 20% of the context window when it equals or exceeds the full context window
- Updated `getModelMaxOutputTokens()` to prevent OpenRouter models from being incorrectly identified as native GPT-5 models

Changes
- `src/api/providers/fetchers/openrouter.ts`: Safe max token calculation
- `src/shared/api.ts`: Refined GPT-5 model detection logic
- `src/api/providers/fetchers/__tests__/openrouter.spec.ts`: Added context overflow test cases
- `src/shared/__tests__/api.spec.ts`: Added GPT-5 detection edge case tests

Testing
For models like kimi-k2 with 131k context window, this fix caps output tokens to ~26k (20%), leaving ~105k for input tokens and preventing the overflow.
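A quick standalone check of those figures (not code from the PR):

```typescript
// Arithmetic behind the ~26k output / ~105k input split for a 131,072-token window.
const contextWindow = 131072
const cappedOutput = Math.ceil(contextWindow * 0.2) // 26215 ≈ 26k
const remainingForInput = contextWindow - cappedOutput // 104857 ≈ 105k
console.log(cappedOutput, remainingForInput)
```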
Feedback and guidance are welcome!
Important
Fixes context overflow in OpenRouter models by capping `max_completion_tokens` and refines GPT-5 detection logic.

- Caps `max_completion_tokens` to 20% of the context window in `parseOpenRouterModel()` when it equals or exceeds the full context window.
- Refines `getModelMaxOutputTokens()` to prevent OpenRouter models from being misidentified as native GPT-5 models.
- Adds tests in `openrouter.spec.ts` for context overflow scenarios and reasonable token usage.
- Adds tests in `api.spec.ts` for GPT-5 detection and token capping logic.
- `openrouter.ts`: Implements safe max token calculation.
- `api.ts`: Refines GPT-5 model detection logic.

This description was created by
for 56f619b. You can customize this summary. It will automatically update as commits are pushed.