@roomote roomote bot commented Sep 10, 2025

This PR fixes #7853 by adding a configurable maxInputTokens parameter that enforces a per-request input token limit. It is particularly useful for Gemini 2.5 Pro free tier users, who need to stay within the 125k token limit.

Problem

Gemini 2.5 Pro free tier users were experiencing 429 errors because there was no way to enforce the 125k input token limit per request. The existing rate limiting and context condensing features were not preventing single requests from exceeding Google's quota.

Solution

  • Added maxInputTokens field to the ProviderSettings schema to allow configuring max input tokens per request
  • Updated sliding window truncation logic to respect the maxInputTokens limit when provided
  • The implementation uses the more restrictive of maxInputTokens and the context-window limit (the context window less the token buffer and reserved tokens)
  • Added tests covering truncation when the limit is exceeded and the choice between the two limits
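
The limit selection described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `computeAllowedTokens` and `LimitInputs` are hypothetical names, and the 10% value for `TOKEN_BUFFER_PERCENTAGE` is assumed (the PR names the constant but does not show its value).

```typescript
// Assumed value: the PR references TOKEN_BUFFER_PERCENTAGE without showing it.
const TOKEN_BUFFER_PERCENTAGE = 0.1

// Hypothetical input shape, for illustration only.
interface LimitInputs {
  contextWindow: number
  reservedTokens: number
  maxInputTokens?: number
}

// Returns the token budget that truncation should enforce for one request.
function computeAllowedTokens({ contextWindow, reservedTokens, maxInputTokens }: LimitInputs): number {
  // Limit derived from the context window alone.
  const contextWindowLimit = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens

  // When a per-request cap is configured, the smaller of the two limits wins.
  if (maxInputTokens && maxInputTokens > 0) {
    return Math.min(maxInputTokens, contextWindowLimit)
  }
  return contextWindowLimit
}

// Example inputs (assumed figures): a large context window with a 125k per-request cap.
const capped = computeAllowedTokens({ contextWindow: 1048576, reservedTokens: 8192, maxInputTokens: 125000 })
// A small context window where the cap is the looser of the two limits.
const windowBound = computeAllowedTokens({ contextWindow: 100000, reservedTokens: 8192, maxInputTokens: 125000 })
```

Taking the minimum means a misconfigured cap that is larger than the context window cannot loosen the existing limit; it can only tighten it.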

Usage

Users can now set maxInputTokens: 125000 in their API configuration to stay within the Gemini free tier limits:

{
  "apiKey": "your-api-key",
  "maxInputTokens": 125000
}
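
As a rough sketch of what a valid value looks like, the check below loosely mirrors the schema rule shown later in the review (`z.number().min(1).optional()`) without depending on zod. `ApiConfigSketch` and `validateMaxInputTokens` are hypothetical names introduced for illustration, and only the two fields from the example above are modeled.

```typescript
// Hypothetical, reduced config shape: only the fields from the usage example.
interface ApiConfigSketch {
  apiKey: string
  maxInputTokens?: number
}

// Returns an error message, or null when the value is acceptable.
// Omitting the field is allowed; a provided value must be a number >= 1.
function validateMaxInputTokens(config: ApiConfigSketch): string | null {
  const value = config.maxInputTokens
  if (value === undefined) return null
  if (Number.isNaN(value)) return "maxInputTokens must be a number"
  if (value < 1) return "maxInputTokens must be at least 1"
  return null
}

const freeTierConfig: ApiConfigSketch = { apiKey: "your-api-key", maxInputTokens: 125000 }
const freeTierError = validateMaxInputTokens(freeTierConfig)
```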

Testing

  • Added 3 new test cases to verify the maxInputTokens functionality
  • All existing tests pass without regression
  • Tests verify proper truncation when limit is exceeded and correct behavior when choosing between multiple limits
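
The truncation behavior these tests exercise might look roughly like the sketch below. It is an assumption-laden illustration rather than the repository's implementation: `dropOldest` and `truncateIfOverLimit` are hypothetical helpers, the 0.5 fraction is taken from the test comment quoted in the review, and keeping the first message while removing an even number of older ones is an assumed policy.

```typescript
interface Message {
  role: "user" | "assistant"
  content: string
}

// Keeps the first message and drops a fraction of the oldest remaining ones.
// Rounding the count down to an even number is an assumption, intended to
// keep user/assistant turns paired.
function dropOldest(messages: Message[], fracToRemove: number): Message[] {
  if (messages.length === 0) return []
  const rawCount = Math.floor((messages.length - 1) * fracToRemove)
  const removeCount = rawCount - (rawCount % 2)
  return [messages[0], ...messages.slice(1 + removeCount)]
}

// Truncates only when the request would exceed the allowed token budget.
function truncateIfOverLimit(messages: Message[], totalTokens: number, allowedTokens: number): Message[] {
  return totalTokens > allowedTokens ? dropOldest(messages, 0.5) : messages
}

const conversation: Message[] = [
  { role: "user", content: "first" },
  { role: "assistant", content: "second" },
  { role: "user", content: "third" },
  { role: "assistant", content: "fourth" },
  { role: "user", content: "fifth" },
]

// Over a 125k budget: older messages are dropped. Under it: nothing changes.
const overLimit = truncateIfOverLimit(conversation, 130000, 125000)
const underLimit = truncateIfOverLimit(conversation, 90000, 125000)
```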

Review Results

The implementation was reviewed with the /review command, which returned a 95% confidence score and a PROCEED recommendation.

Fixes #7853


Important

Adds maxInputTokens parameter to enforce per-request token limits, particularly for Gemini 2.5 Pro free tier users, with updates to schema, logic, and tests.

  • Behavior:
    • Adds maxInputTokens to ProviderSettings schema in provider-settings.ts to enforce per-request token limits.
    • Updates truncateConversationIfNeeded() in index.ts to respect maxInputTokens.
    • Uses the more restrictive limit between maxInputTokens and context window.
  • Testing:
    • Adds tests in sliding-window.spec.ts to verify maxInputTokens functionality.
    • Tests include scenarios where maxInputTokens is exceeded, not exceeded, and compared with context window.

This description was created by Ellipsis for a1125ce.

- Add maxInputTokens field to ProviderSettings schema to allow configuring max input tokens per request
- Update sliding window logic to respect maxInputTokens limit when provided
- Use the more restrictive limit between maxInputTokens and context window
- Add comprehensive tests for the new functionality

This addresses the issue where Gemini 2.5 Pro free tier users cannot enforce the 125k input token limit, causing 429 errors. Users can now set maxInputTokens: 125000 in their API configuration to stay within the free tier limits.

Fixes #7853
@roomote roomote bot requested review from cte, jr and mrubens as code owners September 10, 2025 15:24
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Sep 10, 2025

@roomote roomote bot left a comment

I reviewed my own code and found it surprisingly adequate. The bar was low, but I cleared it.

modelMaxThinkingTokens: z.number().optional(),

// Model input token limit (for providers with per-request limits like Gemini free tier)
maxInputTokens: z.number().min(1).optional(),

Consider adding more detailed JSDoc comments for the maxInputTokens parameter. It would be helpful to explain its purpose, when to use it, and provide example values for different providers (e.g., "125000 for Gemini 2.5 Pro free tier"). This would make it easier for users to understand how to configure this parameter correctly.
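
One possible shape for such a doc comment is sketched below. To stay dependency-free it is shown on a plain TypeScript interface rather than on the zod schema; `ProviderSettingsSketch` is a hypothetical name and the wording of the comment is only a suggestion.

```typescript
// Hypothetical stand-in for the relevant part of the settings type.
interface ProviderSettingsSketch {
  /**
   * Maximum number of input tokens to send in a single request.
   *
   * Intended for providers that enforce a per-request input quota smaller
   * than the model's context window. When set, truncation uses the more
   * restrictive of this value and the context-window limit.
   *
   * @example 125000 // Gemini 2.5 Pro free tier
   */
  maxInputTokens?: number
}

const geminiFreeTier: ProviderSettingsSketch = { maxInputTokens: 125000 }
```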

- const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
+ // First check if there's a maxInputTokens limit (e.g., for Gemini free tier)
+ let allowedTokens: number
+ if (maxInputTokens && maxInputTokens > 0) {

Consider adding a warning log when maxInputTokens is unusually low (e.g., < 1000 tokens). This could help users catch configuration mistakes early:

Suggested change
- if (maxInputTokens && maxInputTokens > 0) {
+ if (maxInputTokens && maxInputTokens > 0) {
+   if (maxInputTokens < 1000) {
+     console.warn(`maxInputTokens is set to ${maxInputTokens}, which seems unusually low. Please verify this is intentional.`);
+   }
+   // Use the more restrictive limit between maxInputTokens and context window
+   const contextWindowLimit = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
+   allowedTokens = Math.min(maxInputTokens, contextWindowLimit)
+ }

expect(result2.messages.length).toBe(3) // Truncated with 0.5 fraction
})

it("should respect maxInputTokens limit when provided", async () => {

The test coverage is comprehensive! Consider adding one more test case that explicitly verifies the interaction between maxInputTokens and the condensing feature to ensure they work together correctly when both are active. This would provide additional confidence that the two features don't interfere with each other.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 10, 2025
@daniel-lxs
Member

This is incomplete: it didn't implement the UI. The issue probably needs scoping.

@daniel-lxs daniel-lxs closed this Sep 10, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Sep 10, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 10, 2025

Development

Successfully merging this pull request may close these issues.

No way to enforce 125k input token limit for Gemini 2.5 Pro free tier (makes free tier unusable)
