Description
What specific problem does this solve?
I’ve been using Roo Code with the Gemini 2.5 Pro free tier and consistently run into 429 errors. After reviewing the docs and experimenting with API configuration profiles and condensing settings, I found:
- Setting a 60s rate limit in API configuration profiles does not help. A single request can still exceed the free-tier cap.
- Lowering the Intelligent Context Condensing threshold to 20% also does not enforce the 125k input token maximum.
- As a result, there is no way to constrain requests to stay under Google's new 125k input token quota.
This means the Gemini 2.5 Pro free tier is currently unusable with Roo Code: requests routinely exceed the quota and fail immediately with 429 errors.
Compounding the problem, other "free model" options (like Grok Fast 1) expired as of Sept 10. That leaves no practical free alternative, so fixing this is important for anyone relying on the Gemini free tier.
Additional context (optional)
Steps to Reproduce:
- Use Gemini 2.5 Pro free tier with Roo Code.
- Create a session large enough to exceed 125k tokens.
- Set a rate limit of 60s in API configuration profiles.
- Lower Intelligent Context Condensing threshold to 20%.
- Run a generation; the API still responds with 429.
Expected Behavior:
There should be a way to configure or automatically enforce the 125k per-request input token ceiling for Gemini free tier, so that single requests do not exceed Google’s quota.
Actual Behavior:
Neither rate limiting nor condensing prevents requests from exceeding 125k tokens. Free-tier Gemini 2.5 Pro is effectively unusable in Roo Code.
Suggestion:
- Add a configurable "max input tokens per request" parameter in API profiles.
- Ideally, combine this with rate limiting to handle both per-request and per-minute quota rules.
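A minimal sketch of what such a per-request cap might look like. All names here (`Message`, `estimateTokens`, `capInputTokens`) are hypothetical illustrations, not Roo Code's actual API, and the 4-chars-per-token heuristic stands in for the provider's real tokenizer:

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic: ~4 characters per token. A real implementation would
// call the provider's token-counting endpoint instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest non-system messages until the estimated total fits
// under the configured ceiling (e.g. 125_000 for Gemini free tier).
function capInputTokens(messages: Message[], maxTokens: number): Message[] {
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (total > maxTokens) {
    const idx = kept.findIndex((m) => m.role !== "system");
    if (idx === -1) break; // only the system prompt left; nothing to trim
    total -= estimateTokens(kept[idx].content);
    kept.splice(idx, 1);
  }
  return kept;
}
```

Trimming oldest-first is just one policy; triggering a condensing pass when the cap would be exceeded would serve the same purpose.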
Roo Code Task Links (Optional)
No response
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear impact and context
Interested in implementing this?
- Yes, I'd like to help implement this feature
Implementation requirements
- I understand this needs approval before implementation begins
How should this be solved? (REQUIRED if contributing, optional otherwise)
Suggestion:
- Add a configurable "max input tokens per request" parameter in API profiles.
- Ideally, combine this with rate limiting to handle both per-request and per-minute quota rules.
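The combined check could be a single gate consulted before every dispatch. This is a hedged sketch under assumed names (`RequestGate` and its parameters are illustrative, not an existing Roo Code class): it rejects requests over the per-request ceiling outright and spaces the rest to respect the per-minute rule:

```typescript
// Gates outgoing requests against both quota dimensions:
// a hard per-request input-token ceiling and a minimum interval
// between requests (the existing 60s rate-limit setting).
class RequestGate {
  private lastRequestAt = -Infinity;

  constructor(
    private maxInputTokens: number, // e.g. 125_000 for Gemini free tier
    private minIntervalMs: number,  // e.g. 60_000 for a 60s rate limit
  ) {}

  // Returns the delay (ms) to wait before sending, or throws if the
  // request can never fit the per-request quota and must be condensed.
  check(estimatedTokens: number, nowMs: number): number {
    if (estimatedTokens > this.maxInputTokens) {
      throw new Error(
        `~${estimatedTokens} input tokens exceeds the ` +
          `${this.maxInputTokens} per-request cap; condense first`,
      );
    }
    const wait = Math.max(0, this.lastRequestAt + this.minIntervalMs - nowMs);
    this.lastRequestAt = nowMs + wait; // reserve the send slot
    return wait;
  }
}
```

The key difference from today's behavior is the throw: an over-cap request never reaches the API, so it can trigger condensing instead of burning a quota attempt on a guaranteed 429.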
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
No response
Technical considerations (REQUIRED if contributing, optional otherwise)
No response
Trade-offs and risks (REQUIRED if contributing, optional otherwise)
No response