-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: Honor Gemini retryDelay and improve rate limit handling #8013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Enhanced GeminiHandler to properly parse 429 errors with RetryInfo and QuotaFailure - Added GeminiError class to preserve structured error details - Updated retry logic to respect provider-suggested delays with 2-second buffer - Added distinction between temporary rate limiting and quota exhaustion - Improved user messaging for rate limit scenarios - Added comprehensive tests for rate limit handling Fixes #8012
| // Don't retry for quota exhaustion - show clear message and fail | ||
| await this.say( | ||
| "error", | ||
| `Gemini API quota exhausted. ${quotaFailure.violations?.[0]?.description || "Your daily or monthly quota has been exceeded."}\n\nPlease check your quota limits at: https://ai.google.dev/gemini-api/docs/rate-limits`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider wrapping user‐facing strings (e.g. the quota exhaustion message) in a translation function (t()) for consistency with internationalization practices.
This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code because apparently I trust no one, not even myself.
| if (match) { | ||
| exponentialDelay = Number(match[1]) + 1 | ||
| // Add a small buffer (1-2 seconds) as recommended | ||
| exponentialDelay = Number(match[1]) + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the 2-second buffer intentional here? The issue mentions "1-2 seconds" but we're using a fixed 2-second buffer. Would it be worth making this configurable?
| if (quotaFailure && !geminiRetryDetails?.retryDelay) { | ||
| // Check if the error message indicates daily/monthly quota exhaustion | ||
| const isQuotaExhausted = | ||
| error.message?.toLowerCase().includes("quota") && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice similar quota exhaustion checking logic here and at lines 2816-2819. Could we extract this into a helper method to reduce duplication?
| { input: "59s", expected: 59 }, | ||
| { input: "120s", expected: 120 }, | ||
| { input: "1s", expected: 1 }, | ||
| { input: "0s", expected: 0 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test coverage! Could we add a test case for when retryDelay is "0s" to ensure the buffer is still applied correctly in edge cases?
| errorDetails?: Array<GeminiRetryInfo | GeminiQuotaFailure | any> | ||
| } | ||
|
|
||
| export class GeminiError extends Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding JSDoc comments to document the purpose and structure of this error class. It would help future maintainers understand when and how to use it.
|
Quota exhaustion detection may be unreliable. Logic relies on error.message substrings (“daily”/“monthly”/“exceeded”), which risks misclassification and unnecessary retries. Prefer structured signal: treat presence of QuotaFailure without RetryInfo as exhaustion. |
Summary
This PR addresses Issue #8012 by implementing proper handling of Google Gemini's rate limiting responses, including honoring the provider's suggested retry delays and distinguishing between temporary rate limits and quota exhaustion.
Changes
Enhanced Error Handling
GeminiErrorclass to preserve structured error details from Gemini APIRetryInfoandQuotaFailurefrom error responsesImproved Retry Logic
retryDelaywhen present (e.g., "59s")Better User Experience
Testing
Acceptance Criteria Met
✅ Given a 429 with RetryInfo.retryDelay, the app waits the specified time (+ buffer) before retrying
✅ Given a 429 indicating quota exhaustion, the app does not retry and shows a clear message
✅ Given a 429 without RetryInfo, the app uses exponential backoff
✅ User-facing messages are concise and non-technical with links to documentation
Testing
cd src && npx vitest run api/providers/__tests__/gemini-rate-limit.spec.tscd src && npx vitest run api/providers/__tests__/gemini.spec.tsFixes #8012
Important
Improves Google Gemini rate limit handling by introducing structured error handling, enhanced retry logic, and clear user messaging, with comprehensive testing.
GeminiErrorclass ingemini.tsto handle structured error details from Gemini API.createMessageandcompletePromptmethods ingemini.ts.RetryInfoandQuotaFailurefrom error responses.Task.tsto honorretryDelayfrom Gemini API, adding a 2-second buffer.Task.ts.gemini-rate-limit.spec.tsfor comprehensive testing of rate limit handling.RetryInfoparsing,QuotaFailuredetection, and error transformation.This description was created by
for 125fa83. You can customize this summary. It will automatically update as commits are pushed.