[ENHANCEMENT] Honor Gemini retryDelay; clarify rate-limit vs quota #8012

## Type
Enhancement

## Problem
When Google Gemini returns a 429 response, it often includes a recommended wait time. The app's current retry behavior may not fully respect that guidance, leading to avoidable errors and confusing “Too Many Requests” messages.

## Context
This affects users on free and paid Gemini tiers when running heavier or bursty workflows that hit tokens-per-minute limits. Gemini responses can include a recommended delay (for example, retryDelay: "59s"). The app should wait for that duration (with a small safety buffer) and clearly indicate that it's a temporary rate limit. If the response indicates that the daily/monthly quota is exhausted, the app should stop retrying and show a clear, actionable message.
Reference with a full example error (includes RetryInfo.retryDelay): https://github.com/RooCodeInc/Roo-Code/issues/6680#issuecomment-3246594163
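
As a rough illustration of the "wait for that duration (with a small safety buffer)" step, the sketch below parses a `google.protobuf.Duration` JSON string such as `"59s"` into milliseconds. The function names and the 1.5 s buffer are assumptions made for this example, not existing code in the app.

```typescript
// Hypothetical helper: parses a google.protobuf.Duration JSON string ("59s", "1.5s").
// Returns milliseconds, or null when the value is missing, malformed, or negative.
function parseRetryDelayMs(value: unknown): number | null {
  if (typeof value !== "string") return null;
  // Duration strings are decimal seconds with up to 9 fractional digits and an "s" suffix.
  const match = /^(\d+(?:\.\d{1,9})?)s$/.exec(value.trim());
  if (!match) return null;
  // Round up so the wait is never shorter than the provider asked for.
  return Math.ceil(Number(match[1]) * 1000);
}

// Assumed buffer; the issue only asks for "a small safety buffer".
const RETRY_BUFFER_MS = 1500;

// Hypothetical helper: provider delay plus buffer, or null if no usable delay was given.
function waitFromRetryDelay(value: unknown): number | null {
  const parsed = parseRetryDelayMs(value);
  return parsed === null ? null : parsed + RETRY_BUFFER_MS;
}
```

With these assumptions, `waitFromRetryDelay("59s")` yields 60500 ms, and a missing or unparseable value yields `null` so the caller can fall back to its own backoff.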

## Desired behavior
- Automatically pause and retry based on the provider’s suggested retry delay, with a small buffer.
- Show a simple notice that we’re waiting due to rate limiting, ideally with an approximate countdown.
- Distinguish between temporary rate limiting and genuine quota exhaustion, using different messages and behavior.
- Fall back to a reasonable backoff strategy when no suggested delay is provided.
- Keep the flow unobtrusive and avoid noisy or repeated error popups.
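
One way to picture the behavior listed above is a single decision function that returns either "wait this long" or "stop". This is only a sketch: the type names, the 1 s base, and the 60 s cap are assumptions chosen for illustration, and `suggestedDelayMs` is assumed to already include any safety buffer.

```typescript
// Hypothetical result type: either wait and retry, or stop retrying.
type RetryDecision =
  | { kind: "wait"; delayMs: number; reason: "provider-delay" | "backoff" }
  | { kind: "stop"; reason: "quota-exhausted" };

// Hypothetical input extracted from the provider error.
interface RateLimitSignal {
  suggestedDelayMs: number | null; // provider delay (buffer included), or null if absent
  quotaExhausted: boolean; // true when the daily/monthly quota is used up
}

// Assumed backoff parameters; tune to taste.
const BASE_BACKOFF_MS = 1000;
const MAX_BACKOFF_MS = 60000;

// Capped exponential backoff: 1s, 2s, 4s, ... up to MAX_BACKOFF_MS.
function backoffMs(attempt: number): number {
  return Math.min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * Math.pow(2, attempt));
}

function decideRetry(signal: RateLimitSignal, attempt: number): RetryDecision {
  // Genuine quota exhaustion: retrying cannot succeed, so stop.
  if (signal.quotaExhausted) return { kind: "stop", reason: "quota-exhausted" };
  const local = backoffMs(attempt);
  // No provider hint: fall back to local backoff.
  if (signal.suggestedDelayMs === null) {
    return { kind: "wait", delayMs: local, reason: "backoff" };
  }
  // Provider hint present: wait for the longer of the two delays.
  return {
    kind: "wait",
    delayMs: Math.max(local, signal.suggestedDelayMs),
    reason: "provider-delay",
  };
}
```

Returning a tagged decision rather than sleeping inside the function keeps the policy testable and lets the UI choose how to present the wait (for example, one notice updated in place instead of repeated popups).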

## Technical details for implementation
- Error shape to support:
  - HTTP 429 with status "Too Many Requests" and a top-level error.status of "RESOURCE_EXHAUSTED".
  - The error payload’s details array may include:
    - type.googleapis.com/google.rpc.RetryInfo with a retryDelay string like "59s" (google.protobuf.Duration format).
    - type.googleapis.com/google.rpc.QuotaFailure, whose violations entries carry fields such as quotaMetric, quotaId, quotaDimensions, and quotaValue indicating the limit that was exceeded (e.g., tokens per model per minute).
- Provider-layer expectations:
  - Preserve structured error information so the retry system can read it: HTTP status (429), top-level error.status, and the details array including any RetryInfo and QuotaFailure entries. Avoid flattening these into a plain string that discards fields.
- Retry behavior:
  - If RetryInfo.retryDelay is present, parse the duration, add a small buffer (e.g., +1–2s), and use that as the retry wait.
  - When both a local backoff and a provider-suggested delay exist, wait for the greater of the two.
  - If QuotaFailure is present and the message indicates true quota exhaustion (e.g., daily/monthly limit), stop retrying and show a clear message with next steps.
  - If no RetryInfo is present, fall back to exponential backoff with a reasonable cap.
- User messaging:
  - For temporary rate limiting (RetryInfo present): show a lightweight notice that we’re waiting (with a short countdown if feasible).
  - For quota exhaustion (QuotaFailure with no RetryInfo, or an explicit quota-exceeded message): do not retry; surface a clear, actionable message and link to the provider’s rate-limit docs.
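
To make the "preserve structured error information" point concrete, here is a minimal sketch of reading the details array without flattening it. The interface and function names are hypothetical, and the sketch assumes each details entry identifies itself through an `"@type"` key. The exhaustion check is a heuristic based on the rules above, not a documented provider contract.

```typescript
// Assumed shape of one entry in error.details; extra fields are kept as-is.
interface ErrorDetail {
  "@type": string;
  [key: string]: unknown;
}

// Assumed shape of the parsed response body.
interface GeminiErrorBody {
  error?: { code?: number; status?: string; message?: string; details?: ErrorDetail[] };
}

const RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo";
const QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure";

// Hypothetical summary handed to the retry system.
interface RateLimitInfo {
  isRateLimited: boolean;
  retryDelay: string | null; // raw Duration string, e.g. "59s"
  hasQuotaFailure: boolean;
  likelyQuotaExhausted: boolean;
}

function inspectGeminiError(httpStatus: number, body: GeminiErrorBody): RateLimitInfo {
  const details: ErrorDetail[] = body.error?.details ?? [];
  const retryInfo = details.find((d) => d["@type"] === RETRY_INFO);
  const rawDelay: unknown = retryInfo?.["retryDelay"];
  const retryDelay = typeof rawDelay === "string" ? rawDelay : null;
  const hasQuotaFailure = details.some((d) => d["@type"] === QUOTA_FAILURE);
  const message = body.error?.message ?? "";
  // Heuristic (assumption): QuotaFailure with no RetryInfo, or a message that
  // mentions a daily/monthly limit, is treated as genuine quota exhaustion.
  const likelyQuotaExhausted =
    hasQuotaFailure && (retryDelay === null || /daily|per day|monthly/i.test(message));
  return {
    isRateLimited: httpStatus === 429 && body.error?.status === "RESOURCE_EXHAUSTED",
    retryDelay,
    hasQuotaFailure,
    likelyQuotaExhausted,
  };
}
```

Keeping `retryDelay` as the raw string at this layer leaves parsing and buffering to the retry system, so the provider layer only has to pass fields through intact.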

## Acceptance criteria
- Given a 429 with RetryInfo.retryDelay (e.g., "59s"), the app waits approximately that long (+ small buffer) before retrying, without spamming error popups, and resumes automatically.
- Given a 429 indicating quota exhaustion with QuotaFailure and no RetryInfo, the app does not retry and shows a clear message explaining that the quota was exceeded and how to proceed.
- Given a 429 without RetryInfo, the app uses exponential backoff (capped) and communicates that it’s retrying due to rate limiting.
- In all cases, the user-facing copy is concise and non-technical, and links to the provider’s rate-limit documentation for context (e.g., https://ai.google.dev/gemini-api/docs/rate-limits).
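
For the user-facing copy, something along these lines might work. The wording and function names are placeholders invented for this sketch. Only the documentation URL comes from the criteria above.

```typescript
const RATE_LIMIT_DOCS_URL = "https://ai.google.dev/gemini-api/docs/rate-limits";

// Hypothetical copy for a temporary rate limit; call again each second to update the countdown.
function waitingNotice(remainingMs: number): string {
  // Round up and clamp so the notice never shows a negative or fractional value.
  const seconds = Math.max(0, Math.ceil(remainingMs / 1000));
  return `Rate limited by Gemini. Retrying in about ${seconds}s. Details: ${RATE_LIMIT_DOCS_URL}`;
}

// Hypothetical copy for genuine quota exhaustion; no retry is attempted.
function quotaExceededNotice(): string {
  return `Your Gemini quota appears to be used up, so the request was not retried. See ${RATE_LIMIT_DOCS_URL} for limits and next steps.`;
}
```

Updating a single notice in place with `waitingNotice(remainingMs)` avoids the repeated error popups the criteria rule out.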
