[ENHANCEMENT] Honor Gemini retryDelay; clarify rate-limit vs quota #8012

@hannesrudolph

Type

Enhancement

Problem

When Google Gemini returns a 429 response, it often includes a recommended wait time. The app's current retry behavior may not fully respect that guidance, leading to avoidable errors and confusing “Too Many Requests” messages.

Context

This affects users on free and paid Gemini tiers when running heavier or bursty workflows that hit tokens-per-minute limits. Gemini responses can include a recommended delay (for example, retryDelay: "59s"). The app should wait for that duration (with a small safety buffer) and clearly indicate that it's a temporary rate limit. If the response indicates that the daily/monthly quota is exhausted, the app should stop retrying and show a clear, actionable message.
Reference with a full example error (includes RetryInfo.retryDelay): #6680 (comment)
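For illustration, here is a minimal sketch of turning a retryDelay string such as "59s" into milliseconds. The function name parseRetryDelayMs is hypothetical and not part of the app. It assumes the value follows the google.protobuf.Duration JSON form (decimal seconds with an "s" suffix) and treats anything else as "no suggestion".

```typescript
// Hypothetical helper: parse a google.protobuf.Duration JSON string
// (for example "59s" or "1.5s") into whole milliseconds.
// Returns undefined when the value is missing or not in the expected shape,
// so the caller can fall back to its own backoff.
function parseRetryDelayMs(retryDelay: unknown): number | undefined {
  if (typeof retryDelay !== "string") return undefined;
  // Whole seconds, optionally followed by up to nine fractional digits.
  const match = /^(\d+)(?:\.(\d{1,9}))?s$/.exec(retryDelay.trim());
  if (!match) return undefined;
  const seconds = Number(match[1]);
  const fraction = match[2] ? Number(`0.${match[2]}`) : 0;
  // Round up so the wait is never shorter than the provider asked for.
  return Math.ceil((seconds + fraction) * 1000);
}
```

Negative durations are rejected on purpose in this sketch, since a negative wait has no meaning for a retry.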

Desired behavior

  • Automatically pause and retry based on the provider’s suggested retry delay, with a small buffer.
  • Show a simple notice that we’re waiting due to rate limiting, ideally with an approximate countdown.
  • Distinguish between temporary rate limiting and genuine quota exhaustion, using different messages and behavior.
  • Fall back to a reasonable backoff strategy when no suggested delay is provided.
  • Keep the flow unobtrusive and avoid noisy or repeated error popups.

Technical details for implementation

  • Error shape to support:
    • HTTP 429 with status "Too Many Requests" and a top-level error.status of "RESOURCE_EXHAUSTED".
    • The error payload’s details array may include:
      • type.googleapis.com/google.rpc.RetryInfo with a retryDelay string like "59s" (google.protobuf.Duration format).
      • type.googleapis.com/google.rpc.QuotaFailure with fields such as quotaMetric, quotaId, quotaDimensions, and quotaValue indicating the limit that was exceeded (e.g., tokens per model per minute).
  • Provider-layer expectations:
    • Preserve structured error information so the retry system can read it: HTTP status (429), top-level error.status, and the details array including any RetryInfo and QuotaFailure entries. Avoid flattening these into a plain string that discards fields.
  • Retry behavior:
    • If RetryInfo.retryDelay is present, parse the duration, add a small buffer (e.g., +1–2s), and use that as the retry wait.
    • When both a local backoff and a provider-suggested delay exist, wait for the greater of the two.
    • If QuotaFailure is present and the message indicates true quota exhaustion (e.g., daily/monthly limit), stop retrying and show a clear message with next steps.
    • If no RetryInfo is present, fall back to exponential backoff with a reasonable cap.
  • User messaging:
    • For temporary rate limiting (RetryInfo present): show a lightweight notice that we’re waiting (with a short countdown if feasible).
    • For quota exhaustion (QuotaFailure with no RetryInfo, or an explicit quota-exceeded message): do not retry; surface a clear, actionable message and link to the provider’s rate-limit docs.
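
The rules above could be sketched roughly as follows. This is only one possible reading, not the app's actual code: the ProviderError and RetryDecision types, the decideRetry function, the 1.5s buffer, and the LONG_WINDOW heuristic for spotting daily/monthly quotas are all assumptions made for illustration. The sketch also assumes QuotaFailure carries its fields inside a violations array, as in the google.rpc definition.

```typescript
// Hypothetical structured error preserved by the provider layer.
interface ErrorDetail {
  "@type": string;
  retryDelay?: string; // present on google.rpc.RetryInfo
  violations?: Array<{ quotaMetric?: string; quotaId?: string }>; // on QuotaFailure
}
interface ProviderError {
  httpStatus: number;
  status?: string; // e.g. "RESOURCE_EXHAUSTED"
  message?: string;
  details?: ErrorDetail[];
}
type RetryDecision =
  | { kind: "wait"; waitMs: number; reason: "provider_delay" | "backoff" }
  | { kind: "stop"; reason: "quota_exhausted" }
  | { kind: "not_rate_limited" };

const RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo";
const QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure";
const BUFFER_MS = 1500; // assumed value inside the suggested +1-2s range
// Assumption: long-window quotas can be recognised by wording like this.
const LONG_WINDOW = /per\s*day|daily|per\s*month|monthly/i;

function decideRetry(err: ProviderError, localBackoffMs: number): RetryDecision {
  if (err.httpStatus !== 429) return { kind: "not_rate_limited" };
  const details = err.details ?? [];
  const retryInfo = details.find((d) => d["@type"] === RETRY_INFO);
  const quotaFailure = details.find((d) => d["@type"] === QUOTA_FAILURE);

  // Quota exhaustion: QuotaFailure without RetryInfo, or a daily/monthly hint.
  const quotaText = [
    err.message ?? "",
    ...(quotaFailure?.violations ?? []).map((v) => `${v.quotaId ?? ""} ${v.quotaMetric ?? ""}`),
  ].join(" ");
  if (quotaFailure !== undefined && (retryInfo === undefined || LONG_WINDOW.test(quotaText))) {
    return { kind: "stop", reason: "quota_exhausted" };
  }

  // Temporary rate limit: honor the suggested delay plus buffer,
  // but never wait less than the local backoff.
  const m = /^(\d+(?:\.\d+)?)s$/.exec(retryInfo?.retryDelay ?? "");
  if (m) {
    const suggestedMs = Math.ceil(Number(m[1]) * 1000) + BUFFER_MS;
    return { kind: "wait", waitMs: Math.max(suggestedMs, localBackoffMs), reason: "provider_delay" };
  }
  // No usable RetryInfo: fall back to the caller's backoff.
  return { kind: "wait", waitMs: localBackoffMs, reason: "backoff" };
}
```

Returning a tagged decision rather than sleeping inside the function keeps the messaging layer free to choose between the waiting notice and the quota-exceeded message.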

Acceptance criteria

  • Given a 429 with RetryInfo.retryDelay (e.g., "59s"), the app waits approximately that long (+ small buffer) before retrying, without spamming error popups, and resumes automatically.
  • Given a 429 indicating quota exhaustion with QuotaFailure and no RetryInfo, the app does not retry and shows a clear message explaining that the quota was exceeded and how to proceed.
  • Given a 429 without RetryInfo, the app uses exponential backoff (capped) and communicates that it’s retrying due to rate limiting.
  • In all cases, the user-facing copy is concise and non-technical, and links to the provider’s rate limit documentation for context (e.g., https://ai.google.dev/gemini-api/docs/rate-limits).
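
As a rough sketch of the fallback path and the waiting notice in these criteria, assuming a 1s base delay and a 60s cap (both values, the function names, and the notice wording are hypothetical and not taken from the app):

```typescript
// Capped exponential backoff: baseMs * 2^attempt, never above capMs.
// Jitter is left out to keep the sketch deterministic.
function backoffMs(attempt: number, baseMs: number = 1000, capMs: number = 60000): number {
  const exponential = baseMs * Math.pow(2, Math.max(0, attempt));
  return Math.min(capMs, exponential);
}

// Hypothetical copy for the lightweight countdown notice.
function rateLimitNotice(remainingMs: number): string {
  const seconds = Math.max(0, Math.ceil(remainingMs / 1000));
  return `Rate limited by the provider. Retrying in about ${seconds}s.`;
}
```

A UI could call rateLimitNotice once per second with the remaining time, updating a single notice in place instead of raising a new popup for each retry.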
