feat: add retry wrapper with exponential backoff for LLM calls (#710) #732

Open
chaojixinren wants to merge 1 commit into google:main from chaojixinren:feat/model-retry

Conversation

@chaojixinren

Link to Issue or Description of Change

Problem:
LLM API calls can fail with transient errors (429 rate limiting, 503 service unavailable, RESOURCE_EXHAUSTED), causing requests to fail that a simple retry would have recovered. There is currently no built-in retry mechanism in the model layer to handle these recoverable errors gracefully.

Solution:
Add a WithRetry() LLM decorator that wraps any model.LLM implementation with automatic retry logic using
configurable exponential backoff and jitter, aligned with adk-python's retry behavior
(tenacity.wait_exponential_jitter).

Key design decisions:

  • Decorator pattern: WithRetry(llm, cfg) returns a new LLM, composable with any existing implementation
    without modifying it. Unlike adk-python where retry logic is embedded in each LLM class, the Go version uses this
    pattern to avoid duplication across gemini and apigee packages, following idiomatic Go composition.
  • Aligned with adk-python defaults: 5 retries, 1s initial delay, 60s max delay, 2x multiplier, 1s absolute jitter (see the backoff sketch after this list).
  • Retryable errors: HTTP 408/429/500/502/503/504, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED, and network errors
    (connection refused/reset, DNS failure, i/o timeout) — matching adk-python's default_status_codes and
    httpx.NetworkError.
  • Stream-safe: Streaming calls only retry before the first yielded response to prevent duplicate partial content.
  • Context-aware: Backoff sleeps respect context cancellation.
  • Customizable: Users can provide their own IsRetryable function for custom error classification (an example appears under the E2E section below).
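
To make the backoff behavior concrete, here is a minimal sketch of the loop described above. The RetryConfig field names follow this PR's description; callWithRetry and the exact signatures are illustrative assumptions, not the actual retry.go:

package model

import (
    "context"
    "math/rand/v2"
    "time"
)

// RetryConfig mirrors the fields named in this PR; treat it as a sketch.
type RetryConfig struct {
    MaxRetries   int           // default 5
    InitialDelay time.Duration // default 1s
    MaxDelay     time.Duration // default 60s
    Jitter       time.Duration // default 1s, absolute
    IsRetryable  func(error) bool
}

// callWithRetry is a hypothetical internal helper showing the retry loop.
func callWithRetry(ctx context.Context, cfg RetryConfig, call func(context.Context) error) error {
    delay := cfg.InitialDelay
    for attempt := 0; ; attempt++ {
        err := call(ctx)
        if err == nil || !cfg.IsRetryable(err) || attempt >= cfg.MaxRetries {
            return err
        }
        // Absolute jitter in [0, Jitter), mirroring tenacity.wait_exponential_jitter.
        wait := delay
        if cfg.Jitter > 0 {
            wait += time.Duration(rand.Int64N(int64(cfg.Jitter)))
        }
        // Context-aware sleep (the sleepCtx idea): stop waiting on cancellation
        // and propagate the context error instead of the last LLM error.
        select {
        case <-time.After(wait):
        case <-ctx.Done():
            return ctx.Err()
        }
        // 2x multiplier, capped at MaxDelay.
        delay *= 2
        if delay > cfg.MaxDelay {
            delay = cfg.MaxDelay
        }
    }
}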

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

13 test cases with 18 sub-tests covering:

  • Unary: success without retry, retry on transient error, non-retryable error passthrough, retry exhaustion, context
    cancellation, custom IsRetryable
  • Streaming: success without retry, retry before first data, no retry after partial data
  • Config: default values, backoff growth, backoff max cap
  • Error classification: table-driven tests for all HTTP status codes, gRPC errors, and network errors (sketched after this list)
  • Name delegation
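
For flavor, the error-classification tests follow the standard table-driven shape sketched below; the cases are illustrative, and defaultIsRetryable is assumed to have the signature func(error) bool:

package model

import (
    "errors"
    "testing"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func TestDefaultIsRetryable(t *testing.T) {
    tests := []struct {
        name string
        err  error
        want bool
    }{
        {"gRPC UNAVAILABLE", status.Error(codes.Unavailable, "overloaded"), true},
        {"gRPC RESOURCE_EXHAUSTED", status.Error(codes.ResourceExhausted, "quota"), true},
        {"gRPC INVALID_ARGUMENT", status.Error(codes.InvalidArgument, "bad request"), false},
        {"plain non-retryable error", errors.New("boom"), false},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if got := defaultIsRetryable(tt.err); got != tt.want {
                t.Errorf("defaultIsRetryable(%v) = %v, want %v", tt.err, got, tt.want)
            }
        })
    }
}
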
[Screenshot: local test run, 2026-04-15 22:24]

Manual End-to-End (E2E) Tests:

// Default config (aligned with adk-python):
llm := model.WithRetry(baseLLM, nil)

// Custom config:
llm = model.WithRetry(baseLLM, &model.RetryConfig{
    MaxRetries:   3,
    InitialDelay: 2 * time.Second,
    Jitter:       500 * time.Millisecond,
})
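
For the custom classification hook, usage might look like the following; exposing IsRetryable as a RetryConfig field is an assumption based on this PR's description:

// Hypothetical: only treat gRPC UNAVAILABLE as retryable
// (uses google.golang.org/grpc/codes and google.golang.org/grpc/status).
llm = model.WithRetry(baseLLM, &model.RetryConfig{
    IsRetryable: func(err error) bool {
        return status.Code(err) == codes.Unavailable
    },
})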

The wrapped LLM is a drop-in replacement for any existing model.LLM. When a retryable error occurs, the retry log
output looks like:

WARN retrying LLM call model=gemini-2.5-flash attempt=1 max_retries=5 delay=1.536s error="Error 503, Message: high
demand, Status: UNAVAILABLE"

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Design note — standalone model/retry.go vs embedding in each LLM:

In adk-python, retry logic is embedded in each LLM class (google_llm.py delegates to SDK's HttpRetryOptions,
apigee_llm.py uses tenacity). The Go version extracts it into a standalone decorator because:

  1. Go's model.LLM is a single interface — a decorator wraps any implementation without modifying it (see the sketch after this list).
  2. Avoids duplicating retry logic across model/gemini/ and model/apigee/ packages.
  3. Follows idiomatic Go composition (WithRetry(llm, cfg) is the same pattern as io.LimitReader,
    http.TimeoutHandler, etc.).
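
A minimal sketch of that decorator shape, reusing the RetryConfig and callWithRetry sketches above; the real model.LLM interface has more methods, and Request/Response here are stand-in types:

package model

import "context"

type Request struct{}  // stand-in
type Response struct{} // stand-in

// LLM is a simplified, illustrative interface; the real model.LLM differs.
type LLM interface {
    Name() string
    Generate(ctx context.Context, req *Request) (*Response, error)
}

// retryLLM wraps another LLM, the same shape as io.LimitReader wrapping io.Reader.
type retryLLM struct {
    inner LLM
    cfg   RetryConfig
}

func (r *retryLLM) Name() string { return r.inner.Name() } // name delegation

func (r *retryLLM) Generate(ctx context.Context, req *Request) (*Response, error) {
    var resp *Response
    err := callWithRetry(ctx, r.cfg, func(ctx context.Context) error {
        var callErr error
        resp, callErr = r.inner.Generate(ctx, req)
        return callErr
    })
    return resp, err
}

// WithRetry wraps llm and returns the same interface, like http.TimeoutHandler.
func WithRetry(llm LLM, cfg *RetryConfig) LLM {
    if cfg == nil {
        cfg = &RetryConfig{ /* adk-python-aligned defaults */ }
    }
    return &retryLLM{inner: llm, cfg: *cfg}
}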

New files:

  • model/retry.go — RetryConfig, WithRetry(), retryLLM decorator, defaultIsRetryable, sleepCtx
  • model/retry_test.go — 13 test cases with fakeLLM test double

…e#710)

Add WithRetry() LLM decorator that automatically retries transient
errors with configurable exponential backoff and jitter, aligned with
adk-python's retry behavior (tenacity.wait_exponential_jitter).

Defaults (matching adk-python):
- 5 retries, 1s initial delay, 60s max delay, 2x multiplier, 1s jitter
- Retryable: HTTP 408/429/500/502/503/504, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED
- Network errors: connection refused/reset, DNS failure, i/o timeout

Streaming calls only retry before the first yielded response to prevent
duplicate partial content.

Unlike adk-python where retry logic is embedded in each LLM class,
the Go version uses a decorator pattern (WithRetry) that wraps any
model.LLM implementation, avoiding duplication across gemini and apigee
packages and following idiomatic Go composition.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a retry mechanism for LLM calls using exponential backoff with jitter, supporting both unary and streaming response types. The feedback focuses on refining the error detection logic to avoid false positives caused by broad substring matching of status codes. Additionally, the reviewer suggests ensuring that context errors are correctly propagated when a retry delay is interrupted by context cancellation, rather than returning the last encountered LLM error.

3 review comment threads on model/retry.go

Development

Successfully merging this pull request may close these issues.

Feature request: retry mechanism for model errors

1 participant