feat: add retry wrapper with exponential backoff for LLM calls (#710) #732
Open
chaojixinren wants to merge 1 commit into google:main from
Conversation
…e#710)

Add a `WithRetry()` LLM decorator that automatically retries transient errors with configurable exponential backoff and jitter, aligned with adk-python's retry behavior (`tenacity.wait_exponential_jitter`).

Defaults (matching adk-python):
- 5 retries, 1s initial delay, 60s max delay, 2x multiplier, 1s jitter
- Retryable: HTTP 408/429/500/502/503/504, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED
- Network errors: connection refused/reset, DNS failure, i/o timeout

Streaming calls retry only before the first yielded response, to prevent duplicate partial content.

Unlike adk-python, where retry logic is embedded in each LLM class, the Go version uses a decorator pattern (`WithRetry`) that wraps any `model.LLM` implementation, avoiding duplication across the `gemini` and `apigee` packages and following idiomatic Go composition.
Code Review
This pull request introduces a retry mechanism for LLM calls using exponential backoff with jitter, supporting both unary and streaming response types. The feedback focuses on refining the error-detection logic to avoid false positives caused by broad substring matching of status codes, and on ensuring that context errors are correctly propagated when a retry delay is interrupted by context cancellation, rather than returning the last encountered LLM error.
Link to Issue or Description of Change
Problem:
LLM API calls can fail with transient errors (429 rate limiting, 503 service unavailable, RESOURCE_EXHAUSTED), causing unnecessary request failures. There is currently no built-in retry mechanism in the model layer to handle these recoverable errors gracefully.
Solution:
Add a `WithRetry()` LLM decorator that wraps any `model.LLM` implementation with automatic retry logic using configurable exponential backoff and jitter, aligned with adk-python's retry behavior (`tenacity.wait_exponential_jitter`).

Key design decisions:
- `WithRetry(llm, cfg)` returns a new `LLM`, composable with any existing implementation without modifying it. Unlike adk-python, where retry logic is embedded in each LLM class, the Go version uses this pattern to avoid duplication across the `gemini` and `apigee` packages, following idiomatic Go composition.
- Retryable network errors (connection refused/reset, DNS failure, i/o timeout) match adk-python's `default_status_codes` and `httpx.NetworkError`.
- An `IsRetryable` function allows custom error classification.

Testing Plan
Unit Tests:
13 test cases with 18 sub-tests covering: cancellation, custom `IsRetryable`.

Manual End-to-End (E2E) Tests:
The wrapped LLM is a drop-in replacement for any existing `model.LLM`. When a retryable error occurs, the retry log output looks like:

```
WARN retrying LLM call model=gemini-2.5-flash attempt=1 max_retries=5 delay=1.536s error="Error 503, Message: high demand, Status: UNAVAILABLE"
```
Checklist
Additional context
Design note: standalone `model/retry.go` vs embedding in each LLM.

In adk-python, retry logic is embedded in each LLM class (`google_llm.py` delegates to the SDK's `HttpRetryOptions`; `apigee_llm.py` uses tenacity). The Go version extracts it into a standalone decorator because:
- `model.LLM` is a single interface, so a decorator wraps any implementation without modifying it.
- It avoids duplicating retry logic across the `model/gemini/` and `model/apigee/` packages.
- `WithRetry(llm, cfg)` follows the same wrapper pattern as `io.LimitReader`, `http.TimeoutHandler`, etc.

New files:
- `model/retry.go`: `RetryConfig`, `WithRetry()`, the `retryLLM` decorator, `defaultIsRetryable`, `sleepCtx`
- `model/retry_test.go`: 13 test cases with a `fakeLLM` test double