feat: add retry wrapper with exponential backoff for LLM calls (#710) #732
Open
chaojixinren wants to merge 1 commit into google:main from
Conversation
…e#710)

Add a `WithRetry()` LLM decorator that automatically retries transient errors with configurable exponential backoff and jitter, aligned with adk-python's retry behavior (`tenacity.wait_exponential_jitter`).

Defaults (matching adk-python):
- 5 retries, 1s initial delay, 60s max delay, 2x multiplier, 1s jitter
- Retryable: HTTP 408/429/500/502/503/504, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED
- Network errors: connection refused/reset, DNS failure, i/o timeout

Streaming calls retry only before the first yielded response, to prevent duplicate partial content.

Unlike adk-python, where retry logic is embedded in each LLM class, the Go version uses a decorator pattern (`WithRetry`) that wraps any `model.LLM` implementation, avoiding duplication across the `gemini` and `apigee` packages and following idiomatic Go composition.
Code Review
This pull request introduces a retry mechanism for LLM calls using exponential backoff with jitter, supporting both unary and streaming response types. The feedback focuses on refining the error-detection logic to avoid false positives caused by broad substring matching of status codes, and on ensuring that context errors are correctly propagated when a retry delay is interrupted by context cancellation, rather than returning the last encountered LLM error.
Link to Issue or Description of Change
Problem:
LLM API calls can fail with transient errors (429 rate limiting, 503 service unavailable, RESOURCE_EXHAUSTED), causing unnecessary request failures. There is currently no built-in retry mechanism in the model layer to handle these recoverable errors gracefully.
Solution:
Add a `WithRetry()` LLM decorator that wraps any `model.LLM` implementation with automatic retry logic using configurable exponential backoff and jitter, aligned with adk-python's retry behavior (`tenacity.wait_exponential_jitter`).

Key design decisions:
- `WithRetry(llm, cfg)` returns a new `LLM`, composable with any existing implementation without modifying it. Unlike adk-python, where retry logic is embedded in each LLM class, the Go version uses this pattern to avoid duplication across the `gemini` and `apigee` packages, following idiomatic Go composition.
- Retryable network errors (connection refused/reset, DNS failure, i/o timeout) match adk-python's `default_status_codes` and `httpx.NetworkError`.
- An `IsRetryable` function allows custom error classification.

Testing Plan
Unit Tests:
13 test cases with 18 sub-tests covering: cancellation, custom `IsRetryable`.

Manual End-to-End (E2E) Tests:
The wrapped LLM is a drop-in replacement for any existing `model.LLM`. When a retryable error occurs, the retry log output looks like:

```
WARN retrying LLM call model=gemini-2.5-flash attempt=1 max_retries=5 delay=1.536s error="Error 503, Message: high demand, Status: UNAVAILABLE"
```
Checklist
Additional context
Design note: standalone `model/retry.go` vs embedding in each LLM.

In adk-python, retry logic is embedded in each LLM class (`google_llm.py` delegates to the SDK's `HttpRetryOptions`; `apigee_llm.py` uses tenacity). The Go version extracts it into a standalone decorator because:
- `model.LLM` is a single interface, so a decorator wraps any implementation without modifying it.
- It avoids duplicating retry logic across the `model/gemini/` and `model/apigee/` packages.
- `WithRetry(llm, cfg)` follows the same wrapper pattern as `io.LimitReader`, `http.TimeoutHandler`, etc.

New files:
- `model/retry.go`: `RetryConfig`, `WithRetry()`, the `retryLLM` decorator, `defaultIsRetryable`, `sleepCtx`
- `model/retry_test.go`: 13 test cases with a `fakeLLM` test double