Description
Problem
TunaCode has no retry strategy. When an LLM API call fails due to a transient error (429 rate limit, 503 service unavailable, network hiccup), the request fails permanently and the user must manually re-send.
The `max_retries` config key (default: 3) already exists in user configuration and is included in the agent version hash, but it drives no actual retry logic. It is dead config.
Current State
What exists today
| Mechanism | Location | What it does | Limitation |
|---|---|---|---|
| Context overflow retry | `core/agents/main.py:285-321` | Compacts history and retries once on context overflow | Single retry, single error type only |
| `ToolRetryError` | `exceptions.py:334-347` | Returns error text to the LLM as a failed tool result | Model-directed only; no programmatic backoff, no guardrails |
| `request_delay` | `agent_config.py:52-68` | Fixed sleep before each API call | Rate limiting only, not a retry mechanism |
| `GlobalRequestTimeoutError` | `main.py:155-170` | Clears agent cache on timeout | No re-attempt; user must manually retry |
What is missing
- No exponential backoff on LLM API calls -- A 429 or 503 from the provider kills the request immediately. The single call in `_build_stream_fn` (`agent_config.py:282`) has no retry wrapper.
- No automatic retry on request timeout -- `GlobalRequestTimeoutError` surfaces to the user with no re-attempt.
- `ToolRetryError` has no orchestrator-level guardrails -- tinyagent treats `ToolRetryError` identically to any other exception (error tool result). The orchestrator never inspects tool errors to steer the model toward retrying or to cap retry attempts.
- `web_fetch` 429 is informational only -- The tool tells the model "try again later" but does no sleeping or backoff itself (`web_fetch.py:213-222`).
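
To make the missing backoff concrete, here is a minimal sketch of a capped exponential delay schedule with full jitter. The function name `backoff_delays` and the `base`, `cap`, and `rng` parameters are hypothetical and chosen for illustration; nothing here is taken from the TunaCode codebase.

```python
import random


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Return one sleep duration (in seconds) per retry attempt.

    Hypothetical helper: the ceiling doubles on each attempt
    (base, 2*base, 4*base, ...) up to `cap`, and the actual delay is a
    random fraction of that ceiling ("full jitter") so that concurrent
    clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays


if __name__ == "__main__":
    # With max_retries=3 the ceilings are 0.5s, 1.0s, and 2.0s.
    print(backoff_delays(3))
```

Passing `rng` as a parameter keeps the schedule deterministic in tests; a real implementation might also honor a provider's `Retry-After` header when one is present.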
Code Quality Constraints
- No new core dependencies (stdlib + existing deps only)
- `max_retries` config must drive the actual retry count (it already exists, just wire it up)
- Exponential backoff with jitter for API calls
- Fail fast, fail loud -- retries are acceptable but silent swallowing is not; log each retry attempt
- Must not break the existing `ToolRetryError` contract (tools raise it, tinyagent returns it as error result)
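
One possible shape for a wrapper that satisfies these constraints, using only the standard library, is sketched below. `TransientAPIError`, `call_with_retries`, and the delay parameters are hypothetical names; how real provider errors (429, 503, network failures) would be mapped onto a retryable exception type is an assumption left open here. Only the retryable type is caught, so any other exception, including one like `ToolRetryError`, propagates unchanged.

```python
import asyncio
import logging
import random

logger = logging.getLogger("retry_sketch")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable failure (429, 503, network)."""


async def call_with_retries(fn, max_retries=3, base_delay=0.5,
                            max_delay=30.0, sleep=asyncio.sleep):
    """Await fn(), retrying up to max_retries times on TransientAPIError.

    Every retry is logged; when retries are exhausted the last error is
    re-raised rather than swallowed. Other exception types are not caught.
    """
    attempt = 0
    while True:
        try:
            return await fn()
        except TransientAPIError as exc:
            if attempt >= max_retries:
                logger.error("giving up after %d retries: %s", attempt, exc)
                raise
            # Capped exponential ceiling with full jitter.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            delay = random.uniform(0, ceiling)
            attempt += 1
            logger.warning("transient failure (%s); retry %d/%d in %.2fs",
                           exc, attempt, max_retries, delay)
            await sleep(delay)
```

In this sketch `max_retries` counts retries, not total attempts, so the default of 3 allows up to four calls. Whether the existing config key should mean retries or total attempts is a decision the implementation would need to state explicitly.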