
feature: retry strategy for LLM API calls and tool execution #393

@tunahorse

Problem

TunaCode has no retry strategy. When an LLM API call fails with a transient error (429 rate limit, 503 service unavailable, or a network hiccup), the request fails permanently and the user must re-send it manually.

The max_retries config key (default: 3) already exists in user configuration and is included in the agent version hash, but it drives no actual retry logic. It is dead config.

Current State

What exists today

| Mechanism | Location | What it does | Limitation |
|---|---|---|---|
| Context overflow retry | core/agents/main.py:285-321 | Compacts history and retries once on context overflow | Single retry, single error type only |
| ToolRetryError | exceptions.py:334-347 | Returns error text to the LLM as a failed tool result | Model-directed only; no programmatic backoff, no guardrails |
| request_delay | agent_config.py:52-68 | Fixed sleep before each API call | Rate limiting only, not a retry mechanism |
| GlobalRequestTimeoutError | main.py:155-170 | Clears agent cache on timeout | No re-attempt; user must manually retry |

What is missing

  1. No exponential backoff on LLM API calls -- A 429 or 503 from the provider kills the request immediately. The single call in _build_stream_fn (agent_config.py:282) has no retry wrapper.

  2. No automatic retry on request timeout -- GlobalRequestTimeoutError surfaces to the user with no re-attempt.

  3. ToolRetryError has no orchestrator-level guardrails -- tinyagent handles ToolRetryError like any other exception and returns it to the model as an error tool result. The orchestrator never inspects tool errors to steer the model toward retrying or to cap the number of retry attempts.

  4. web_fetch 429 is informational only -- The tool tells the model "try again later" but does no sleeping or backoff itself (web_fetch.py:213-222).
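To make the guardrail gap in item 3 concrete, here is a minimal sketch of a per-request retry budget that an orchestrator could consult whenever a tool result comes back as a ToolRetryError. The class name `ToolRetryBudget` and its methods are hypothetical and do not exist in TunaCode or tinyagent; the sketch only assumes that the orchestrator can see the tool name of each failed result.

```python
from collections import Counter


class ToolRetryBudget:
    """Hypothetical per-request cap on retryable failures per tool."""

    def __init__(self, max_retries: int = 3) -> None:
        # max_retries is assumed to come from the existing config key.
        self.max_retries = max_retries
        self._failures: Counter = Counter()

    def record_failure(self, tool_name: str) -> bool:
        """Count one retryable failure; return True while a retry is still allowed."""
        self._failures[tool_name] += 1
        return self._failures[tool_name] <= self.max_retries

    def reset(self, tool_name: str) -> None:
        """Forget past failures once the tool succeeds."""
        self._failures.pop(tool_name, None)
```

When `record_failure` returns False, the orchestrator could stop forwarding the "retry" hint and surface a hard error instead, which keeps the existing contract (tools raise, tinyagent returns an error result) untouched.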

Code Quality Constraints

  • No new core dependencies (stdlib + existing deps only)
  • max_retries config must drive the actual retry count (the key already exists; it only needs to be wired up)
  • Exponential backoff with jitter for API calls
  • Fail fast, fail loud -- retries are acceptable but silent swallowing is not; log each retry attempt
  • Must not break the existing ToolRetryError contract (tools raise it, tinyagent returns it as error result)
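One way these constraints could fit together is sketched below, using only the standard library. It is an illustration, not the intended implementation: `TransientAPIError`, `backoff_delay`, and `call_with_retry` are hypothetical names, the wrapped call is assumed to be an async callable, and the base delay and cap are placeholder values. The delay uses the "full jitter" variant of exponential backoff (a uniform draw between zero and the capped exponential bound).

```python
import asyncio
import logging
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("retry_sketch")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable provider error (429, 503, network)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay for a zero-based attempt number."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    max_retries: int = 3,  # assumed to be read from the existing config key
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await fn(), retrying transient failures at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except TransientAPIError as exc:
            if attempt == max_retries:
                # Fail loud: log and re-raise once the budget is spent.
                logger.error("giving up after %d retries: %s", max_retries, exc)
                raise
            delay = backoff_delay(attempt, base, cap)
            logger.warning(
                "retry %d/%d in %.2fs: %s", attempt + 1, max_retries, delay, exc
            )
            await sleep(delay)
    raise AssertionError("unreachable")
```

Only the hypothetical `TransientAPIError` is caught, so non-transient failures (and ToolRetryError) propagate unchanged. The injectable `sleep` parameter is there so tests can run without real waiting.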


Labels: enhancement (New feature or request), hard (Very challenging issues requiring significant architectural changes)
