
Feature Request: Multi-model fallback per agent with automatic rate limit switching #326

@UAEpro

Description

Problem

When using agents heavily, models hit rate limits (429 errors). Currently, the only options are to:

  1. Wait for the rate limit to reset
  2. Manually edit ~/.config/opencode/oh-my-opencode.json to switch models

This breaks workflow and wastes time.

Proposed Solution

Allow configuring multiple models per agent with automatic fallback on rate limits or errors.

Example Configuration

```jsonc
{
  "agents": {
    "explore": {
      "models": [
        { "model": "opencode/grok-code", "priority": 1 },
        { "model": "google/gemini-3-flash", "priority": 2 },
        { "model": "anthropic/claude-sonnet-4", "priority": 3 }
      ],
      "fallback_strategy": "sequential",
      "fallback_on": ["rate_limit", "timeout", "error"]
    },
    "librarian": {
      "model": "google/gemini-3-pro"  // backward compatible - single model still works
    }
  }
}
```
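
A rough sketch of how this configuration could be typed and normalized, assuming a discriminated union for backward compatibility — all type and function names here are illustrative, not part of the current oh-my-opencode API:

```typescript
// Hypothetical types for the proposed schema (names are assumptions).
type FallbackStrategy = "sequential" | "round_robin" | "random";
type FallbackTrigger = "rate_limit" | "timeout" | "error" | "token_limit";

interface ModelEntry {
  model: string;
  priority: number;
}

// Backward compatible: either the legacy single-model shape
// or the new multi-model shape.
type AgentModelConfig =
  | { model: string }
  | {
      models: ModelEntry[];
      fallback_strategy: FallbackStrategy;
      fallback_on: FallbackTrigger[];
    };

// Normalize either shape into a priority-ordered model list.
function resolveModels(config: AgentModelConfig): string[] {
  if ("model" in config) return [config.model];
  return [...config.models]
    .sort((a, b) => a.priority - b.priority)
    .map((m) => m.model);
}
```

With this shape, existing configs like `"model": "google/gemini-3-pro"` normalize to a one-element list, so downstream code only ever deals with an ordered list of candidates.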

Fallback Strategies

| Strategy | Behavior |
| --- | --- |
| `sequential` | Try models in priority order until one succeeds |
| `round_robin` | Distribute load across models to prevent hitting limits |
| `random` | Random selection from available models |
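
The core of the `sequential` strategy could look roughly like this — `trySequential` and `attempt` are illustrative names standing in for a real provider call:

```typescript
// Sketch of the "sequential" strategy: attempt models in priority order,
// moving to the next model whenever a call throws.
function trySequential<T>(
  models: string[],
  attempt: (model: string) => T,
): T {
  let lastError: unknown;
  for (const model of models) {
    try {
      return attempt(model);
    } catch (err) {
      lastError = err; // fall through to the next model
    }
  }
  // Every model failed: surface the last error.
  throw lastError;
}
```

In the real implementation the attempt would be async, but the control flow is the same: short-circuit on the first success, rethrow only after the whole list is exhausted.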

Fallback Triggers

| Trigger | Description |
| --- | --- |
| `rate_limit` | HTTP 429 or provider rate limit error |
| `timeout` | Request timeout exceeded |
| `error` | Any API error (5xx, connection failed, etc.) |
| `token_limit` | Context too large for model |
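
Mapping a failed request onto one of these triggers might look like the sketch below; the error fields (`status`, `code`, `timedOut`) are assumptions about the error shape, not the actual provider SDK:

```typescript
type Trigger = "rate_limit" | "timeout" | "error" | "token_limit";

// Hypothetical normalized error shape (illustrative only).
interface ApiError {
  status?: number;    // HTTP status, if the provider returned one
  code?: string;      // provider-specific error code
  timedOut?: boolean; // set by the request layer on timeout
}

// Classify a failure so the config's fallback_on list can decide
// whether to switch models.
function classifyError(err: ApiError): Trigger {
  if (err.status === 429) return "rate_limit";
  if (err.timedOut) return "timeout";
  if (err.code === "context_length_exceeded") return "token_limit";
  return "error"; // 5xx, connection failures, anything else
}
```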

Benefits

  • Zero downtime: Agents keep working when primary model hits limits
  • Cost optimization: Use cheaper models as fallbacks
  • Reliability: Multiple providers = redundancy
  • Backward compatible: Single "model": "string" still works

Implementation Hints

Based on existing codebase patterns:

  1. Schema update (assets/oh-my-opencode.schema.json):
     • Change `model` from `string` to `oneOf: [string, object]`
     • Add a `models` array option with priority/strategy
  2. Model resolution (src/cli/config-manager.ts):
     • Already has fallback logic for install-time model selection
     • Could be extended to runtime model resolution
  3. Error handling (src/hooks/anthropic-auto-compact/executor.ts):
     • Already tracks RetryState and FallbackState
     • Could trigger a model switch instead of just retry/compact
  4. State tracking:
     • Track which models are currently rate-limited
     • Implement per-model cooldown timers
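
The state tracking in point 4 could be a small per-model cooldown map along these lines — `ModelCooldowns` is an illustrative name, not an existing class in the codebase:

```typescript
// Sketch of per-model cooldown tracking: mark a model rate-limited for a
// window, then treat it as available again once the window passes.
class ModelCooldowns {
  private until = new Map<string, number>();

  // Record that a model is rate-limited until now + cooldownMs.
  markLimited(model: string, cooldownMs: number, now = Date.now()): void {
    this.until.set(model, now + cooldownMs);
  }

  isAvailable(model: string, now = Date.now()): boolean {
    const t = this.until.get(model);
    return t === undefined || now >= t;
  }

  // Filter a priority-ordered list down to models not in cooldown.
  available(models: string[], now = Date.now()): string[] {
    return models.filter((m) => this.isAvailable(m, now));
  }
}
```

Combined with the sequential strategy, the fallback loop would only attempt models that `available()` returns, and `markLimited()` would be called whenever a request classifies as `rate_limit` (ideally using the provider's `Retry-After` hint for the cooldown length, when present).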

Use Case

I run multiple parallel background agents (explore, librarian, oracle) and frequently hit rate limits on primary models. Having automatic fallback would let agents continue working without manual intervention.

Alternatives Considered

  1. Manual config editing — Works but disrupts workflow
  2. Single high-limit model — Expensive, still has limits
  3. External proxy with fallback — Complex, adds latency

Native support in oh-my-opencode would be the cleanest solution.
