[Question]: Gateway does not auto‑fallback after model rate‑limit, causing long blocking retries #1420

### Prerequisites

- [x] I will write this issue in English (see our [Language Policy](https://github.com/code-yeongyu/oh-my-opencode/blob/dev/CONTRIBUTING.md#language-policy))
- [x] I have searched existing issues and discussions
- [x] I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme) or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer
- [x] This is a question (not a bug report or feature request)

### Question

When using Oh My OpenCode Gateway, if a model API (OpenAI / Gemini / OpenRouter / NVIDIA, etc.) returns a rate‑limit error such as:

<img width="1455" height="166" alt="Image" src="https://github.com/user-attachments/assets/283a7ae2-dc8c-4529-bd5b-de5f243cff23" />

```
This request would exceed your account's rate limit. Please try again later.
[retrying in 3h 39m attempt #1]
```

Gateway enters a **multi‑hour retry loop** and **does not switch to any other available model**, even if multiple models are configured.

This results in:

- The entire request pipeline being blocked  
- All subsequent requests stuck behind the retry queue  
- No automatic fallback  
- User forced to wait hours until the rate‑limit window resets  

From a user perspective, Gateway becomes effectively “locked” by a single model’s 429.

---

## Expected Behavior

When a model returns **429 / rate limit exceeded**, Gateway should:

- Automatically fall back to the next available model  
- Or allow users to configure a fallback order  
- Or at least avoid multi‑hour retry queues  
- Or provide a config option to disable long retry behavior  

Any of these would prevent the system from becoming unusable.
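
To make the first two options concrete, here is a minimal sketch of what I mean by a fallback order. It is not taken from Gateway's code: `FallbackRouter`, `markRateLimited`, `pick`, and the model labels are all hypothetical names I made up for illustration.

```typescript
// Hypothetical sketch: none of these names come from Oh My OpenCode.
type ModelId = string;

class FallbackRouter {
  // User-defined priority list, highest priority first.
  private readonly fallbackOrder: ModelId[];
  // Timestamp (ms) until which each model is treated as rate limited.
  private readonly blockedUntil = new Map<ModelId, number>();

  constructor(fallbackOrder: ModelId[]) {
    this.fallbackOrder = fallbackOrder;
  }

  // Record a 429 for `model`; `retryAfterMs` is how long the provider asked us to wait.
  markRateLimited(model: ModelId, retryAfterMs: number, now: number = Date.now()): void {
    this.blockedUntil.set(model, now + retryAfterMs);
  }

  // Return the first model in priority order that is not rate limited,
  // or undefined when every configured model is still blocked.
  pick(now: number = Date.now()): ModelId | undefined {
    return this.fallbackOrder.find((m) => (this.blockedUntil.get(m) ?? 0) <= now);
  }
}

// Placeholder labels standing in for whatever models are configured.
const router = new FallbackRouter(["openai", "gemini", "claude"]);
router.markRateLimited("openai", 3 * 60 * 60 * 1000); // the provider asked for a 3 h wait
console.log(router.pick()); // the next model in the list instead of a blocked queue
```

With something like this, a 429 on one model only removes that model from rotation until its window resets, and later requests go to the next entry in the list.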

---

## Actual Behavior

- Gateway receives a 429  
- Gateway enters a long retry window (1–4 hours)  
- Gateway does not switch to other models  
- All requests remain blocked until the retry window ends  

---

## Steps to Reproduce

1. Configure multiple models (e.g., OpenAI + Gemini + Claude)  
2. Use a model with insufficient quota or strict RPM limits  
3. Trigger a rate‑limit error  
4. Gateway outputs:

```
This request would exceed your account's rate limit. Please try again later.
[retrying in 3h 39m attempt #1]
```

5. Gateway does not fall back and becomes stuck in retry mode

---

## Suggestions

- Add **automatic model fallback**  
- Allow users to define a **fallback priority list**  
- Provide a config option to **disable long retry queues**  
- Fail fast on 429 instead of blocking the entire pipeline  
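
For the last two suggestions, a rough sketch of the fail-fast rule I have in mind is below. It assumes the provider sends a standard HTTP `Retry-After` header with the 429 (I have not verified that every provider does), and `parseRetryAfterMs`, `decideOn429`, and the 30-second cap are hypothetical, not existing Gateway options.

```typescript
// Hypothetical sketch: the names and the wait cap are assumptions, not Gateway APIs.
type Decision =
  | { action: "retry"; waitMs: number }
  | { action: "fail-fast"; waitMs: number | undefined };

// Retry-After carries either a number of seconds ("120") or an HTTP-date.
function parseRetryAfterMs(header: string | null, now: number = Date.now()): number | undefined {
  if (header === null || header.trim() === "") return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - now);
}

// Retry only when the requested wait is known and within the cap;
// otherwise surface the error right away so the caller can switch models.
function decideOn429(header: string | null, maxWaitMs: number, now: number = Date.now()): Decision {
  const waitMs = parseRetryAfterMs(header, now);
  if (waitMs !== undefined && waitMs <= maxWaitMs) return { action: "retry", waitMs };
  return { action: "fail-fast", waitMs };
}

// 3 h 39 m is 13140 seconds, far above a 30-second cap.
console.log(decideOn429("13140", 30_000).action);
```

The point is that a short wait can still be retried in place, while a multi-hour wait is reported immediately instead of holding the queue.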

### Context

_No response_

### Doctor Output (Optional)

```shell

```

### Question Category

Configuration

### Additional Information

_No response_
