| 1 | +# Failure Handling Strategy |
| 2 | + |
| 3 | +The failure handling strategy provides automatic retry and failover behavior for backend errors such as rate limits (429), connection errors, and authentication failures. It enables the proxy to silently recover from transient errors, improving reliability for agentic workflows. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +When a backend request fails, the failure handling strategy decides whether to: |
| 8 | + |
| 9 | +1. **Wait and Retry** - If the error is recoverable (e.g., rate limit) and the wait time is short, the proxy waits silently and retries the same backend. |
| 10 | + |
| 11 | +2. **Fail Over Immediately** - If the wait would be too long or the error cannot be retried on the same backend, and an alternative backend is available, the proxy switches to another backend instance that can serve the same model. |
| 12 | + |
| 13 | +3. **Surface the Error** - If no recovery options are available, the error is returned to the client. |
| 14 | + |
| 15 | +This happens transparently to the client, so agentic workflows continue without interruption from transient errors. |
| 16 | + |
| 17 | +## Configuration Options |
| 18 | + |
| 19 | +### CLI Parameters |
| 20 | + |
| 21 | +```bash |
| 22 | +# Disable failure handling entirely |
| 23 | +--disable-failure-handling |
| 24 | + |
| 25 | +# Max seconds to wait before attempting failover (default: 30) |
| 26 | +--max-silent-wait 30 |
| 27 | + |
| 28 | +# Total timeout budget across all failover attempts (default: 90) |
| 29 | +--total-timeout-budget 90 |
| 30 | + |
| 31 | +# Seconds between SSE keepalive comments during waits (default: 8) |
| 32 | +--keepalive-interval 8 |
| 33 | + |
| 34 | +# Maximum backend instances to try in failover chain (default: 5) |
| 35 | +--max-failover-hops 5 |
| 36 | + |
| 37 | +# Minimum retry wait even for sub-second retry-after (default: 1) |
| 38 | +--min-retry-wait 1 |
| 39 | +``` |
| 40 | + |
| 41 | +### Environment Variables |
| 42 | + |
| 43 | +| Variable | Default | Description | |
| 44 | +|----------|---------|-------------| |
| 45 | +| `DISABLE_FAILURE_HANDLING` | `0` | Set to `1` to disable automatic failure handling | |
| 46 | +| `FAILURE_HANDLING_MAX_SILENT_WAIT` | `30.0` | Max seconds to wait before failover | |
| 47 | +| `FAILURE_HANDLING_TOTAL_TIMEOUT_BUDGET` | `90.0` | Total timeout budget for all attempts | |
| 48 | +| `FAILURE_HANDLING_KEEPALIVE_INTERVAL` | `8.0` | SSE keepalive interval during waits | |
| 49 | +| `FAILURE_HANDLING_MAX_FAILOVER_HOPS` | `5` | Maximum backend instances to try | |
| 50 | +| `FAILURE_HANDLING_MIN_RETRY_WAIT` | `1.0` | Minimum retry wait time | |
| 51 | + |
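
As an illustration of how the variables in the table map onto settings, here is a minimal Python sketch. The `FailureHandlingSettings` class and `settings_from_env` function are hypothetical names, not part of the proxy; only the variable names and defaults come from the table above. Precedence between CLI flags, environment variables, and the config file is not covered here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FailureHandlingSettings:
    """Hypothetical container; defaults copied from the table above."""
    enabled: bool = True
    max_silent_wait: float = 30.0
    total_timeout_budget: float = 90.0
    keepalive_interval: float = 8.0
    max_failover_hops: int = 5
    min_retry_wait: float = 1.0


def settings_from_env(env: Mapping[str, str] = os.environ) -> FailureHandlingSettings:
    """Build settings from environment variables, falling back to defaults."""
    d = FailureHandlingSettings()
    p = "FAILURE_HANDLING_"
    return FailureHandlingSettings(
        # DISABLE_FAILURE_HANDLING=1 turns the feature off.
        enabled=env.get("DISABLE_FAILURE_HANDLING", "0") != "1",
        max_silent_wait=float(env.get(p + "MAX_SILENT_WAIT", d.max_silent_wait)),
        total_timeout_budget=float(env.get(p + "TOTAL_TIMEOUT_BUDGET", d.total_timeout_budget)),
        keepalive_interval=float(env.get(p + "KEEPALIVE_INTERVAL", d.keepalive_interval)),
        max_failover_hops=int(env.get(p + "MAX_FAILOVER_HOPS", d.max_failover_hops)),
        min_retry_wait=float(env.get(p + "MIN_RETRY_WAIT", d.min_retry_wait)),
    )
```
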
| 52 | +### Configuration File |
| 53 | + |
| 54 | +Add to your `config/config.yaml`: |
| 55 | + |
| 56 | +```yaml |
| 57 | +failure_handling: |
| 58 | + # Master switch to enable/disable failure handling |
| 59 | + enabled: true |
| 60 | + |
| 61 | + # Maximum seconds to wait for retry-after before failover |
| 62 | + # If retry-after <= this value, proxy waits silently |
| 63 | + # If > this value, it attempts failover to another backend |
| 64 | + max_silent_wait: 30.0 |
| 65 | + |
| 66 | + # Maximum total seconds across all failover attempts |
| 67 | + # After this time, errors are surfaced to the client |
| 68 | + total_timeout_budget: 90.0 |
| 69 | + |
| 70 | + # Seconds between SSE keepalive comments during waits |
| 71 | + # Prevents client/connection timeouts during retry periods |
| 72 | + keepalive_interval: 8.0 |
| 73 | + |
| 74 | + # Maximum number of backend instances to try in failover chain |
| 75 | + # Limits failover depth to prevent infinite loops |
| 76 | + max_failover_hops: 5 |
| 77 | + |
| 78 | + # Minimum wait time even for sub-second retry-after |
| 79 | + # Prevents tight retry loops that could overwhelm backends |
| 80 | + min_retry_wait: 1.0 |
| 81 | +``` |
| 82 | + |
| 83 | +## Parameter Details |
| 84 | + |
| 85 | +### max_silent_wait |
| 86 | + |
| 87 | +**Default: 30 seconds** |
| 88 | + |
| 89 | +This is the threshold that determines whether to wait-and-retry or failover: |
| 90 | + |
| 91 | +- If `retry-after <= max_silent_wait`: The proxy waits silently and retries the same backend. The client sees added latency but no error. |
| 92 | +- If `retry-after > max_silent_wait`: The proxy immediately attempts failover to an alternative backend instance. |
| 93 | + |
| 94 | +**Example scenarios:** |
| 95 | +- Backend returns `retry-after: 10s` → Proxy waits 10s and retries (within threshold) |
| 96 | +- Backend returns `retry-after: 60s` → Proxy immediately fails over to another backend (exceeds threshold) |
| 97 | +- Backend returns `retry-after: 600s` → Proxy fails over if possible, otherwise surfaces error |
| 98 | + |
| 99 | +Lower values (e.g., 10-15s) provide faster failover but may cause unnecessary backend switching. Higher values (e.g., 45-60s) are more patient but may cause longer delays for the client. |
| 100 | + |
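
The threshold rule and the three example scenarios above can be sketched in Python as follows. `Action` and `choose_action` are hypothetical names used only for illustration; the proxy's actual implementation may differ.

```python
from enum import Enum


class Action(Enum):
    WAIT_AND_RETRY = "wait_and_retry"
    FAILOVER = "failover"
    SURFACE_ERROR = "surface_error"


def choose_action(retry_after: float,
                  max_silent_wait: float = 30.0,
                  has_alternative: bool = True) -> Action:
    """Hypothetical helper mirroring the max_silent_wait rule."""
    # Short waits are absorbed silently on the same backend.
    if retry_after <= max_silent_wait:
        return Action.WAIT_AND_RETRY
    # Long waits trigger failover when another backend can serve the model.
    if has_alternative:
        return Action.FAILOVER
    # Otherwise nothing is left to try.
    return Action.SURFACE_ERROR
```

With the default threshold, `choose_action(10)` selects wait-and-retry, `choose_action(60)` selects failover, and `choose_action(600, has_alternative=False)` surfaces the error.
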
| 101 | +### total_timeout_budget |
| 102 | + |
| 103 | +**Default: 90 seconds** |
| 104 | + |
| 105 | +The maximum total time the proxy will spend attempting recovery before surfacing an error to the client. This includes: |
| 106 | +- Time spent waiting for retry-after delays |
| 107 | +- Time spent on failover attempts |
| 108 | +- Time spent on actual backend requests |
| 109 | + |
| 110 | +Once this budget is exhausted, any subsequent errors are immediately surfaced to the client. |
| 111 | + |
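
One way to picture the budget is as a single deadline shared by waits, failover attempts, and backend requests. The sketch below is an assumption about how such a tracker could look, not the proxy's code; `TimeoutBudget` and its methods are hypothetical names, and the injectable `clock` exists only to make the example easy to test.

```python
import time
from typing import Callable


class TimeoutBudget:
    """Hypothetical tracker for the total recovery time budget."""

    def __init__(self, total_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # All waits, failovers, and requests count against one deadline.
        self._deadline = clock() + total_seconds

    def remaining(self) -> float:
        """Seconds left before errors must be surfaced (never negative)."""
        return max(0.0, self._deadline - self._clock())

    def can_afford(self, wait_seconds: float) -> bool:
        """True if a wait of this length still fits in the budget."""
        return wait_seconds <= self.remaining()
```
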
| 112 | +### keepalive_interval |
| 113 | + |
| 114 | +**Default: 8 seconds** |
| 115 | + |
| 116 | +During wait periods (e.g., waiting for a retry-after delay), the proxy emits SSE keepalive comments to prevent client/connection timeouts. This is especially important for streaming responses. |
| 117 | + |
| 118 | +The keepalive comments look like: |
| 119 | +``` |
| 120 | +: keepalive |
| 121 | +``` |
| 122 | + |
| 123 | +Lines that begin with a colon are standard SSE comments, so compliant clients ignore them and no client changes are needed. |
| 124 | + |
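
For clients that parse the event stream by hand, the following sketch shows how comment lines can be skipped. It is a simplified illustration (the hypothetical `data_lines` function handles only `data:` fields and does not group multi-line events), not a full SSE parser.

```python
from typing import Iterable, Iterator


def data_lines(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield payloads of `data:` lines, skipping comments and other lines."""
    for line in sse_lines:
        # Comment lines such as ": keepalive" start with a colon.
        if line.startswith(":"):
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            yield value
```
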
| 125 | +### max_failover_hops |
| 126 | + |
| 127 | +**Default: 5** |
| 128 | + |
| 129 | +The maximum number of different backend instances to try before giving up. This prevents infinite failover loops when all backends are experiencing issues. |
| 130 | + |
| 131 | +For example, with `max_failover_hops: 3`: |
| 132 | +1. Try `openai-1` → fails |
| 133 | +2. Try `openai-2` → fails |
| 134 | +3. Try `openai-3` → fails |
| 135 | +4. Surface error to client (max hops reached) |
| 136 | + |
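
The hop limit in the example above can be sketched as a bounded loop over candidate backends. `try_backends` and the `send` callback are hypothetical names for illustration; the real failover chain also has to respect the wait and budget rules described elsewhere on this page.

```python
from typing import Callable, Optional, Sequence


def try_backends(backends: Sequence[str],
                 send: Callable[[str], bool],
                 max_failover_hops: int = 5) -> Optional[str]:
    """Return the first backend for which send() succeeds, or None."""
    # Only the first max_failover_hops instances are ever attempted.
    for backend in backends[:max_failover_hops]:
        if send(backend):
            return backend
    # Hop limit reached (or list exhausted): the caller surfaces the error.
    return None
```
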
| 137 | +### min_retry_wait |
| 138 | + |
| 139 | +**Default: 1 second** |
| 140 | + |
| 141 | +The minimum wait time enforced even when a backend returns a very short `retry-after` value (e.g., 0.1 seconds). This prevents tight retry loops that could: |
| 142 | +- Overwhelm the backend |
| 143 | +- Consume excessive CPU |
| 144 | +- Create retry storms |
| 145 | + |
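
In effect, the wait actually applied is the larger of the backend's `retry-after` value and `min_retry_wait`. A minimal sketch, with `effective_wait` as a hypothetical helper name:

```python
def effective_wait(retry_after: float, min_retry_wait: float = 1.0) -> float:
    """Clamp very short retry-after values up to the configured minimum."""
    return max(retry_after, min_retry_wait)
```
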
| 146 | +## Behavior by Error Type |
| 147 | + |
| 148 | +### Recoverable Errors (may retry or failover) |
| 149 | + |
| 150 | +- **429 Too Many Requests** - Rate limit errors. Uses `retry-after` header if available. |
| 151 | +- **503 Service Unavailable** - Temporary unavailability. Short default wait applied. |
| 152 | +- **Connection Errors** - Network issues. Short wait then retry/failover. |
| 153 | +- **Timeout Errors** - Request timeouts. Immediate failover preferred. |
| 154 | + |
| 155 | +### Unrecoverable Errors (no retry; failover where noted, then surface) |
| 156 | + |
| 157 | +- **401 Unauthorized** - Authentication failure. Immediate failover, no retry. |
| 158 | +- **403 Forbidden** - Authorization failure. Immediate failover, no retry. |
| 159 | +- **500 Internal Server Error** - Backend error. Immediate failover, no retry. |
| 160 | +- **400 Bad Request** - Invalid request. Surfaced immediately (client error). |
| 161 | + |
| 162 | +### Content-Started Errors (always surface) |
| 163 | + |
| 164 | +If the backend has already started sending content to the client (e.g., streaming has begun), the error is always surfaced immediately. Partial responses cannot be transparently recovered. |
| 165 | + |
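
The status-code rules in this section can be summarized as a small lookup. This is a sketch under assumptions: `classify` and the category strings are hypothetical, connection and timeout errors (which carry no HTTP status) are left out, and treating any unlisted status as "surface" is an assumption rather than documented behavior.

```python
# Status codes taken from the lists above.
RETRY_OR_FAILOVER = {429, 503}
FAILOVER_ONLY = {401, 403, 500}


def classify(status: int, content_started: bool = False) -> str:
    """Map an HTTP status to a recovery category (hypothetical helper)."""
    # Once content has reached the client, errors are always surfaced.
    if content_started:
        return "surface"
    if status in RETRY_OR_FAILOVER:
        return "retry_or_failover"
    if status in FAILOVER_ONLY:
        return "failover_only"
    # 400 is documented as surfaced immediately; other codes are an assumption.
    return "surface"
```
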
| 166 | +## Streaming Behavior |
| 167 | + |
| 168 | +During streaming responses, recovery depends on whether any content has already been sent to the client: |
| 169 | + |
| 170 | +1. **Before content starts**: Full retry/failover capability. Client sees no error. |
| 171 | +2. **After content starts**: No recovery possible. Error is surfaced to client. |
| 172 | + |
| 173 | +Keepalive comments are emitted during wait periods to prevent streaming timeouts: |
| 174 | +``` |
| 175 | +: keepalive |
| 176 | +: retrying in 5s |
| 177 | +: retrying now |
| 178 | +``` |
| 179 | + |
| 180 | +## Monitoring |
| 181 | + |
| 182 | +The failure handling strategy logs its decisions at INFO level: |
| 183 | + |
| 184 | +``` |
| 185 | +INFO Failure strategy: waiting 10.0s before retrying backend-1/gpt-4o |
| 186 | +INFO Failure strategy: failing over from backend-1 to backend-2 for model gpt-4o |
| 187 | +``` |
| 188 | + |
| 189 | +## Examples |
| 190 | + |
| 191 | +### Conservative Settings (More Patient) |
| 192 | + |
| 193 | +For workflows that can tolerate longer delays and want the maximum number of recovery attempts: |
| 194 | + |
| 195 | +```yaml |
| 196 | +failure_handling: |
| 197 | + enabled: true |
| 198 | + max_silent_wait: 60.0 |
| 199 | + total_timeout_budget: 180.0 |
| 200 | + max_failover_hops: 10 |
| 201 | + min_retry_wait: 2.0 |
| 202 | +``` |
| 203 | + |
| 204 | +### Aggressive Settings (Fast Failover) |
| 205 | + |
| 206 | +For latency-sensitive workflows that prefer quick failover: |
| 207 | + |
| 208 | +```yaml |
| 209 | +failure_handling: |
| 210 | + enabled: true |
| 211 | + max_silent_wait: 10.0 |
| 212 | + total_timeout_budget: 45.0 |
| 213 | + max_failover_hops: 3 |
| 214 | + min_retry_wait: 0.5 |
| 215 | +``` |
| 216 | + |
| 217 | +### Disable for Debugging |
| 218 | + |
| 219 | +When debugging backend issues, you may want to see raw errors: |
| 220 | + |
| 221 | +```bash |
| 222 | +--disable-failure-handling |
| 223 | +``` |
| 224 | + |
| 225 | +Or via environment: |
| 226 | +```bash |
| 227 | +export DISABLE_FAILURE_HANDLING=1 |
| 228 | +``` |
| 229 | + |
| 230 | +## Related Features |
| 231 | + |
| 232 | +- [Health Checks](health-checks.md) - Proactive backend health monitoring |
| 233 | +- [Backends Overview](../backends/overview.md) - Available backend configurations |
| 234 | +- [Troubleshooting](../debugging/troubleshooting.md) - Debugging common issues |
| 235 | + |