Commit 65e3ef5

Mateusz committed
feat: Implement failure handling configuration and CLI parameters
1 parent 61674d1 commit 65e3ef5

17 files changed: +1918 -924 lines changed

config/schemas/app_config.schema.yaml

Lines changed: 10 additions & 0 deletions
@@ -288,6 +288,16 @@ properties:
       method: { type: string, enum: [GET, HEAD] }
       path: { type: string }
       accept_any_response: { type: boolean }
+  failure_handling:
+    type: object
+    additionalProperties: false
+    properties:
+      enabled: { type: boolean }
+      max_silent_wait: { type: number, minimum: 0 }
+      total_timeout_budget: { type: number, minimum: 0 }
+      keepalive_interval: { type: number, minimum: 1 }
+      max_failover_hops: { type: integer, minimum: 1 }
+      min_retry_wait: { type: number, minimum: 0.1 }
   routing:
     type: object
     additionalProperties: false

data/cli_flag_snapshot.txt

Lines changed: 6 additions & 0 deletions
@@ -29,6 +29,7 @@
 --disable-dangerous-git-commands-protection
 --disable-double-ampersand-fixes-for-windows
 --disable-edit-precision
+--disable-failure-handling
 --disable-gemini-oauth-fallback
 --disable-hybrid-backend
 --disable-interactive-commands
@@ -80,13 +81,16 @@
 --identity-title
 --identity-url
 --identity-user-agent
+--keepalive-interval
 --llm-assessment-confidence-threshold
 --llm-assessment-history-window
 --llm-assessment-model
 --llm-assessment-turn-threshold
 --log
 --log-colors
 --log-level
+--max-failover-hops
+--max-silent-wait
 --memory-available
 --memory-context-model
 --memory-context-prompt
@@ -103,6 +107,7 @@
 --memory-single-user-mode
 --memory-summary-model
 --memory-summary-prompt
+--min-retry-wait
 --model-alias
 --no-log-colors
 --no-test-execution-reminder-enabled
@@ -128,6 +133,7 @@
 --test-execution-reminder-enabled
 --thinking-budget
 --timeout
+--total-timeout-budget
 --trusted-ip
 --use-angel-model
 --zai-api-key

docs/user_guide/cli-parameters.md

Lines changed: 383 additions & 370 deletions
Large diffs are not rendered by default.
Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
# Failure Handling Strategy

The failure handling strategy provides automatic retry and failover behavior for backend errors such as rate limits (429), connection errors, and authentication failures. It lets the proxy recover silently from transient errors, improving reliability for agentic workflows.

## Overview

When a backend request fails, the failure handling strategy decides whether to:

1. **Wait and Retry** - If the error is recoverable (e.g., rate limit) and the wait time is short, the proxy waits silently and retries the same backend.

2. **Failover Immediately** - If the wait would be too long, or retrying the same backend cannot help (e.g., an authentication failure), and an alternative backend is available, the proxy switches to another backend instance that can serve the same model.

3. **Surface the Error** - If no recovery options are available, the error is returned to the client.

This happens transparently to the client, so agentic workflows continue without interruption from transient errors.

## Configuration Options

### CLI Parameters

```bash
# Disable failure handling entirely
--disable-failure-handling

# Max seconds to wait before attempting failover (default: 30)
--max-silent-wait 30

# Total timeout budget across all failover attempts (default: 90)
--total-timeout-budget 90

# Seconds between SSE keepalive comments during waits (default: 8)
--keepalive-interval 8

# Maximum backend instances to try in failover chain (default: 5)
--max-failover-hops 5

# Minimum retry wait even for sub-second retry-after (default: 1)
--min-retry-wait 1
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DISABLE_FAILURE_HANDLING` | `0` | Set to `1` to disable automatic failure handling |
| `FAILURE_HANDLING_MAX_SILENT_WAIT` | `30.0` | Max seconds to wait before failover |
| `FAILURE_HANDLING_TOTAL_TIMEOUT_BUDGET` | `90.0` | Total timeout budget for all attempts, in seconds |
| `FAILURE_HANDLING_KEEPALIVE_INTERVAL` | `8.0` | Seconds between SSE keepalive comments during waits |
| `FAILURE_HANDLING_MAX_FAILOVER_HOPS` | `5` | Maximum backend instances to try |
| `FAILURE_HANDLING_MIN_RETRY_WAIT` | `1.0` | Minimum retry wait time, in seconds |
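
To make the table concrete, here is a minimal Python sketch of reading these variables into a settings object. The `FailureHandlingSettings` class and `settings_from_env` function are hypothetical names used for illustration; the proxy's actual loader may parse values differently. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureHandlingSettings:
    # Defaults mirror the documented table; the class itself is illustrative.
    enabled: bool = True
    max_silent_wait: float = 30.0
    total_timeout_budget: float = 90.0
    keepalive_interval: float = 8.0
    max_failover_hops: int = 5
    min_retry_wait: float = 1.0


def settings_from_env(env=None):
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    prefix = "FAILURE_HANDLING_"
    return FailureHandlingSettings(
        # Assumption: only the literal value "1" disables the feature.
        enabled=env.get("DISABLE_FAILURE_HANDLING", "0") != "1",
        max_silent_wait=float(env.get(prefix + "MAX_SILENT_WAIT", "30.0")),
        total_timeout_budget=float(env.get(prefix + "TOTAL_TIMEOUT_BUDGET", "90.0")),
        keepalive_interval=float(env.get(prefix + "KEEPALIVE_INTERVAL", "8.0")),
        max_failover_hops=int(env.get(prefix + "MAX_FAILOVER_HOPS", "5")),
        min_retry_wait=float(env.get(prefix + "MIN_RETRY_WAIT", "1.0")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to exercise in isolation.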

### Configuration File

Add to your `config/config.yaml`:

```yaml
failure_handling:
  # Master switch to enable/disable failure handling
  enabled: true

  # Maximum seconds to wait for retry-after before failover
  # If retry-after <= this value, proxy waits silently
  # If > this value, it attempts failover to another backend
  max_silent_wait: 30.0

  # Maximum total seconds across all failover attempts
  # After this time, errors are surfaced to the client
  total_timeout_budget: 90.0

  # Seconds between SSE keepalive comments during waits
  # Prevents client/connection timeouts during retry periods
  keepalive_interval: 8.0

  # Maximum number of backend instances to try in failover chain
  # Limits failover depth to prevent infinite loops
  max_failover_hops: 5

  # Minimum wait time even for sub-second retry-after
  # Prevents tight retry loops that could overwhelm backends
  min_retry_wait: 1.0
```
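
The schema change in this commit constrains each of these keys (for example `keepalive_interval` has a minimum of 1 and `min_retry_wait` a minimum of 0.1). The following Python sketch mirrors those constraints by hand so the limits are easy to see; `validate_failure_handling` is a hypothetical helper, not the proxy's validator, which presumably relies on the YAML schema itself.

```python
# Type and minimum per key, copied from the failure_handling schema block.
_RULES = {
    "enabled": ("boolean", None),
    "max_silent_wait": ("number", 0),
    "total_timeout_budget": ("number", 0),
    "keepalive_interval": ("number", 1),
    "max_failover_hops": ("integer", 1),
    "min_retry_wait": ("number", 0.1),
}


def validate_failure_handling(cfg):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, value in cfg.items():
        if key not in _RULES:
            # The schema sets additionalProperties: false.
            errors.append(f"unknown key: {key}")
            continue
        kind, minimum = _RULES[key]
        if kind == "boolean":
            ok = isinstance(value, bool)
        elif kind == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not ok:
            errors.append(f"{key}: expected {kind}")
        elif minimum is not None and value < minimum:
            errors.append(f"{key}: must be >= {minimum}")
    return errors
```

All keys are optional in the schema, so an empty mapping validates cleanly.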

## Parameter Details

### max_silent_wait

**Default: 30 seconds**

This is the threshold that determines whether to wait-and-retry or failover:

- If `retry-after <= max_silent_wait`: The proxy waits silently and retries the same backend. The client sees added latency but no error.
- If `retry-after > max_silent_wait`: The proxy immediately attempts failover to an alternative backend instance.

**Example scenarios:**
- Backend returns `retry-after: 10s` → Proxy waits 10s and retries (within threshold)
- Backend returns `retry-after: 60s` → Proxy immediately fails over to another backend (exceeds threshold)
- Backend returns `retry-after: 600s` → Proxy fails over if possible, otherwise surfaces error

Lower values (e.g., 10-15s) provide faster failover but may cause unnecessary backend switching. Higher values (e.g., 45-60s) are more patient but may cause longer delays for the client.
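
The threshold rule and the three example scenarios can be sketched as a small decision function. This is an illustrative reading of the rules above, not the proxy's implementation; `Action`, `decide`, and the `has_alternative` flag are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    WAIT_AND_RETRY = "wait_and_retry"
    FAILOVER = "failover"
    SURFACE = "surface"


def decide(retry_after, max_silent_wait=30.0, has_alternative=True):
    """Pick a recovery action for a recoverable error with a known retry-after."""
    if retry_after <= max_silent_wait:
        # Short wait: stay on the same backend.
        return Action.WAIT_AND_RETRY
    if has_alternative:
        # Wait is too long, but another backend can serve the model.
        return Action.FAILOVER
    # Nothing left to try.
    return Action.SURFACE
```

With the default threshold, `decide(10)` waits, `decide(60)` fails over, and `decide(600, has_alternative=False)` surfaces the error.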

### total_timeout_budget

**Default: 90 seconds**

The maximum total time the proxy will spend attempting recovery before surfacing an error to the client. This includes:
- Time spent waiting for retry-after delays
- Time spent on failover attempts
- Time spent on actual backend requests

Once this budget is exhausted, any subsequent errors are immediately surfaced to the client.
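
One common way to track such a budget is a single deadline on a monotonic clock, so that waits, failover attempts, and request time all draw from the same pool. The `TimeoutBudget` class below is a hypothetical sketch of that idea; how the proxy actually accounts for elapsed time is not shown in this document.

```python
import time


class TimeoutBudget:
    """Tracks how much of a total recovery budget is left."""

    def __init__(self, total_seconds, clock=time.monotonic):
        # The clock is injectable so the class can be exercised without sleeping.
        self._clock = clock
        self._deadline = clock() + total_seconds

    def remaining(self):
        """Seconds left before the budget runs out, never negative."""
        return max(0.0, self._deadline - self._clock())

    def exhausted(self):
        return self.remaining() <= 0.0

    def can_afford(self, wait_seconds):
        """True if a wait of this length still fits inside the budget."""
        return wait_seconds <= self.remaining()
```

A recovery loop would check `can_afford(wait)` before sleeping and surface the error once `exhausted()` returns true.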

### keepalive_interval

**Default: 8 seconds**

During wait periods (e.g., waiting for a retry-after delay), the proxy emits SSE keepalive comments to prevent client/connection timeouts. This is especially important for streaming responses.

The keepalive comments look like:
```
: keepalive
```

Clients can safely ignore these lines: in SSE, any line that starts with a colon is a comment.
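
For clients that parse the stream by hand, the sketch below shows one way to skip comment lines. `iter_sse_data` is a hypothetical helper and deliberately simplified: it yields each `data:` line on its own and does not join multi-line events or handle other SSE fields.

```python
def iter_sse_data(lines):
    """Yield payloads of `data:` lines, skipping SSE comments and blank lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            # Blank separators and comments such as ": keepalive" carry no data.
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            yield value[1:] if value.startswith(" ") else value
```

Most SSE client libraries already drop comment lines, so this only matters for custom parsers.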

### max_failover_hops

**Default: 5**

The maximum number of different backend instances to try before giving up. This prevents infinite failover loops when all backends are experiencing issues.

For example, with `max_failover_hops: 3`:
1. Try `openai-1` → fails
2. Try `openai-2` → fails
3. Try `openai-3` → fails
4. Surface error to client (max hops reached)
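
The walk through the failover chain can be sketched as a bounded loop. `try_with_failover` and its `send` callback are hypothetical; the sketch assumes, as in the example above, that the first backend counts toward the hop limit.

```python
def try_with_failover(backends, send, max_failover_hops=5):
    """Try backends in order, stopping after max_failover_hops instances.

    `send` is a caller-supplied function that performs the request against
    one backend and raises on failure.
    """
    last_error = None
    for name in backends[:max_failover_hops]:
        try:
            return name, send(name)
        except Exception as exc:
            # Remember the failure and move on to the next instance.
            last_error = exc
    if last_error is None:
        raise ValueError("no backends to try")
    # Hop limit reached: surface the most recent error.
    raise last_error
```

Slicing the candidate list up front keeps the loop bounded even if every backend fails.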

### min_retry_wait

**Default: 1 second**

The minimum wait time enforced even when a backend returns a very short `retry-after` value (e.g., 0.1 seconds). This prevents tight retry loops that could:
- Overwhelm the backend
- Consume excessive CPU
- Create retry storms
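
Enforcing the floor amounts to a clamp on the parsed header value. The sketch below also handles the HTTP-date form of `Retry-After`, which is an assumption on my part: this document does not say which header formats the proxy accepts. `retry_wait_seconds` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_wait_seconds(retry_after, min_retry_wait=1.0, now=None):
    """Convert a Retry-After header value to seconds, never below the floor."""
    try:
        # Delay form, e.g. "10" or "0.1".
        seconds = float(retry_after)
    except ValueError:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        now = now or datetime.now(timezone.utc)
        seconds = (parsedate_to_datetime(retry_after) - now).total_seconds()
    # Clamp so that sub-second or past-dated values cannot cause a tight loop.
    return max(seconds, min_retry_wait)
```

A header value of `"0.1"` therefore yields a 1 second wait under the default floor, while `"10"` is left unchanged.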

## Behavior by Error Type

### Recoverable Errors (may retry or failover)

- **429 Too Many Requests** - Rate limit errors. Uses `retry-after` header if available.
- **503 Service Unavailable** - Temporary unavailability. Short default wait applied.
- **Connection Errors** - Network issues. Short wait then retry/failover.
- **Timeout Errors** - Request timeouts. Immediate failover preferred.

### Unrecoverable Errors (failover only, then surface)

- **401 Unauthorized** - Authentication failure. Immediate failover, no retry.
- **403 Forbidden** - Authorization failure. Immediate failover, no retry.
- **500 Internal Server Error** - Backend error. Immediate failover, no retry.
- **400 Bad Request** - Invalid request. Surfaced immediately without failover, because it is a client error.

### Content-Started Errors (always surface)

If the backend has already started sending content to the client (e.g., streaming has begun), the error is always surfaced immediately. Partial responses cannot be transparently recovered.
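
The three categories can be summarized as a lookup keyed on HTTP status. This sketch covers only the status codes listed above; connection and timeout errors have no status code and are left out, and treating any unlisted status as "surface" is an assumption. `classify` and the string labels are hypothetical.

```python
RETRY_OR_FAILOVER = {429, 503}
FAILOVER_ONLY = {401, 403, 500}


def classify(status, content_started=False):
    """Map an HTTP status to the recovery path described in this section."""
    if content_started:
        # Partial responses cannot be recovered transparently.
        return "surface"
    if status in RETRY_OR_FAILOVER:
        return "retry_or_failover"
    if status in FAILOVER_ONLY:
        return "failover"
    # 400 and, by assumption, any status not listed above.
    return "surface"
```

Checking `content_started` first reflects that a stream which has already begun overrides every other rule.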

## Streaming Behavior

During streaming responses, the failure handling strategy behaves differently depending on whether content has already been sent:

1. **Before content starts**: Full retry/failover capability. Client sees no error.
2. **After content starts**: No recovery possible. Error is surfaced to client.

Keepalive and status comments are emitted during wait periods to prevent streaming timeouts:
```
: keepalive
: retrying in 5s
: retrying now
```
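
On the server side, emitting comments during a wait can be sketched as an async generator that sleeps in slices of `keepalive_interval`. This is an illustration only: `wait_with_keepalive` is a hypothetical name, and the exact comments and timing the proxy uses may differ from this sketch.

```python
import asyncio


async def wait_with_keepalive(wait_seconds, keepalive_interval=8.0, sleep=asyncio.sleep):
    """Wait for wait_seconds, yielding an SSE comment after each interval."""
    remaining = wait_seconds
    while remaining > 0:
        # Never sleep past the end of the wait.
        step = min(keepalive_interval, remaining)
        await sleep(step)
        remaining -= step
        if remaining > 0:
            yield ": keepalive\n\n"
    # Final status comment once the wait is over.
    yield ": retrying now\n\n"
```

A streaming handler would forward each yielded chunk to the client and then retry the backend request. The `sleep` parameter is injectable so the timing can be checked without real delays.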

## Monitoring

The failure handling strategy logs its decisions at INFO level:

```
INFO Failure strategy: waiting 10.0s before retrying backend-1/gpt-4o
INFO Failure strategy: failing over from backend-1 to backend-2 for model gpt-4o
```

## Examples

### Conservative Settings (More Patient)

For workflows that can tolerate longer delays but want maximum retry attempts:

```yaml
failure_handling:
  enabled: true
  max_silent_wait: 60.0
  total_timeout_budget: 180.0
  max_failover_hops: 10
  min_retry_wait: 2.0
```

### Aggressive Settings (Fast Failover)

For latency-sensitive workflows that prefer quick failover:

```yaml
failure_handling:
  enabled: true
  max_silent_wait: 10.0
  total_timeout_budget: 45.0
  max_failover_hops: 3
  min_retry_wait: 0.5
```

### Disable for Debugging

When debugging backend issues, you may want to see raw errors:

```bash
--disable-failure-handling
```

Or via environment:
```bash
export DISABLE_FAILURE_HANDLING=1
```

## Related Features

- [Health Checks](health-checks.md) - Proactive backend health monitoring
- [Backends Overview](../backends/overview.md) - Available backend configurations
- [Troubleshooting](../debugging/troubleshooting.md) - Debugging common issues
