Commit f4159f6

kodster28, kathayl, marciocloudflare
authored
AI gateway -> request handling (#19619)
* partial
* mostly cleaned up. Still need to add more headers to glossary
* Update fallbacks.mdx
  "in the following example" paragraph was duplicated, so deleting
* Update request-handling.mdx
  slight updates re:
  - what to use when using universal
  - wording
* Update ai-gateway.yaml
  update date
* Update request-handling.mdx
  fix word
* remove random file
* remove file
* Add headers
* Added example
* Apply suggestions from code review
  Co-authored-by: marciocloudflare <[email protected]>
* Update request-handling.mdx
  Fix typo
* fix highlight

---------

Co-authored-by: Kathy <[email protected]>
Co-authored-by: marciocloudflare <[email protected]>
1 parent c672c09 commit f4159f6

File tree

4 files changed

+233
-3
lines changed

src/content/changelogs/ai-gateway.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,11 @@ productLink: "/ai-gateway/"
55
productArea: Developer platform
66
productAreaLink: /workers/platform/changelog/platform/
77
entries:
8+
- publish_date: "2025-02-06"
9+
title: Added request handling
10+
description: |-
11+
* Added [request handling options](/ai-gateway/request-handling/) to help manage AI provider interactions effectively, ensuring your applications remain responsive and reliable.
12+
813
- publish_date: "2025-02-05"
914
title: New AI Gateway providers
1015
description: |-

src/content/docs/ai-gateway/configuration/fallbacks.mdx

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,11 +9,15 @@ import { Render } from "~/components";
99

1010
Specify model or provider fallbacks with your [Universal endpoint](/ai-gateway/providers/universal/) to handle request failures and ensure reliability.
1111

12-
Fallbacks are currently triggered only when a request encounters an error. We are working to expand fallback functionality to include time-based triggers, which will allow requests that exceed a predefined response time to timeout and fallback.
12+
Cloudflare can trigger your fallback provider in response to [request errors](#request-failures) or [predetermined request timeouts](#request-timeouts). The [response header `cf-aig-step`](#response-headercf-aig-step) indicates which step successfully processed the request.
1313

14-
## Example
14+
## Request failures
1515

16-
In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI. The response header `cf-aig-step` indicates which provider successfully processed the request.
16+
By default, Cloudflare triggers your fallback if a model request returns an error.
17+
18+
### Example
19+
20+
In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI. The response header `cf-aig-step` indicates which provider successfully processed the request.
1721

1822
1. Sends a request to Workers AI Inference API.
1923
2. If that request fails, proceeds to OpenAI.
Lines changed: 205 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,205 @@
1+
---
2+
pcx_content_type: configuration
3+
title: Request handling
4+
sidebar:
5+
order: 4
6+
---
7+
8+
import { Render, Aside } from "~/components";
9+
10+
Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
11+
12+
## Request timeouts
13+
14+
A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond.
15+
16+
These timeouts help:
17+
18+
- Improve user experience, by preventing users from waiting too long for a response
19+
- Proactively handle errors, by detecting unresponsive providers and triggering a fallback option
20+
21+
Request timeouts can be set on a Universal Endpoint or directly on a request to any provider.
22+
23+
### Definitions
24+
25+
A timeout is set in milliseconds and is measured against the arrival of the first part of the response. As long as the first part of the response returns within the specified time, such as the first chunk of a streamed response, your gateway waits for the rest of the response.
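As a rough illustration of that rule, the sketch below compares a time-to-first-byte measurement against a timeout value. The helper name `first_byte_within_timeout` and the sample numbers are hypothetical, and treating a response that arrives exactly at the limit as on time is an assumption, since the boundary is not specified above.

```bash
# Hypothetical helper, not part of AI Gateway.
# $1 = time until the first part of the response arrived, in milliseconds
# $2 = configured request timeout, in milliseconds
# Succeeds (exit status 0) when the first part arrived within the timeout.
# Assumption: arriving exactly at the limit counts as on time.
first_byte_within_timeout() {
  [ "$1" -le "$2" ]
}

# A streamed response whose first chunk arrives after 800 ms passes a
# 1000 ms timeout, however long the rest of the stream takes.
if first_byte_within_timeout 800 1000; then
  echo "800 ms to first chunk: gateway waits for the rest of the response"
fi

# A provider that has sent nothing after 1500 ms exceeds the same timeout.
if ! first_byte_within_timeout 1500 1000; then
  echo "1500 ms to first chunk: timeout exceeded"
fi
```

When testing from a client, curl's `%{time_starttransfer}` write-out variable reports time to first byte in seconds. It is only a client-side approximation of what the gateway observes, but it can be converted to milliseconds and passed to a check like this one.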
26+
27+
### Configuration
28+
29+
#### Universal Endpoint
30+
31+
If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request timeout specifies the timeout duration for requests and, when exceeded, triggers a fallback.
32+
33+
For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property within the provider-specific `config` object. Each provider can have a different `requestTimeout` value for granular customization.
34+
35+
```bash title="Provider-level config" {11-13} collapse={15-48}
36+
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
37+
--header 'Content-Type: application/json' \
38+
--data '[
39+
{
40+
"provider": "workers-ai",
41+
"endpoint": "@cf/meta/llama-3.1-8b-instruct",
42+
"headers": {
43+
"Authorization": "Bearer {cloudflare_token}",
44+
"Content-Type": "application/json"
45+
},
46+
"config": {
47+
"requestTimeout": 1000
48+
},
49+
"query": {
50+
"messages": [
51+
{
52+
"role": "system",
53+
"content": "You are a friendly assistant"
54+
},
55+
{
56+
"role": "user",
57+
"content": "What is Cloudflare?"
58+
}
59+
]
60+
}
61+
},
62+
{
63+
"provider": "workers-ai",
64+
"endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
65+
"headers": {
66+
"Authorization": "Bearer {cloudflare_token}",
67+
"Content-Type": "application/json"
68+
},
69+
"query": {
70+
"messages": [
71+
{
72+
"role": "system",
73+
"content": "You are a friendly assistant"
74+
},
75+
{
76+
"role": "user",
77+
"content": "What is Cloudflare?"
78+
}
79+
]
80+
},
81+
"config": {
82+
"requestTimeout": 3000
83+
}
84+
}
85+
]'
86+
```
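To check which entry in a Universal Endpoint request actually produced the response, you can read the `cf-aig-step` response header described in the fallbacks documentation. The following is a minimal sketch: the helper name `step_from_headers` is hypothetical, and the header dump, including the value `1`, is made up for illustration.

```bash
# Hypothetical helper: reads raw HTTP response headers on stdin and prints
# the value of the cf-aig-step header, if present.
step_from_headers() {
  # HTTP header lines end in CRLF, so drop the carriage returns first.
  # Header names are case-insensitive, so compare in lowercase.
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-aig-step" { print $2 }'
}

# Made-up header dump standing in for a real gateway response.
printf 'HTTP/2 200\r\ncontent-type: application/json\r\ncf-aig-step: 1\r\n\r\n' |
  step_from_headers
```

Against a real gateway, you could add `--silent --dump-header - --output /dev/null` to the curl command above and pipe its output into `step_from_headers`, so that only the response headers reach the helper.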
87+
88+
#### Direct provider
89+
90+
If set on a [provider](/ai-gateway/providers/) request, a request timeout specifies the timeout duration for the request and, if exceeded, returns an error.
91+
92+
For a provider-specific endpoint, configure the timeout value by adding a `cf-aig-request-timeout` header.
93+
94+
```bash title="Provider-specific endpoint example" {4}
95+
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
96+
--header 'Authorization: Bearer {cf_api_token}' \
97+
--header 'Content-Type: application/json' \
98+
--header 'cf-aig-request-timeout: 5000' \
99+
--data '{"prompt": "What is Cloudflare?"}'
100+
```
101+
102+
---
103+
104+
## Request retries
105+
106+
AI Gateway also supports automatic retries for failed requests, with a maximum of five retry attempts.
107+
108+
This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention.
109+
110+
Request retries can be set on a Universal Endpoint or directly on a request to any provider.
111+
112+
### Definitions
113+
114+
With request retries, you can adjust a combination of three properties:
115+
116+
- Number of attempts (maximum of 5 tries)
117+
- How long before retrying (in milliseconds, maximum of 5 seconds)
118+
- Backoff method (constant, linear, or exponential)
119+
120+
On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
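To make the three backoff methods concrete, the sketch below prints the wait before each retry under common textbook definitions: constant repeats the base delay, linear multiplies it by the retry number, and exponential doubles it on each retry. These formulas are assumptions rather than confirmed AI Gateway behavior, the helper name `retry_waits` is hypothetical, and the sketch does not model how the 5-second maximum applies to individual waits.

```bash
# Hypothetical helper: retry_waits RETRIES BASE_DELAY_MS BACKOFF
# Prints the assumed wait in milliseconds before each retry, one per line.
# The function body runs in a subshell, so its variables stay private.
retry_waits() (
  retries=$1
  base_ms=$2
  backoff=$3
  n=1
  while [ "$n" -le "$retries" ]; do
    case "$backoff" in
      constant)    wait_ms=$base_ms ;;                          # same delay every time
      linear)      wait_ms=$((base_ms * n)) ;;                  # grows by base_ms per retry
      exponential) wait_ms=$((base_ms * (1 << (n - 1)))) ;;     # doubles on every retry
      *) echo "unknown backoff: $backoff" >&2; exit 1 ;;
    esac
    echo "$wait_ms"
    n=$((n + 1))
  done
)

# Three retries with a 1000 ms base delay and exponential backoff.
retry_waits 3 1000 exponential
```

Under these assumptions, the call above prints 1000, 2000, and 4000 on separate lines. How the number of retries maps onto the configured number of attempts is left out on purpose, because that relationship is not specified here.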
121+
122+
### Configuration
123+
124+
#### Universal Endpoint
125+
126+
If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request retry will automatically retry failed requests up to five times before triggering any configured fallbacks.
127+
128+
For a Universal Endpoint, configure the retry settings with the following properties in the provider-specific `config`:
129+
130+
```ts
131+
config:{
132+
maxAttempts?: number;
133+
retryDelay?: number;
134+
backoff?: "constant" | "linear" | "exponential";
135+
}
136+
```
137+
138+
As with the [request timeout](/ai-gateway/configuration/request-handling/#universal-endpoint), each provider can have different retry settings for granular customization.
139+
140+
```bash title="Provider-level config" {11-15} collapse={16-55}
141+
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
142+
--header 'Content-Type: application/json' \
143+
--data '[
144+
{
145+
"provider": "workers-ai",
146+
"endpoint": "@cf/meta/llama-3.1-8b-instruct",
147+
"headers": {
148+
"Authorization": "Bearer {cloudflare_token}",
149+
"Content-Type": "application/json"
150+
},
151+
"config": {
152+
"maxAttempts": 2,
153+
"retryDelay": 1000,
154+
"backoff": "constant"
155+
},
156+
"query": {
157+
"messages": [
158+
{
159+
"role": "system",
160+
"content": "You are a friendly assistant"
161+
},
162+
{
163+
"role": "user",
164+
"content": "What is Cloudflare?"
165+
}
166+
]
167+
}
168+
},
169+
{
170+
"provider": "workers-ai",
171+
"endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
172+
"headers": {
173+
"Authorization": "Bearer {cloudflare_token}",
174+
"Content-Type": "application/json"
175+
},
176+
"query": {
177+
"messages": [
178+
{
179+
"role": "system",
180+
"content": "You are a friendly assistant"
181+
},
182+
{
183+
"role": "user",
184+
"content": "What is Cloudflare?"
185+
}
186+
]
187+
},
188+
"config": {
189+
"maxAttempts": 4,
190+
"retryDelay": 1000,
191+
"backoff": "exponential"
192+
}
193+
}
194+
]'
195+
```
196+
197+
#### Direct provider
198+
199+
If set on a [provider](/ai-gateway/providers/) request, a request retry will automatically retry failed requests up to five times. On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
200+
201+
For a provider-specific endpoint, configure the retry settings by adding different header values:
202+
203+
- `cf-aig-max-attempts` (number)
204+
- `cf-aig-retry-delay` (number)
205+
- `cf-aig-backoff` ("constant" | "linear" | "exponential")
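The headers listed above could be combined in a single request along the following lines. This is a sketch rather than an official example: the function name `aig_prompt_with_retries` and the environment variables `ACCOUNT_ID`, `GATEWAY_ID`, and `CF_API_TOKEN` are hypothetical, while the endpoint and request body mirror the timeout example earlier on this page. The limits checked before the request is sent come from the definitions above (at most 5 attempts, at most 5 seconds of delay).

```bash
# Hypothetical wrapper: aig_prompt_with_retries MAX_ATTEMPTS RETRY_DELAY_MS BACKOFF
# Assumes ACCOUNT_ID, GATEWAY_ID, and CF_API_TOKEN are set in the environment.
aig_prompt_with_retries() {
  # Reject values outside the documented limits before sending anything.
  if [ "$1" -gt 5 ] || [ "$2" -gt 5000 ]; then
    echo "max attempts must be <= 5 and retry delay <= 5000 ms" >&2
    return 1
  fi
  case "$3" in
    constant|linear|exponential) ;;
    *) echo "backoff must be constant, linear, or exponential" >&2; return 1 ;;
  esac

  curl "https://gateway.ai.cloudflare.com/v1/$ACCOUNT_ID/$GATEWAY_ID/workers-ai/@cf/meta/llama-3.1-8b-instruct" \
    --header "Authorization: Bearer $CF_API_TOKEN" \
    --header 'Content-Type: application/json' \
    --header "cf-aig-max-attempts: $1" \
    --header "cf-aig-retry-delay: $2" \
    --header "cf-aig-backoff: $3" \
    --data '{"prompt": "What is Cloudflare?"}'
}

# Example call (commented out so that sourcing this file sends no request):
# aig_prompt_with_retries 3 1000 exponential
```

Wrapping the call in a function keeps the three retry settings together and makes it easy to reuse the same values across requests.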

src/content/glossary/ai-gateway.yaml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,22 @@ entries:
4141
general_definition: |-
4242
Header to [bypass caching for a specific request](/ai-gateway/configuration/caching/#skip-cache-cf-aig-skip-cache).
4343
44+
- term: cf-aig-request-timeout
45+
general_definition: |-
46+
Header to trigger a fallback provider based on a [predetermined response time](/ai-gateway/configuration/fallbacks/#request-timeouts) (measured in milliseconds).
47+
48+
- term: cf-aig-max-attempts
49+
general_definition: |-
50+
Header to customize the maximum number of attempts for [request retries](/ai-gateway/configuration/request-handling/#request-retries).
51+
52+
- term: cf-aig-retry-delay
53+
general_definition: |-
54+
Header to customize the retry delay for [request retries](/ai-gateway/configuration/request-handling/#request-retries).
55+
56+
- term: cf-aig-backoff
57+
general_definition: |-
58+
Header to customize the backoff type for [request retries](/ai-gateway/configuration/request-handling/#request-retries).
59+
4460
# Deprecated headers
4561
- term: cf-cache-ttl
4662
general_definition: |-
