Commit daeab12

partial
1 parent eef3a0c commit daeab12

4 files changed: +123 -4 lines changed

src/content/changelogs/ai-gateway.yaml

Lines changed: 6 additions & 1 deletion
@@ -5,11 +5,16 @@ productLink: "/ai-gateway/"
 productArea: Developer platform
 productAreaLink: /workers/platform/changelog/platform/
 entries:
+  - publish_date: "2025-01-23"
+    title: Added request handling
+    description: |-
+      * Added [request handling options](/ai-gateway/request-handling/) to help manage AI provider interactions effectively, ensuring your applications remain responsive and reliable.
+
   - publish_date: "2025-01-02"
     title: DeepSeek
     description: |-
       * **Configuration**: Added [DeepSeek](/ai-gateway/providers/deepseek/) as a new provider.
-
+
   - publish_date: "2024-12-17"
     title: AI Gateway Dashboard
     description: |-

src/content/docs/ai-gateway/configuration/fallbacks.mdx

Lines changed: 9 additions & 3 deletions
@@ -9,11 +9,17 @@ import { Render } from "~/components";
 
 Specify model or provider fallbacks with your [Universal endpoint](/ai-gateway/providers/universal/) to handle request failures and ensure reliability.
 
-Fallbacks are currently triggered only when a request encounters an error. We are working to expand fallback functionality to include time-based triggers, which will allow requests that exceed a predefined response time to timeout and fallback.
+Cloudflare can trigger your fallback provider in response to [request errors](#request-failures) or [predetermined request timeouts](#request-timeouts). The [response header `cf-aig-step`](#response-headercf-aig-step) indicates which step successfully processed the request.
 
-## Example
+## Request failures
 
-In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI. The response header `cf-aig-step` indicates which provider successfully processed the request.
+By default, Cloudflare triggers your fallback if a model request returns an error.
+
+### Example
+
+In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI.
+
+The response header `cf-aig-step` indicates which provider successfully processed the request.
 
 1. Sends a request to Workers AI Inference API.
 2. If that request fails, proceeds to OpenAI.
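
The two steps above can be sketched as a loop over an ordered list of providers. This is only an illustration in Python, not Cloudflare's implementation: `run_with_fallback`, the two step functions, and the zero-based step index are hypothetical stand-ins. The index plays the role the documentation assigns to the `cf-aig-step` response header, and its exact numbering is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple


def run_with_fallback(steps: Sequence[Callable[[], str]]) -> Tuple[int, str]:
    """Try each step in order; return (step_index, response) for the first success."""
    last_error: Optional[Exception] = None
    for index, step in enumerate(steps):
        try:
            return index, step()
        except Exception as error:  # any failure moves on to the next step
            last_error = error
    raise RuntimeError("all steps failed") from last_error


# Hypothetical stand-ins for the two providers in the example.
def workers_ai() -> str:
    raise ConnectionError("simulated Workers AI failure")


def openai() -> str:
    return "answer from fallback"


step, answer = run_with_fallback([workers_ai, openai])
print(step, answer)
```

Here the first step raises, so the second step answers and the returned index identifies it, which is the kind of information a caller would read from `cf-aig-step`.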
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+---
+pcx_content_type: configuration
+title: Request handling
+sidebar:
+  order: 4
+---
+
+import { Render, Aside } from "~/components";
+
+Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
+
+## Request timeouts
+
+A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond.
+
+These timeouts help:
+
+- Improve user experience, by preventing users from waiting too long for a response
+- Proactively handle errors, by detecting unresponsive providers and triggering a fallback option
+
+Request timeouts can be set on a [Universal Endpoint](/ai-gateway/providers/universal/) or directly on a request to any [provider](/ai-gateway/providers/):
+
+- If set on a Universal Endpoint, the timeout specifies how long a request may take before your gateway triggers a fallback.
+- If set on a provider request, the timeout specifies how long a request may take before your gateway returns an error.
+
+### Definitions
+
+A timeout is set in milliseconds and is measured against the arrival of the first part of the response. As long as the first part of the response returns within the specified timeframe, such as when streaming a response, your gateway waits for the rest of the response.
+
+### Configuration
+
+#### Universal Endpoint
+
+For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property or a `cf-aig-request-timeout` header.
+
+If you set more than one, your gateway uses the following properties, which are listed in order of priority:
+
+| Priority | Property                                                                                                               |
+| -------- | ---------------------------------------------------------------------------------------------------------------------- |
+| 1        | `requestTimeout` (added as a universal attribute)                                                                      |
+| 2        | `cf-aig-request-timeout` (header included at the [provider level](/ai-gateway/providers/universal/#payload-reference)) |
+| 3        | `cf-aig-request-timeout` (header included at the request level)                                                        |
+
+Your gateway follows this hierarchy to determine the timeout duration before triggering a fallback.
+
+### Request timeout example
+
+These request timeout values can interact to customize the behavior of your Universal Endpoint.
+
+In this example, the request will try to answer `What is Cloudflare?` within 1000 milliseconds using the normal `@cf/meta/llama-3.1-8b-instruct` model. The `requestTimeout` property (1000) takes precedence over the `cf-aig-request-timeout` header (2000) set for `@cf/meta/llama-3.1-8b-instruct`.
+
+If that model does not respond in time, the gateway times out and moves to the fallback `@cf/meta/llama-3.1-8b-instruct-fast` model. This model has 3000 milliseconds, determined by the request-level `cf-aig-request-timeout` value, to respond with an answer.
+
+```bash title="Request" collapse={36-50} {2,11,13-15}
+curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
+  --header 'cf-aig-request-timeout: 3000' \
+  --header 'Content-Type: application/json' \
+  --data '[
+    {
+      "provider": "workers-ai",
+      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
+      "headers": {
+        "Authorization": "Bearer {cloudflare_token}",
+        "Content-Type": "application/json",
+        "cf-aig-request-timeout": "2000"
+      },
+      "config": {
+        "requestTimeout": 1000
+      },
+      "query": {
+        "messages": [
+          {
+            "role": "system",
+            "content": "You are a friendly assistant"
+          },
+          {
+            "role": "user",
+            "content": "What is Cloudflare?"
+          }
+        ]
+      }
+    },
+    {
+      "provider": "workers-ai",
+      "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
+      "headers": {
+        "Authorization": "Bearer {cloudflare_token}",
+        "Content-Type": "application/json"
+      },
+      "query": {
+        "messages": [
+          {
+            "role": "system",
+            "content": "You are a friendly assistant"
+          },
+          {
+            "role": "user",
+            "content": "What is Cloudflare?"
+          }
+        ]
+      }
+    }
+  ]'
+```
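
The priority table and the first-part-of-the-response rule from this file can be combined into a small model of how each step in the request above gets its timeout. The Python below is only a sketch under stated assumptions: `resolve_timeout_ms` and `times_out` are hypothetical helper names, not part of any Cloudflare API, and treating a response that arrives exactly at the limit as on time is an assumption the documentation does not settle.

```python
from typing import Optional, Union

TimeoutValue = Optional[Union[int, str]]


def resolve_timeout_ms(
    request_timeout: TimeoutValue,  # priority 1: `requestTimeout` property
    provider_header: TimeoutValue,  # priority 2: provider-level header
    request_header: TimeoutValue,   # priority 3: request-level header
) -> Optional[int]:
    """Return the highest-priority timeout that is set, in milliseconds."""
    for candidate in (request_timeout, provider_header, request_header):
        if candidate is not None:
            return int(candidate)
    return None


def times_out(first_chunk_ms: int, timeout_ms: Optional[int]) -> bool:
    """Only the arrival time of the first part of the response is compared."""
    return timeout_ms is not None and first_chunk_ms > timeout_ms


# Values taken from the example request above.
primary_timeout = resolve_timeout_ms(1000, "2000", "3000")
fallback_timeout = resolve_timeout_ms(None, None, "3000")
print(primary_timeout, fallback_timeout)

# A first chunk arriving after 1500 ms misses the primary limit
# but would be within the fallback limit.
print(times_out(1500, primary_timeout), times_out(1500, fallback_timeout))
```

Under this model, a streamed response whose first chunk arrives after 800 ms does not time out on the primary step, however long the remainder of the stream takes.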

src/content/glossary/ai-gateway.yaml

Lines changed: 4 additions & 0 deletions
@@ -41,6 +41,10 @@ entries:
     general_definition: |-
       Header to [bypass caching for a specific request](/ai-gateway/configuration/caching/#skip-cache-cf-aig-skip-cache).
 
+  - term: cf-aig-request-timeout
+    general_definition: |-
+      Header to trigger a fallback provider based on a [predetermined response time](/ai-gateway/configuration/fallbacks/#request-timeouts) (measured in milliseconds).
+
   # Deprecated headers
   - term: cf-cache-ttl
     general_definition: |-
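
Because `cf-aig-request-timeout` is an ordinary request header, it can be attached to a request with any HTTP client. The Python sketch below builds such a request with the standard library and does not send it. The provider-specific URL path, the `build_provider_request` helper, and the placeholder account, gateway, and token values are assumptions for illustration; only the header name and the millisecond unit come from this commit.

```python
import json
import urllib.request


def build_provider_request(
    account_id: str, gateway_id: str, token: str, timeout_ms: int
) -> urllib.request.Request:
    """Build (but do not send) a provider request carrying a timeout header."""
    # Assumed URL shape for a direct Workers AI request through the gateway.
    url = (
        f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}"
        "/workers-ai/@cf/meta/llama-3.1-8b-instruct"
    )
    body = json.dumps(
        {"messages": [{"role": "user", "content": "What is Cloudflare?"}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "cf-aig-request-timeout": str(timeout_ms),  # milliseconds
        },
    )


# Placeholder values; replace them with real identifiers before sending.
request = build_provider_request("my-account", "my-gateway", "my-token", 5000)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. According to the request handling page in this commit, a timeout set directly on a provider request returns an error when exceeded, rather than triggering a fallback.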
