-
Notifications
You must be signed in to change notification settings - Fork 10.4k
[AI gateway] Request timeouts for fallback providers #19391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
1ac75cb
2e0d6ba
771fedb
25db0f3
1468261
51504e6
49a4f3a
c55f037
01bce58
4bace5c
11d0c45
a4284b9
5b4aba9
8603830
d364b58
b80bf77
b2f5b9c
52d7ba0
0c0d6b4
9724208
531e354
75a3fa3
b2a105b
e20b10b
a2f470f
ec9734a
cc8e28f
c109d91
6fb47a8
f2a95bc
b42795b
08657e6
32e1753
1194418
8fc9dc1
4b134eb
d443c36
04efc52
8461edc
81ec9c8
9db2b4a
28658d7
bb3ffb5
ee8f45b
7bb183e
7f1d8ae
7d0861c
88f28fc
33719bf
ccd9404
5f03ca0
3c1e9d7
27f3a17
25d689a
0978a29
7c74024
fa75318
ea14770
e0f1eaf
4cf1683
395e586
0fdb2d2
e58b451
ff24f05
a64982b
e8fb685
ca62ac1
61c687a
2f23b14
e25160b
45a7c20
7e2cdd4
675c102
c446e3e
5fc1859
669e85b
203946f
c6dc970
6412637
83cd10b
f881057
f19d8e1
111d17f
aa8c42f
1ccc203
a401151
d575551
ab828de
a4e0688
1febb2f
3a4eaab
8f0ae7e
80e9c4e
676e572
98af3bd
04c1b15
0e63a6e
dd7a812
8145f8a
c40f871
eced0eb
e86d7f6
386420a
68edbd7
d8450c2
2eb56d1
2c7dd96
f5fb735
16bc05d
ff78b09
b647f57
2f52c30
dd16d60
0541dc6
f156cdb
36e3262
a934ece
d3d8607
6627c71
98f47c5
fb9a1cd
bb5c503
9d17f8a
a5aecd8
0b675d2
01dd94b
f0800a4
6d7c300
388ae5f
b223160
838a447
4874b37
1ebef4d
014041d
eaad7c7
78b8d6c
861500a
5188f54
cbaab97
f15abe4
e88f5a4
8127ed1
233456e
0fe62f1
c1139a8
685db15
a8eaae8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,18 +16,24 @@ https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} | |
|
|
||
| AI Gateway offers multiple endpoints for each Gateway you create - one endpoint per provider, and one Universal Endpoint. The Universal Endpoint requires some adjusting to your schema, but supports additional features. Some of these features are, for example, retrying a request if it fails the first time, or configuring a [fallback model/provider](/ai-gateway/configuration/fallbacks/). | ||
|
|
||
| ## Payload reference | ||
|
|
||
| You can use the Universal endpoint to contact every provider. The payload is expecting an array of message, and each message is an object with the following parameters: | ||
|
|
||
| - `provider` : the name of the provider you would like to direct this message to. Can be OpenAI, workers-ai, or any of our supported providers. | ||
| - `endpoint`: the pathname of the provider API you’re trying to reach. For example, on OpenAI it can be `chat/completions`, and for Workers AI this might be [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/). See more in the sections that are specific to [each provider](/ai-gateway/providers/). | ||
| - `authorization`: the content of the Authorization HTTP Header that should be used when contacting this provider. This usually starts with “Token” or “Bearer”. | ||
| - `headers`: | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kathayl, I think this is accurate, but fact check me here :)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. switched from caching and authentication to caching and custom metadata. bc authentication is at the level of the request, not provider. |
||
| - `Authorization`: the content of the Authorization HTTP Header that should be used when contacting this provider. This usually starts with “Token” or “Bearer”. | ||
kodster28 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| - Any other custom header in a model's [configuration](/ai-gateway/configuration/), such as [Caching](/ai-gateway/configuration/caching/) or [Authentication](/ai-gateway/configuration/authentication/). | ||
| - `query`: the payload as the provider expects it in their official API. | ||
|
|
||
| ## cURL example | ||
|
|
||
| The following example shows a simple setup with a primary model and a [fallback](/ai-gateway/configuration/fallbacks/) option. | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wanted to add more cross-links over to fallbacks page |
||
|
|
||
| <Render file="universal-gateway-example" /> | ||
|
|
||
| The above will send a request to Workers AI Inference API, if it fails it will proceed to OpenAI. You can add as many fallbacks as you need, just by adding another JSON in the array. | ||
| The above will send a request to Workers AI Inference API, if it fails it will proceed to OpenAI. You can add as many [fallbacks](/ai-gateway/configuration/fallbacks/) as you need, just by adding another JSON in the array. | ||
|
|
||
| ## WebSockets API <Badge text="beta" variant="tip" size="small" /> | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,6 +41,10 @@ entries: | |
| general_definition: |- | ||
| Header to [bypass caching for a specific request](/ai-gateway/configuration/caching/#skip-cache-cf-aig-skip-cache). | ||
|
|
||
| - term: cf-aig-request-timeout | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adds automatically to headers glossary page |
||
| general_definition: |- | ||
| Header to trigger a fallback provider based on a [predetermined response time](/ai-gateway/configuration/fallbacks/#request-timeouts) (measured in milliseconds). | ||
|
|
||
| # Deprecated headers | ||
| - term: cf-cache-ttl | ||
| general_definition: |- | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.