AI gateway -> request handling #19619
Merged
14 commits
daeab12
partial
kodster28 df951e2
mostly cleaned up. Still need to add more headeres to glossary
kodster28 c5af8b4
Update fallbacks.mdx
kathayl fed42cf
Update request-handling.mdx
kathayl 4993ffb
Update ai-gateway.yaml
kathayl 5c0eae0
Update request-handling.mdx
kathayl ffe91d1
remove random file
kodster28 f56cf44
remove file
kodster28 01021cf
Add headers
kodster28 9cb76f6
Added example
kodster28 3f3eb25
Apply suggestions from code review
kodster28 acdf893
Update request-handling.mdx
kathayl 3a8b538
fix conflict
kodster28 7960f42
fix highlight
kodster28
205 changes: 205 additions & 0 deletions
src/content/docs/ai-gateway/configuration/request-handling.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,205 @@ | ||
| --- | ||
| pcx_content_type: configuration | ||
| title: Request handling | ||
| sidebar: | ||
| order: 4 | ||
| --- | ||
|
|
||
| import { Render, Aside } from "~/components"; | ||
|
|
||
| Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable. | ||
|
|
||
| ## Request timeouts | ||
|
|
||
| A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond. | ||
|
|
||
| These timeouts help: | ||
|
|
||
| - Improve user experience, by preventing users from waiting too long for a response | ||
| - Proactively handle errors, by detecting unresponsive providers and triggering a fallback option | ||
|
|
||
| Request timeouts can be set on a Universal Endpoint or directly on a request to any provider. | ||
|
|
||
| ### Definitions | ||
|
|
||
| A timeout is set in milliseconds and is measured against the arrival of the first part of the response. As long as the first part of the response returns within the specified timeframe, such as when streaming a response, your gateway will wait for the rest of the response. | ||
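Because the timeout is compared against the first part of the response, time to first byte is a useful number to look at when choosing a value. The following is a minimal sketch, not part of AI Gateway: `ttfb_ms` is a hypothetical helper, and it assumes that curl's `time_starttransfer` write-out variable, measured from the client, is a reasonable proxy for what the gateway measures against the provider.

```bash
# Sketch only: ttfb_ms is a hypothetical helper, not an AI Gateway feature.
# It measures time to first byte from the client, which includes network
# overhead that the gateway's own timeout may not count.

# ttfb_ms <curl arguments...>
# Prints the time until the first response byte arrived, in whole milliseconds.
ttfb_ms() {
  local seconds
  # -s silences progress output, -o /dev/null discards the response body,
  # -w prints curl's time_starttransfer value (seconds, as a decimal).
  seconds="$(curl -s -o /dev/null -w '%{time_starttransfer}' "$@")" || return 1
  # Convert seconds to milliseconds, rounded to a whole number.
  awk -v s="$seconds" 'BEGIN { printf "%.0f\n", s * 1000 }'
}
```

Pass the same URL, headers, and `--data` you would send in a normal request, then compare the printed number with the timeout you plan to configure.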
|
|
||
| ### Configuration | ||
|
|
||
| #### Universal Endpoint | ||
|
|
||
| If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request timeout specifies how long the gateway waits for a provider and, if that duration is exceeded, triggers a fallback. | ||
|
|
||
| For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property within the provider-specific `config` object. Each provider can have a different `requestTimeout` value for granular customization. | ||
|
|
||
| ```bash title="Provider-level config" {11-13} collapse={15-48} | ||
| curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \ | ||
| --header 'Content-Type: application/json' \ | ||
| --data '[ | ||
| { | ||
| "provider": "workers-ai", | ||
| "endpoint": "@cf/meta/llama-3.1-8b-instruct", | ||
| "headers": { | ||
| "Authorization": "Bearer {cloudflare_token}", | ||
| "Content-Type": "application/json" | ||
| }, | ||
| "config": { | ||
| "requestTimeout": 1000 | ||
| }, | ||
| "query": { | ||
| "messages": [ | ||
| { | ||
| "role": "system", | ||
| "content": "You are a friendly assistant" | ||
| }, | ||
| { | ||
| "role": "user", | ||
| "content": "What is Cloudflare?" | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| { | ||
| "provider": "workers-ai", | ||
| "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast", | ||
| "headers": { | ||
| "Authorization": "Bearer {cloudflare_token}", | ||
| "Content-Type": "application/json" | ||
| }, | ||
| "query": { | ||
| "messages": [ | ||
| { | ||
| "role": "system", | ||
| "content": "You are a friendly assistant" | ||
| }, | ||
| { | ||
| "role": "user", | ||
| "content": "What is Cloudflare?" | ||
| } | ||
| ] | ||
| }, | ||
| "config": { | ||
| "requestTimeout": 3000 | ||
| } | ||
| } | ||
| ]' | ||
| ``` | ||
|
|
||
| #### Direct provider | ||
|
|
||
| If set on a [provider](/ai-gateway/providers/) request, a request timeout specifies the timeout duration for that request and, if exceeded, returns an error. | ||
|
|
||
| For a provider-specific endpoint, configure the timeout value by adding a `cf-aig-request-timeout` header. | ||
|
|
||
| ```bash title="Provider-specific endpoint example" {4} | ||
| curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \ | ||
| --header 'Authorization: Bearer {cf_api_token}' \ | ||
| --header 'Content-Type: application/json' \ | ||
| --header 'cf-aig-request-timeout: 5000' \ | ||
| --data '{"prompt": "What is Cloudflare?"}' | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Request retries | ||
|
|
||
| AI Gateway also supports automatic retries for failed requests, with a maximum of five retry attempts. | ||
|
|
||
| This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention. | ||
|
|
||
| Request retries can be set on a Universal Endpoint or directly on a request to any provider. | ||
|
|
||
| ### Definitions | ||
|
|
||
| With request retries, you can adjust a combination of three properties: | ||
|
|
||
| - Number of attempts (maximum of 5 tries) | ||
| - How long before retrying (in milliseconds, maximum of 5 seconds) | ||
| - Backoff method (constant, linear, or exponential) | ||
|
|
||
| On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes. | ||
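To see how the delay and backoff method interact, the sketch below prints the wait before each retry. The formulas are assumptions based on the usual meaning of each term (constant: the same delay every time; linear: delay multiplied by the retry number; exponential: delay doubled on each retry). They are not taken from AI Gateway's implementation, and the function names are hypothetical.

```bash
# Sketch only: these formulas are assumed, common definitions of each backoff
# method. AI Gateway's actual schedule may differ.

# retry_delay_ms <backoff> <base_delay_ms> <retry_number>
# Prints the assumed wait in milliseconds before the given retry (1-based).
retry_delay_ms() {
  local backoff="$1" base="$2" n="$3"
  case "$backoff" in
    constant)    echo "$base" ;;
    linear)      echo $(( base * n )) ;;
    exponential) echo $(( base * (1 << (n - 1)) )) ;;
    *) echo "unknown backoff: $backoff" >&2; return 1 ;;
  esac
}

# retry_schedule <backoff> <base_delay_ms> <retries>
# Prints the assumed wait before each retry on one line, space-separated.
retry_schedule() {
  local backoff="$1" base="$2" retries="$3"
  local n=1 out="" d
  while [ "$n" -le "$retries" ]; do
    d="$(retry_delay_ms "$backoff" "$base" "$n")" || return 1
    out="$out$d "
    n=$(( n + 1 ))
  done
  # Strip the trailing space before printing.
  echo "${out% }"
}
```

For example, `retry_schedule exponential 1000 3` prints the three assumed waits for a 1000 ms base delay. Under these assumptions, linear and exponential schedules grow well past the base delay, which matters when a user is waiting on the response.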
|
|
||
| ### Configuration | ||
|
|
||
| #### Universal Endpoint | ||
|
|
||
| If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request retry will automatically retry failed requests up to five times before triggering any configured fallbacks. | ||
|
|
||
| For a Universal Endpoint, configure the retry settings with the following properties in the provider-specific `config`: | ||
|
|
||
| ```json | ||
| config:{ | ||
| maxAttempts?: number; | ||
| retryDelay?: number; | ||
| backoff?: "constant" | "linear" | "exponential"; | ||
| } | ||
| ``` | ||
|
|
||
| As with the [request timeout](/ai-gateway/configuration/request-handling/#universal-endpoint), each provider can have different retry settings for granular customization. | ||
|
|
||
| ```bash title="Provider-level config" {11-15} collapse={16-55} | ||
| curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \ | ||
| --header 'Content-Type: application/json' \ | ||
| --data '[ | ||
| { | ||
| "provider": "workers-ai", | ||
| "endpoint": "@cf/meta/llama-3.1-8b-instruct", | ||
| "headers": { | ||
| "Authorization": "Bearer {cloudflare_token}", | ||
| "Content-Type": "application/json" | ||
| }, | ||
| "config": { | ||
| "maxAttempts": 2, | ||
| "retryDelay": 1000, | ||
| "backoff": "constant" | ||
| }, | ||
| "query": { | ||
| "messages": [ | ||
| { | ||
| "role": "system", | ||
| "content": "You are a friendly assistant" | ||
| }, | ||
| { | ||
| "role": "user", | ||
| "content": "What is Cloudflare?" | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| { | ||
| "provider": "workers-ai", | ||
| "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast", | ||
| "headers": { | ||
| "Authorization": "Bearer {cloudflare_token}", | ||
| "Content-Type": "application/json" | ||
| }, | ||
| "query": { | ||
| "messages": [ | ||
| { | ||
| "role": "system", | ||
| "content": "You are a friendly assistant" | ||
| }, | ||
| { | ||
| "role": "user", | ||
| "content": "What is Cloudflare?" | ||
| } | ||
| ] | ||
| }, | ||
| "config": { | ||
| "maxAttempts": 4, | ||
| "retryDelay": 1000, | ||
| "backoff": "exponential" | ||
| } | ||
| } | ||
| ]' | ||
| ``` | ||
|
|
||
| #### Direct provider | ||
|
||
|
|
||
| If set on a [provider](/ai-gateway/providers/) request, a request retry will automatically retry failed requests up to five times. On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes. | ||
|
|
||
| For a provider-specific endpoint, configure the retry settings by adding different header values: | ||
|
|
||
| - `cf-aig-max-attempts` (number) | ||
| - `cf-aig-retry-delay` (number) | ||
| - `cf-aig-backoff` ("constant" | "linear" | "exponential") | ||
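The three headers can be combined on a single request. The following is a minimal sketch that reuses the Workers AI endpoint from the timeout example. The wrapper function `send_with_retry_headers` and its default values are hypothetical, and the checks only enforce the maximums stated in the definitions above (five attempts, five seconds).

```bash
# Sketch only: send_with_retry_headers and its defaults are hypothetical.
# Replace {account_id}, {gateway_id}, and {cf_api_token} with your own values.

# send_with_retry_headers [max_attempts] [retry_delay_ms] [backoff]
send_with_retry_headers() {
  local max_attempts="${1:-3}" retry_delay_ms="${2:-1000}" backoff="${3:-exponential}"

  # Reject values above the documented maximums before sending anything.
  if [ "$max_attempts" -gt 5 ]; then
    echo "max attempts cannot exceed 5" >&2
    return 1
  fi
  if [ "$retry_delay_ms" -gt 5000 ]; then
    echo "retry delay cannot exceed 5000 ms" >&2
    return 1
  fi
  case "$backoff" in
    constant|linear|exponential) ;;
    *) echo "backoff must be constant, linear, or exponential" >&2; return 1 ;;
  esac

  curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct' \
    --header 'Authorization: Bearer {cf_api_token}' \
    --header 'Content-Type: application/json' \
    --header "cf-aig-max-attempts: ${max_attempts}" \
    --header "cf-aig-retry-delay: ${retry_delay_ms}" \
    --header "cf-aig-backoff: ${backoff}" \
    --data '{"prompt": "What is Cloudflare?"}'
}
```

Calling `send_with_retry_headers 2 500 linear` sends the request with two attempts, a 500 ms delay, and linear backoff. Validating locally is a design choice for this sketch; how the gateway itself responds to out-of-range values is not covered here.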