7 changes: 6 additions & 1 deletion src/content/changelogs/ai-gateway.yaml
@@ -5,11 +5,16 @@ productLink: "/ai-gateway/"
productArea: Developer platform
productAreaLink: /workers/platform/changelog/platform/
entries:
  - publish_date: "2025-01-23"
    title: Added request handling
    description: |-
      * Added [request handling options](/ai-gateway/configuration/request-handling/), such as request timeouts and request retries, to help keep your applications responsive and reliable when AI providers are slow or fail.

  - publish_date: "2025-01-02"
    title: DeepSeek
    description: |-
      * **Configuration**: Added [DeepSeek](/ai-gateway/providers/deepseek/) as a new provider.

  - publish_date: "2024-12-17"
    title: AI Gateway Dashboard
    description: |-
12 changes: 9 additions & 3 deletions src/content/docs/ai-gateway/configuration/fallbacks.mdx
@@ -9,11 +9,17 @@ import { Render } from "~/components";

Specify model or provider fallbacks with your [Universal endpoint](/ai-gateway/providers/universal/) to handle request failures and ensure reliability.

Fallbacks are currently triggered only when a request encounters an error. We are working to expand fallback functionality to include time-based triggers, which will allow requests that exceed a predefined response time to timeout and fallback.
Cloudflare can trigger your fallback provider in response to [request errors](#request-failures) or [predetermined request timeouts](/ai-gateway/configuration/request-handling/#request-timeouts). The [response header `cf-aig-step`](#response-headercf-aig-step) indicates which step successfully processed the request.

## Example
## Request failures

In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI. The response header `cf-aig-step` indicates which provider successfully processed the request.
By default, Cloudflare triggers your fallback if a model request returns an error.

### Example

In the following example, a request first goes to the [Workers AI](/workers-ai/) Inference API. If the request fails, it falls back to OpenAI.


1. Sends a request to Workers AI Inference API.
2. If that request fails, proceeds to OpenAI.
153 changes: 153 additions & 0 deletions src/content/docs/ai-gateway/configuration/request-handling.mdx
@@ -0,0 +1,153 @@
---
pcx_content_type: configuration
title: Request handling
sidebar:
  order: 4
---

import { Render, Aside } from "~/components";

Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.

## Request timeouts

A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond.

These timeouts help:

- Improve user experience by preventing users from waiting too long for a response
- Proactively handle errors by detecting unresponsive providers and triggering a fallback option

Request timeouts can be set on a Universal Endpoint or directly on a request to any provider.

### Definitions

A timeout is set in milliseconds and is measured until the first part of the response arrives. As long as the first part of the response returns within the specified timeframe (for example, the first chunk of a streamed response), your gateway waits for the rest of the response, even if it finishes after the timeout.
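To make the first-chunk rule concrete, here is a minimal TypeScript sketch that models it offline. It assumes you already know when each chunk of a response arrived (in milliseconds after the request was sent); `timeoutOutcome` is a hypothetical name used only for illustration and is not part of any AI Gateway API.

```typescript
// Hypothetical model of the rule above: only the arrival time of the FIRST
// chunk is compared against the timeout. Later chunks may arrive after the
// timeout without the request being treated as timed out.
type Outcome = "completed" | "timed-out";

function timeoutOutcome(chunkArrivalsMs: number[], timeoutMs: number): Outcome {
  if (chunkArrivalsMs.length === 0) {
    // No part of the response ever arrived, so the timeout was exceeded.
    return "timed-out";
  }
  const firstChunkMs = Math.min(...chunkArrivalsMs);
  // Assumption: a first chunk arriving exactly at the limit counts as on time.
  return firstChunkMs <= timeoutMs ? "completed" : "timed-out";
}

// Streamed response: first chunk at 1.5 s, last chunk at 9 s, 2 s timeout.
console.log(timeoutOutcome([1500, 4000, 9000], 2000)); // prints: completed
// First chunk arrives at 2.5 s, after the 2 s timeout.
console.log(timeoutOutcome([2500], 2000)); // prints: timed-out
```

The point of the first example is that the total response time (9 s) far exceeds the timeout, yet the request is not cut off, because the first chunk arrived in time.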

### Configuration

#### Universal Endpoint

If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request timeout specifies the timeout duration for requests and, if exceeded, triggers a fallback.

For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property either as a universal attribute or within the provider-specific `config` object.

```bash title="Provider-level config" {11-13} collapse={14-47}
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "config": {
        "requestTimeout": 1000
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      },
      "config": {
        "requestTimeout": 3000
      }
    }
  ]'
```

To further customize request handling, you can set a different `requestTimeout` value for each provider as well as a default for the entire Universal Endpoint.

When both are set, the most specific value takes precedence: a timeout set on a provider overrides the one set on the endpoint itself.
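The precedence rule can be sketched as a small TypeScript function. It assumes the endpoint-wide default and each provider's `config.requestTimeout` are available as plain numbers; `ProviderStep` and `effectiveTimeouts` are hypothetical names for illustration, not part of AI Gateway.

```typescript
// Hypothetical shape of one Universal Endpoint step, reduced to the one
// field that matters here.
interface ProviderStep {
  provider: string;
  config?: { requestTimeout?: number };
}

// Returns the timeout (in milliseconds) that applies to each step:
// a provider-level value wins over the endpoint-wide default.
// `undefined` means no timeout is configured for that step.
function effectiveTimeouts(
  endpointDefaultMs: number | undefined,
  steps: ProviderStep[],
): (number | undefined)[] {
  return steps.map((step) => step.config?.requestTimeout ?? endpointDefaultMs);
}

const steps: ProviderStep[] = [
  { provider: "workers-ai", config: { requestTimeout: 1000 } },
  { provider: "workers-ai" }, // no provider-level value, so the default applies
];

console.log(effectiveTimeouts(3000, steps)); // prints: [ 1000, 3000 ]
```

Using `??` rather than `||` matters here: it only falls back to the default when the provider value is missing, so an explicit provider value is never silently replaced.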

#### Direct provider

If set on a [provider](/ai-gateway/providers/) request, a request timeout specifies the timeout duration for a request and, if exceeded, returns an error.

For a provider-specific endpoint, configure the timeout value by adding a `cf-aig-request-timeout` header.

```bash title="Provider-specific endpoint example" {4}
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-request-timeout: 5000' \
  --data '{"prompt": "What is Cloudflare?"}'
```
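If you call the gateway from TypeScript instead of curl (for example, from a Worker or Node.js 18+), the same header goes on the request. This is a sketch: the account ID, gateway ID, and token are placeholders you supply, and `buildTimedRequest` is a hypothetical helper that only builds the `Request`; pass it to `fetch` to send it.

```typescript
// Hypothetical helper mirroring the curl example above: a Workers AI request
// through AI Gateway with a request timeout set by header.
function buildTimedRequest(
  accountId: string,
  gatewayId: string,
  apiToken: string,
  timeoutMs: number,
): Request {
  const url =
    `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}` +
    `/workers-ai/@cf/meta/llama-3.1-8b-instruct`;
  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
      // Header values are strings, so the number is converted explicitly.
      "cf-aig-request-timeout": String(timeoutMs),
    },
    body: JSON.stringify({ prompt: "What is Cloudflare?" }),
  });
}
```

For example, `const response = await fetch(buildTimedRequest(accountId, gatewayId, apiToken, 5000));` sends the same request as the curl command above.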

---

## Request retries

AI Gateway also supports automatic retries for failed requests, with a maximum of five retry attempts.

This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention.

Request retries can be set on a Universal Endpoint or directly on a request to any provider.

### Definitions

With request retries, you can adjust a combination of three properties:

- Number of attempts (max of 5 tries)
- How long before retrying (in milliseconds, max of 5 seconds)
- Backoff method (constant, linear, or exponential)

On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
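The three backoff methods are commonly understood as follows: a constant delay stays the same, a linear delay grows by the base delay on each retry, and an exponential delay doubles on each retry. The TypeScript sketch below computes delays under that assumption only; this page does not specify AI Gateway's exact formula, and `retryDelayMs` is a hypothetical name.

```typescript
type Backoff = "constant" | "linear" | "exponential";

// Delay (ms) to wait before the given retry, where retryNumber starts at 1.
// Assumed formulas (not confirmed for AI Gateway):
//   constant = base, linear = base * n, exponential = base * 2^(n - 1)
function retryDelayMs(retryNumber: number, baseDelayMs: number, backoff: Backoff): number {
  switch (backoff) {
    case "constant":
      return baseDelayMs;
    case "linear":
      return baseDelayMs * retryNumber;
    case "exponential":
      return baseDelayMs * 2 ** (retryNumber - 1);
    default:
      throw new Error(`Unknown backoff: ${String(backoff)}`);
  }
}

// Delays for four retries with a 500 ms base delay.
for (const backoff of ["constant", "linear", "exponential"] as const) {
  const delays = [1, 2, 3, 4].map((n) => retryDelayMs(n, 500, backoff));
  console.log(backoff, delays);
}
// constant [ 500, 500, 500, 500 ]
// linear [ 500, 1000, 1500, 2000 ]
// exponential [ 500, 1000, 2000, 4000 ]
```

Exponential backoff spreads retries out fastest, which gives a struggling provider the most time to recover, at the cost of a longer worst-case wait.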

### Configuration

#### Universal Endpoint

If set on a [Universal Endpoint](/ai-gateway/providers/universal/), a request retry will automatically retry failed requests up to five times before triggering any configured fallbacks.

For a Universal Endpoint, configure retries by setting the following properties in the overall or provider-specific `config`:

```ts
config: {
  maxAttempts?: number; // number of attempts, up to 5
  retryDelay?: number; // delay before retrying, in milliseconds, up to 5000
  backoff?: "constant" | "linear" | "exponential";
}
```

As with the [request timeout](/ai-gateway/configuration/request-handling/#universal-endpoint), you can combine a default `config` for the whole Universal Endpoint with provider-specific values; the most specific value takes precedence.
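As a sketch of how these properties fit into a Universal Endpoint request, the TypeScript below builds a two-step body (Workers AI, then a faster Workers AI model as the fallback) in which only the first step has retry settings. It assumes the `config` object sits on each provider step, as in the request timeout example on this page; `StepConfig`, `UniversalStep`, and `buildUniversalBody` are hypothetical names.

```typescript
type Backoff = "constant" | "linear" | "exponential";

// Provider-level config as described on this page; every property is optional.
interface StepConfig {
  requestTimeout?: number; // milliseconds
  maxAttempts?: number; // up to 5
  retryDelay?: number; // milliseconds, up to 5000
  backoff?: Backoff;
}

interface UniversalStep {
  provider: string;
  endpoint: string;
  headers: Record<string, string>;
  query: unknown;
  config?: StepConfig;
}

// Hypothetical helper: returns the JSON body for the Universal Endpoint.
function buildUniversalBody(token: string): string {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  const query = { messages: [{ role: "user", content: "What is Cloudflare?" }] };
  const steps: UniversalStep[] = [
    {
      provider: "workers-ai",
      endpoint: "@cf/meta/llama-3.1-8b-instruct",
      headers,
      query,
      // Retry settings for this step only; the fallback step below has none.
      config: { maxAttempts: 3, retryDelay: 1000, backoff: "exponential" },
    },
    {
      provider: "workers-ai",
      endpoint: "@cf/meta/llama-3.1-8b-instruct-fast",
      headers,
      query,
    },
  ];
  return JSON.stringify(steps);
}
```

The resulting string can be sent with `fetch` as the body of a `POST` to `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}` with a `Content-Type: application/json` header.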

#### Direct provider

If set on a [provider](/ai-gateway/providers/) request, a request retry automatically retries failed requests, up to five times.

For a provider-specific endpoint, configure retries by adding the following headers:

- `cf-aig-max-attempts` (number)
- `cf-aig-retry-delay` (number, in milliseconds)
- `cf-aig-backoff` ("constant" | "linear" | "exponential")
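For a provider-specific endpoint, a small helper can turn retry settings into these headers and check them against the limits stated above (at most 5 attempts and at most 5 seconds before retrying). This is a hypothetical helper, not part of an AI Gateway SDK; the lower bounds it enforces are assumptions. The returned object can be spread into the `headers` you pass to `fetch`, alongside your usual `Authorization` and `Content-Type` headers.

```typescript
type Backoff = "constant" | "linear" | "exponential";

interface RetryOptions {
  maxAttempts: number; // 1 to 5 (lower bound is an assumption)
  retryDelay: number; // milliseconds, 0 to 5000 (lower bound is an assumption)
  backoff: Backoff;
}

// Hypothetical helper: validates the documented limits and returns the
// retry headers for a direct provider request.
function retryHeaders(options: RetryOptions): Record<string, string> {
  const { maxAttempts, retryDelay, backoff } = options;
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1 || maxAttempts > 5) {
    throw new RangeError(`maxAttempts must be an integer from 1 to 5, got ${maxAttempts}`);
  }
  if (retryDelay < 0 || retryDelay > 5000) {
    throw new RangeError(`retryDelay must be between 0 and 5000 ms, got ${retryDelay}`);
  }
  return {
    "cf-aig-max-attempts": String(maxAttempts),
    "cf-aig-retry-delay": String(retryDelay),
    "cf-aig-backoff": backoff,
  };
}

console.log(retryHeaders({ maxAttempts: 3, retryDelay: 1000, backoff: "linear" }));
```

Validating on the client surfaces a bad setting immediately instead of leaving you to discover how the gateway treats an out-of-range header value.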
4 changes: 4 additions & 0 deletions src/content/glossary/ai-gateway.yaml
@@ -41,6 +41,10 @@ entries:
    general_definition: |-
      Header to [bypass caching for a specific request](/ai-gateway/configuration/caching/#skip-cache-cf-aig-skip-cache).

  - term: cf-aig-request-timeout
    general_definition: |-
      Header to trigger a fallback provider based on a [predetermined response time](/ai-gateway/configuration/request-handling/#request-timeouts) (measured in milliseconds).

  # Deprecated headers
  - term: cf-cache-ttl
    general_definition: |-
1 change: 1 addition & 0 deletions src/env.d.ts
@@ -0,0 +1 @@
/// <reference path="../.astro/types.d.ts" />