Commit 7689359

Add embedding model token and batch size limits
1 parent f1f460f commit 7689359

File tree

1 file changed

+34
-20
lines changed


src/content/docs/workers-ai/platform/limits.mdx

Lines changed: 34 additions & 20 deletions
@@ -3,10 +3,9 @@ pcx_content_type: configuration
 title: Limits
 sidebar:
   order: 2
-
 ---
 
-import { Render } from "~/components"
+import { Render } from "~/components";
 
 Workers AI is now Generally Available. We've updated our rate limits to reflect this.
 

@@ -20,48 +19,63 @@ Rate limits are default per task type, with some per-model limits defined as fol
 
 ### [Automatic Speech Recognition](/workers-ai/models/#automatic-speech-recognition)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Image Classification](/workers-ai/models/#image-classification)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Image-to-Text](/workers-ai/models/#image-to-text)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Object Detection](/workers-ai/models/#object-detection)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Summarization](/workers-ai/models/#summarization)
 
-* 1500 requests per minute
+- 1500 requests per minute
 
 ### [Text Classification](/workers-ai/models/#text-classification)
 
-* 2000 requests per minute
+- 2000 requests per minute
 
 ### [Text Embeddings](/workers-ai/models/#text-embeddings)
 
-* 3000 requests per minute
-* [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+- 3000 requests per minute
+- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+
+#### Additional limits for Embedding Models
+
+When using `@cf/baai/bge` embedding models, the following limits apply:
+
+- The maximum token limit per input is 512 tokens.
+- The maximum batch size is 100 inputs per request.
+- The total number of tokens across all inputs in the batch must not exceed internal processing limits.
+- Larger inputs (closer to 512 tokens) may reduce the maximum batch size due to these constraints.
+
+#### Behavior and constraints
+
+1. Exceeding the batch size limit: If more than 100 inputs are provided, a `400 Bad Request` error is returned.
+2. Exceeding the token limit per input: If a single input exceeds 512 tokens, the request fails with a `400 Bad Request` error.
+3. Combined constraints: Requests with both a high batch size and large token inputs may fail due to exceeding the model's processing limits.
 
 ### [Text Generation](/workers-ai/models/#text-generation)
 
-* 300 requests per minute
-* [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
-* [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
-* [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
-* [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
+- 300 requests per minute
+- [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
+- [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
+- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
+- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
 
 ### [Text-to-Image](/workers-ai/models/#text-to-image)
 
-* 720 requests per minute
-* [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
+- 720 requests per minute
+- [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
 
 ### [Translation](/workers-ai/models/#translation)
 
-* 720 requests per minute
+- 720 requests per minute
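
The embedding limits this commit documents (at most 100 inputs per request) can be respected client-side by splitting a large input set into compliant batches before calling the model. A minimal sketch: `toBatches` is a hypothetical helper, not part of the Workers AI API, and per-input token counting (the 512-token cap) is model-specific and omitted here.

```typescript
// Documented per-request input cap for @cf/baai/bge embedding models.
const MAX_BATCH_SIZE = 100;

// Split an array of inputs into chunks of at most `size` elements,
// so each chunk can be sent as one embedding request.
function toBatches<T>(inputs: T[], size: number = MAX_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < inputs.length; i += size) {
    batches.push(inputs.slice(i, i + size));
  }
  return batches;
}

// 250 inputs become 3 requests: 100 + 100 + 50.
const texts = Array.from({ length: 250 }, (_, i) => `doc ${i}`);
const batches = toBatches(texts);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]
```

Each resulting batch would then be passed to the embedding model in its own request, keeping every call under the batch-size limit; inputs near 512 tokens may additionally require smaller batches per the combined-constraint note above.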
