---
pcx_content_type: configuration
title: Limits
sidebar:
  order: 2
---

import { Render } from "~/components";

Workers AI is now Generally Available. We've updated our rate limits to reflect this.

Rate limits are default per task type, with some per-model limits defined as follows:

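Since all of the limits below are expressed as requests per minute, a client can avoid hitting them by pacing itself. The sketch below is a minimal sliding-window limiter, purely illustrative: nothing in it is part of the Workers AI API, and server-side enforcement still applies regardless of client pacing.

```typescript
// Minimal sliding-window pacer for staying under a requests-per-minute
// budget (e.g. 720/min for Automatic Speech Recognition, per the table
// below). Illustrative sketch only -- not part of the Workers AI API.

class MinuteRateLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number) {}

  // Returns true if a request may be sent now, and records it;
  // returns false if the last 60 seconds already hold `limit` requests.
  tryAcquire(now: number = Date.now()): boolean {
    const windowStart = now - 60_000;
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

For example, `new MinuteRateLimiter(720)` would gate calls to a task type with a 720 requests-per-minute limit.
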
### [Automatic Speech Recognition](/workers-ai/models/#automatic-speech-recognition)

- 720 requests per minute

### [Image Classification](/workers-ai/models/#image-classification)

- 3000 requests per minute

### [Image-to-Text](/workers-ai/models/#image-to-text)

- 720 requests per minute

### [Object Detection](/workers-ai/models/#object-detection)

- 3000 requests per minute

### [Summarization](/workers-ai/models/#summarization)

- 1500 requests per minute

### [Text Classification](/workers-ai/models/#text-classification)

- 2000 requests per minute

### [Text Embeddings](/workers-ai/models/#text-embeddings)

- 3000 requests per minute
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute

#### Additional limits for Embedding Models

When using `@cf/baai/bge` embedding models, the following limits apply:

- The maximum token limit per input is 512 tokens.
- The maximum batch size is 100 inputs per request.
- The total number of tokens across all inputs in the batch must not exceed internal processing limits.
- Larger inputs (closer to 512 tokens) may reduce the maximum batch size due to these constraints.

#### Behavior and constraints

1. Exceeding the batch size limit: If more than 100 inputs are provided, a `400 Bad Request` error is returned.
2. Exceeding the token limit per input: If a single input exceeds 512 tokens, the request fails with a `400 Bad Request` error.
3. Combined constraints: Requests with both a high batch size and large token inputs may fail due to exceeding the model's processing limits.
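
The constraints above can be checked client-side before a request is sent. The sketch below mirrors the documented limits for `@cf/baai/bge` models (100 inputs per batch, 512 tokens per input); the word-count token estimate is an assumption for illustration, since the real token count comes from the model's own tokenizer and will differ.

```typescript
// Client-side pre-validation mirroring the documented @cf/baai/bge limits:
// at most 100 inputs per request, at most 512 tokens per input.
// approxTokenCount is a crude whitespace-based stand-in for the model
// tokenizer (an assumption -- real token counts will differ).

const MAX_BATCH_SIZE = 100;
const MAX_TOKENS_PER_INPUT = 512;

interface ValidationResult {
  ok: boolean;
  error?: string; // mirrors the 400 Bad Request cases described above
}

function approxTokenCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function validateEmbeddingBatch(inputs: string[]): ValidationResult {
  if (inputs.length > MAX_BATCH_SIZE) {
    return {
      ok: false,
      error: `batch of ${inputs.length} exceeds ${MAX_BATCH_SIZE} inputs per request`,
    };
  }
  for (const [i, input] of inputs.entries()) {
    const tokens = approxTokenCount(input);
    if (tokens > MAX_TOKENS_PER_INPUT) {
      return {
        ok: false,
        error: `input ${i} has ~${tokens} tokens, over the ${MAX_TOKENS_PER_INPUT}-token limit`,
      };
    }
  }
  return { ok: true };
}
```

Note that passing this check does not guarantee acceptance: the combined-constraints case above can still reject a batch that is individually within both limits.
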

### [Text Generation](/workers-ai/models/#text-generation)

- 300 requests per minute
- [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
- [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute

### [Text-to-Image](/workers-ai/models/#text-to-image)

- 720 requests per minute
- [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute

### [Translation](/workers-ai/models/#translation)

- 720 requests per minute