Commit 7689359

Add embedding model token and batch size limits
1 parent f1f460f commit 7689359

File tree

1 file changed

+34
-20
lines changed


src/content/docs/workers-ai/platform/limits.mdx

Lines changed: 34 additions & 20 deletions
@@ -3,10 +3,9 @@ pcx_content_type: configuration
 title: Limits
 sidebar:
   order: 2
-
 ---
 
-import { Render } from "~/components"
+import { Render } from "~/components";
 
 Workers AI is now Generally Available. We've updated our rate limits to reflect this.
 

@@ -20,48 +19,63 @@ Rate limits are default per task type, with some per-model limits defined as fol
 
 ### [Automatic Speech Recognition](/workers-ai/models/#automatic-speech-recognition)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Image Classification](/workers-ai/models/#image-classification)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Image-to-Text](/workers-ai/models/#image-to-text)
 
-* 720 requests per minute
+- 720 requests per minute
 
 ### [Object Detection](/workers-ai/models/#object-detection)
 
-* 3000 requests per minute
+- 3000 requests per minute
 
 ### [Summarization](/workers-ai/models/#summarization)
 
-* 1500 requests per minute
+- 1500 requests per minute
 
 ### [Text Classification](/workers-ai/models/#text-classification)
 
-* 2000 requests per minute
+- 2000 requests per minute
 
 ### [Text Embeddings](/workers-ai/models/#text-embeddings)
 
-* 3000 requests per minute
-* [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+- 3000 requests per minute
+- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
+
+#### Additional limits for Embedding Models
+
+When using `@cf/baai/bge` embedding models, the following limits apply:
+
+- The maximum token limit per input is 512 tokens.
+- The maximum batch size is 100 inputs per request.
+- The total number of tokens across all inputs in the batch must not exceed internal processing limits.
+- Larger inputs (closer to 512 tokens) may reduce the maximum batch size due to these constraints.
+
+#### Behavior and constraints
+
+1. Exceeding the batch size limit: If more than 100 inputs are provided, a `400 Bad Request` error is returned.
+2. Exceeding the token limit per input: If a single input exceeds 512 tokens, the request fails with a `400 Bad Request` error.
+3. Combined constraints: Requests with both a high batch size and large token inputs may fail due to exceeding the model's processing limits.
 
 ### [Text Generation](/workers-ai/models/#text-generation)
 
-* 300 requests per minute
-* [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
-* [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
-* [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
-* [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
-* [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
+- 300 requests per minute
+- [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
+- [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
+- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
+- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
+- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
 
 ### [Text-to-Image](/workers-ai/models/#text-to-image)
 
-* 720 requests per minute
-* [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
+- 720 requests per minute
+- [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
 
 ### [Translation](/workers-ai/models/#translation)
 
-* 720 requests per minute
+- 720 requests per minute
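
The embedding limits this commit documents (at most 100 inputs per request) can be respected client-side by splitting a large input set into compliant batches before calling the model. A minimal sketch: `toBatches` is a hypothetical helper, not part of the Workers AI API, and per-input token counting (the 512-token cap) is model-specific and omitted here.

```typescript
// Documented per-request input cap for @cf/baai/bge embedding models.
const MAX_BATCH_SIZE = 100;

// Split an array of inputs into chunks of at most `size` elements,
// so each chunk can be sent as one embedding request.
function toBatches<T>(inputs: T[], size: number = MAX_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < inputs.length; i += size) {
    batches.push(inputs.slice(i, i + size));
  }
  return batches;
}

// 250 inputs become 3 requests: 100 + 100 + 50.
const texts = Array.from({ length: 250 }, (_, i) => `doc ${i}`);
const batches = toBatches(texts);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]
```

Each resulting batch would then be passed to the embedding model in its own request, keeping every call under the batch-size limit; inputs near 512 tokens may additionally require smaller batches per the combined-constraint note above.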
