
Embeddings batch size vs max tokens not clear. #11316

@mick-net

Description


Which Cloudflare product does this pertain to?

Workers AI

Existing documentation URL(s)

What changes are you suggesting?

The embedding models work; however, I encounter some issues regarding input limits.
The @cf/baai/bge embedding models have a max token input limit of 512.
In the embedding example, Cloudflare is using a batch of 3 inputs:

import requests
API_BASE_URL = "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/"
headers = {"Authorization": "Bearer {API_TOKEN}"}

def run(model, input):
    response = requests.post(f"{API_BASE_URL}{model}", headers=headers, json=input)
    return response.json()

stories = [
  'This is a story about an orange cloud',
  'This is a story about a llama',
  'This is a story about a hugging emoji'
]
    
output = run("@cf/baai/bge-base-en-v1.5", { "text": stories })
print(output)

When I use the example sentences, I'm able to batch up to 100 sentences in one call; going over a batch size of 100 produces an error. This is not clear from the docs. Moreover, if I increase the token count per sentence, I also get errors below a batch size of 100. So could you clarify how the max of 512 tokens relates to the batch size as well?
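As a workaround, inputs can be split client-side into batches that stay under the observed limit. This is a minimal sketch, assuming the 100-inputs-per-call limit described above; `chunked` is a hypothetical helper, not part of the Workers AI API, and each resulting batch would be passed to `run()` as in the example.

```python
# Split a large input list into batches of at most MAX_BATCH items.
# MAX_BATCH = 100 is an assumption based on the observed limit above,
# not a documented value.
MAX_BATCH = 100

def chunked(items, size=MAX_BATCH):
    """Yield successive slices of `items`, each with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: 250 sentences split into batches of 100, 100, and 50.
sentences = [f"This is story number {i}" for i in range(250)]
batches = list(chunked(sentences))

# Each batch would then be embedded separately, e.g.:
#   for batch in batches:
#       output = run("@cf/baai/bge-base-en-v1.5", {"text": batch})
```

Note this only addresses the batch-size limit; it does not guard against individual sentences exceeding the 512-token limit, which is the part that remains unclear.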

Most other embedding providers also have a token limit of 512 (as this is model-specific); however, they are clearer about the batch size.

Additional information

No response

Metadata

Labels

content:edit (Request for content edits), documentation (Documentation edits), product:workers-ai (Workers AI: https://developers.cloudflare.com/workers-ai/)
