description: New text-to-speech, reranker, whisper, embeddings models now available!
date: 2025-03-17T17:00:00Z
---

Workers AI is excited to add 4 new models to the catalog, including 2 brand new classes of models with a text-to-speech and a reranker model. Introducing:

- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/) - a multi-lingual embeddings model that supports over 100 languages. It can also simultaneously perform dense retrieval, multi-vector retrieval, and sparse retrieval, with the ability to process inputs of different granularities.
- [@cf/baai/bge-reranker-base](/workers-ai/models/bge-reranker-base/) - our first reranker model! Rerankers are a type of text classification model that takes a query and context, and outputs a similarity score between the two. When used in RAG systems, you can run a reranker after the initial vector search to rerank its outputs and return the most relevant documents to the user.
- [@cf/openai/whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) - a faster, more accurate speech-to-text model. This model was added earlier, but is graduating out of beta with pricing included today.
- [@cf/myshell-ai/melotts](/workers-ai/models/melotts/) - our first text-to-speech model, which generates an MP3 of voice audio from inputted text.
Pricing is available for each of these models on the [Workers AI pricing page](/workers-ai/platform/pricing/).

This docs update also includes a few minor bug fixes to the model schemas for llama-guard and llama-3.2-1b, which you can review on the [product changelog](/workers-ai/changelog/).

Try it out and let us know what you think! Stay tuned for more models in the coming days.
src/content/docs/workers-ai/platform/pricing.mdx
Neurons are our way of measuring AI outputs across different models, representing the GPU compute needed to perform your request. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.

:::note
The Price in Tokens column is equivalent to the Price in Neurons column - the different units are displayed so you can easily compare and understand pricing.
:::

## LLM model pricing
| Model | Price in Tokens | Price in Neurons |
| ----- | --------------- | ---------------- |
|@cf/meta/llama-2-7b-chat-fp16 | $0.556 per M input tokens <br/> $6.667 per M output tokens | 50505 neurons per M input tokens <br/> 606061 neurons per M output tokens |
|@cf/meta/llama-guard-3-8b | $0.484 per M input tokens <br/> $0.030 per M output tokens | 44003 neurons per M input tokens <br/> 2730 neurons per M output tokens |
|@cf/black-forest-labs/flux-1-schnell | $0.0000528 per 512x512 tile <br/> $0.0001056 per step | 4.80 neurons per 512x512 tile <br/> 9.60 neurons per step |
|@cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens | 2394 neurons per M input tokens |
|@cf/baai/bge-small-en-v1.5 | $0.020 per M input tokens | 1841 neurons per M input tokens |
|@cf/baai/bge-base-en-v1.5 | $0.067 per M input tokens | 6058 neurons per M input tokens |
|@cf/baai/bge-large-en-v1.5 | $0.204 per M input tokens | 18582 neurons per M input tokens |
|@cf/baai/bge-reranker-base | $0.003 per M input tokens | 283 neurons per M input tokens |
|@cf/meta/m2m100-1.2b | $0.342 per M input tokens <br/> $0.342 per M output tokens | 31050 neurons per M input tokens <br/> 31050 neurons per M output tokens |
|@cf/microsoft/resnet-50 | $2.51 per M images | 228055 neurons per M images |
|@cf/openai/whisper | $0.0005 per audio minute | 41.14 neurons per audio minute |
|@cf/openai/whisper-large-v3-turbo | $0.0005 per audio minute | 46.63 neurons per audio minute |
|@cf/myshell-ai/melotts | $3.416 per M input tokens | 310577 neurons per M input tokens |
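Since both columns are quoted per million tokens, estimating a request's cost is a single scaling step. A small sketch of that arithmetic (the function name is illustrative, and the rates are taken from the table above):

```typescript
// Cost for a given number of units (tokens, images, minutes) at a
// per-million rate, as quoted in the pricing table. Works for both
// the USD column and the neurons column, since both use the same unit.
function costPerMillion(units: number, ratePerMillion: number): number {
  return (units / 1_000_000) * ratePerMillion;
}

// Example: 500,000 input tokens through @cf/baai/bge-reranker-base
// at $0.003 per M input tokens, or 283 neurons per M input tokens.
const usd = costPerMillion(500_000, 0.003);   // ~$0.0015
const neurons = costPerMillion(500_000, 283); // ~141.5 neurons
```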
  "description": "Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.",
  "task": {
    "id": "0137cdcf-162a-4108-94f2-1ca59e8c65ee",
    "name": "Text Embeddings",
    "description": "Feature extraction models transform raw data into numerical features that can be processed while preserving the information in the original dataset. These models are ideal as part of building vector search applications or Retrieval Augmented Generation workflows with Large Language Models (LLM)."
  },
  "tags": [],
  "properties": [],
  "schema": {
    "input": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "minLength": 1,
          "description": "A query you wish to perform against the provided contexts. If no query is provided, the model will respond with embeddings for the contexts."
        },
        "contexts": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "text": {
                "type": "string",
                "minLength": 1,
                "description": "The content of one of the provided contexts."
              }
            }
          },
          "description": "List of provided contexts. Note that the index in this array is important, as the response will refer to it."
        }
      },
      "required": [
        "contexts"
      ]
    },
    "output": {
      "type": "object",
      "contentType": "application/json",
      "oneOf": [
        {
          "title": "Query",
          "properties": {
            "response": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "id": {
                    "type": "integer",
                    "description": "Index of the context in the request"
                  },
                  "score": {
                    "type": "number",
                    "description": "Score of the context under the index."
                  }
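Following the input schema above, `contexts` is required while `query` is optional (omitting it switches the model into embeddings mode). A minimal, illustrative helper for building a conforming request body — the function name is an assumption, not part of the API:

```typescript
// Build a request body matching the bge-m3 input schema:
// `contexts` is a required array of { text } objects; `query` is
// optional, and omitting it requests embeddings for the contexts.
function buildBgeM3Request(contexts: string[], query?: string) {
  const body: { contexts: { text: string }[]; query?: string } = {
    contexts: contexts.map((text) => ({ text })),
  };
  if (query !== undefined) body.query = query; // only include when present
  return body;
}
```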
  "description": "Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by inputting a query and passage to the reranker, and the score can be mapped to a float value in [0,1] by a sigmoid function.",
  "task": {
    "id": "19606750-23ed-4371-aab2-c20349b53a60",
    "name": "Text Classification",
    "description": "Sentiment analysis or text classification is a common NLP task that classifies a text input into labels or classes."
  },
  "tags": [],
  "properties": [],
  "schema": {
    "input": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "minLength": 1,
          "description": "A query you wish to perform against the provided contexts."
        },
        "top_k": {
          "type": "integer",
          "default": null,
          "minimum": 1,
          "description": "Number of returned results starting with the best score."
        },
        "contexts": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "text": {
                "type": "string",
                "minLength": 1,
                "description": "The content of one of the provided contexts."
              }
            }
          },
          "description": "List of provided contexts. Note that the index in this array is important, as the response will refer to it."
        }
      },
      "required": [
        "query",
        "contexts"
      ]
    },
    "output": {
      "type": "object",
      "contentType": "application/json",
      "properties": {
        "response": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "id": {
                "type": "integer",
                "description": "Index of the context in the request"
              },
              "score": {
                "type": "number",
                "description": "Score of the context under the index."
              }
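The model description above notes that raw reranker scores can be mapped into [0,1] with a sigmoid. A sketch of that mapping, in case you want probability-like relevance values client-side (the helper is illustrative; the model itself returns the raw score):

```typescript
// Map a raw reranker score to (0, 1) via the logistic sigmoid,
// as suggested in the model description. Larger raw scores map
// closer to 1 (more relevant), 0 maps to exactly 0.5.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}
```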
  "description": "MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.",
  "task": {
    "id": "b52660a1-9a95-4ab2-8b1d-f232be34604a",
    "name": "Text-to-Speech",
    "description": "Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages."
  },
  "tags": [],
  "properties": [],
  "schema": {
    "input": {
      "type": "object",
      "properties": {
        "prompt": {
          "type": "string",
          "minLength": 1,
          "description": "The text to convert to speech"
        },
        "lang": {
          "type": "string",
          "default": "en",
          "description": "The speech language (e.g., 'en' for English, 'fr' for French). Defaults to 'en' if not specified"
        }
      },
      "required": [
        "prompt"
      ]
    },
    "output": {
      "oneOf": [
        {
          "type": "object",
          "contentType": "application/json",
          "properties": {
            "audio": {
              "type": "string",
              "description": "The generated audio in MP3 format, base64-encoded"
            }
          }
        },
        {
          "type": "string",
          "contentType": "audio/mpeg",
          "format": "binary",
          "description": "The generated audio in MP3 format"
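Per the output schema above, the JSON variant returns the MP3 as a base64-encoded string in `audio`, so it needs decoding before you can write it to disk or stream it. A minimal Node.js-flavored sketch (the function name is illustrative, and `Buffer` assumes a Node runtime):

```typescript
// Decode the base64-encoded MP3 from the JSON output variant into
// raw bytes. The binary `audio/mpeg` variant needs no decoding.
function decodeAudio(audioB64: string): Uint8Array {
  return new Uint8Array(Buffer.from(audioB64, "base64"));
}
```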