changelog and melotts

mchenco · mchenco · commit 5381c2607745 · 2025-03-17T16:52:31.000-04:00
diff --git a/src/content/changelog/workers-ai/2025-03-17-new-workers-ai-models.mdx b/src/content/changelog/workers-ai/2025-03-17-new-workers-ai-models.mdx
@@ -0,0 +1,15 @@
+---
+title: New models in Workers AI
+description: New text-to-speech, reranker, Whisper, and embeddings models are now available!
+date: 2025-03-17T17:00:00Z
+---
+
+Workers AI is excited to add four new models to the catalog, including our first text-to-speech and reranker models, two brand-new model classes for the platform. Introducing:
+- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/) - a multilingual embeddings model that supports over 100 languages. It can perform dense, multi-vector, and sparse retrieval simultaneously, and it can process inputs of different granularities.
+- [@cf/baai/bge-reranker-base](/workers-ai/models/bge-reranker-base/) - our first reranker model! A reranker is a type of text classification model that takes a query and a set of contexts and outputs a similarity score between the query and each context. In RAG systems, you can run a reranker after the initial vector search to reorder the results and return the most relevant documents to the user.
+- [@cf/openai/whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) - a faster, more accurate speech-to-text model. This model was added earlier but is graduating out of beta with pricing included today.
+- [@cf/myshell-ai/melotts](/workers-ai/models/melotts/) - our first text-to-speech model, which generates MP3 voice audio from input text.
+
+Pricing is available for each of these models on the [Workers AI pricing page](/workers-ai/platform/pricing/).
+
+Try them out and let us know what you think! Stay tuned for more models in the coming days.
diff --git a/src/content/docs/workers-ai/platform/pricing.mdx b/src/content/docs/workers-ai/platform/pricing.mdx
@@ -26,6 +26,10 @@ All limits reset daily at 00:00 UTC. If you exceed any one of the above limits,
 
 Neurons are our way of measuring AI outputs across different models, representing the GPU compute needed to perform your request. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.
 
+:::note
+The Price in Tokens column is equivalent to the Price in Neurons column. Both units are displayed so you can easily compare and understand pricing.
+:::
+
 ## LLM model pricing
 
 | Model                                        | Price in Tokens                                            | Price in Neurons                                                          |
@@ -46,16 +50,20 @@ Neurons are our way of measuring AI outputs across different models, representin
 | @cf/meta/llama-2-7b-chat-fp16                | $0.556 per M input tokens <br/> $6.667 per M output tokens | 50505 neurons per M input tokens <br/> 606061 neurons per M output tokens |
 | @cf/meta/llama-guard-3-8b                    | $0.484 per M input tokens <br/> $0.030 per M output tokens | 44003 neurons per M input tokens <br/> 2730 neurons per M output tokens   |
 
-## Other model pricing
-
+## Embeddings model pricing
 | Model                                 | Price in Tokens                                            | Price in Neurons                                                         |
 | ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
-| @cf/black-forest-labs/flux-1-schnell  | $0.0000528 per 512x512 tile <br/> $0.0001056 per step      | 4.80 neurons per 512x512 tile <br/> 9.60 neurons per step                |
-| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens                                  | 2394 neurons per M input tokens                                          |
 | @cf/baai/bge-small-en-v1.5            | $0.020 per M input tokens                                  | 1841 neurons per M input tokens                                          |
 | @cf/baai/bge-base-en-v1.5             | $0.067 per M input tokens                                  | 6058 neurons per M input tokens                                          |
 | @cf/baai/bge-large-en-v1.5            | $0.204 per M input tokens                                  | 18582 neurons per M input tokens                                         |
 |@cf/baai/bge-m3                  |$0.012 per M input tokens|1075 neurons per M input tokens  |
+
+## Other model pricing
+
+| Model                                 | Price in Tokens                                            | Price in Neurons                                                         |
+| ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
+| @cf/black-forest-labs/flux-1-schnell  | $0.0000528 per 512x512 tile <br/> $0.0001056 per step      | 4.80 neurons per 512x512 tile <br/> 9.60 neurons per step                |
+| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens                                  | 2394 neurons per M input tokens                                          |
 |@cf/baai/bge-reranker-base       |$0.003 per M input tokens|283 neurons per M input tokens   |
 | @cf/meta/m2m100-1.2b                  | $0.342 per M input tokens <br/> $0.342 per M output tokens | 31050 neurons per M input tokens <br/> 31050 neurons per M output tokens |
 | @cf/microsoft/resnet-50               | $2.51 per M images                                         | 228055 neurons per M images                                              |
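Reviewer note: the note added in this hunk says the two price columns are equivalent. A minimal sketch of how that can be checked, using only rows that appear in this diff. The helper name `implied_usd_per_1k_neurons` is hypothetical, and the rate it prints is derived from the table rather than quoted from an official price.

```python
# Rows copied from the pricing tables in this diff:
# (model, USD per M input tokens, neurons per M input tokens)
ROWS = [
    ("@cf/baai/bge-small-en-v1.5", 0.020, 1841),
    ("@cf/baai/bge-base-en-v1.5", 0.067, 6058),
    ("@cf/baai/bge-large-en-v1.5", 0.204, 18582),
    ("@cf/baai/bge-m3", 0.012, 1075),
    ("@cf/baai/bge-reranker-base", 0.003, 283),
]


def implied_usd_per_1k_neurons(usd_per_m: float, neurons_per_m: float) -> float:
    """Dollar price divided by neuron price, scaled to 1,000 neurons."""
    return usd_per_m / neurons_per_m * 1000


for model, usd, neurons in ROWS:
    rate = implied_usd_per_1k_neurons(usd, neurons)
    print(f"{model}: ${rate:.4f} per 1,000 neurons")
```

If the columns are equivalent, every row should imply about the same rate. With these rows the values all fall between $0.010 and $0.012 per 1,000 neurons; the small spread is presumably due to rounding in the displayed dollar amounts.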
diff --git a/src/content/workers-ai-models/bge-reranker-base.json b/src/content/workers-ai-models/bge-reranker-base.json
@@ -11,61 +11,61 @@
     "tags": [],
     "properties": [],
     "schema": {
-      "input": {
-        "type": "object",
-        "properties": {
-          "query": {
-            "type": "string",
-            "minLength": 1,
-            "description": "A query you wish to perform against the provided contexts."
-          },
-          "top_k": {
-            "type": "integer",
-            "default": null,
-            "minimum": 1,
-            "description": "Number of returned results starting with the best score."
-          },
-          "contexts": {
-            "type": "array",
-            "items": {
-              "type": "object",
-              "properties": {
-                "text": {
-                  "type": "string",
-                  "minLength": 1,
-                  "description": "One of the provided context content"
+        "input": {
+            "type": "object",
+            "properties": {
+                "query": {
+                    "type": "string",
+                    "minLength": 1,
+                    "description": "A query you wish to perform against the provided contexts."
+                },
+                "top_k": {
+                    "type": "integer",
+                    "default": null,
+                    "minimum": 1,
+                    "description": "Number of returned results starting with the best score."
+                },
+                "contexts": {
+                    "type": "array",
+                    "items": {
+                        "type": "object",
+                        "properties": {
+                            "text": {
+                                "type": "string",
+                                "minLength": 1,
+                                "description": "Text content of one provided context"
+                            }
+                        }
+                    },
+                    "description": "List of provided contexts. Note that the index in this array is important, as the response will refer to it."
                 }
-              }
             },
-            "description": "List of provided contexts. Note that the index in this array is important, as the response will refer to it."
-          }
+            "required": [
+                "query",
+                "contexts"
+            ]
         },
-        "required": [
-          "query",
-          "contexts"
-        ]
-      },
-      "output": {
-        "type": "object",
-        "contentType": "application/json",
-        "properties": {
-          "response": {
-            "type": "array",
-            "items": {
-              "type": "object",
-              "properties": {
-                "id": {
-                  "type": "integer",
-                  "description": "Index of the context in the request"
-                },
-                "score": {
-                  "type": "number",
-                  "description": "Score of the context under the index."
+        "output": {
+            "type": "object",
+            "contentType": "application/json",
+            "properties": {
+                "response": {
+                    "type": "array",
+                    "items": {
+                        "type": "object",
+                        "properties": {
+                            "id": {
+                                "type": "integer",
+                                "description": "Index of the context in the request"
+                            },
+                            "score": {
+                                "type": "number",
+                                "description": "Score of the context at that index."
+                            }
+                        }
+                    }
                 }
-              }
             }
-          }
         }
-      }
     }
 }
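Reviewer note: a minimal sketch of how a caller might use the reranker schema above, assuming the request body is sent as JSON by whatever client the caller already has. The helper names `build_rerank_input` and `attach_scores` are hypothetical, and `sample_output` is made-up data shaped like the output schema, not a real model response.

```python
from typing import List, Optional, Tuple


def build_rerank_input(query: str, texts: List[str], top_k: Optional[int] = None) -> dict:
    """Build a request body that satisfies the input schema's constraints."""
    if len(query) < 1:
        raise ValueError("query must be non-empty (minLength 1)")
    if any(len(t) < 1 for t in texts):
        raise ValueError("each context text must be non-empty (minLength 1)")
    body = {"query": query, "contexts": [{"text": t} for t in texts]}
    if top_k is not None:
        if top_k < 1:
            raise ValueError("top_k must be at least 1 (minimum 1)")
        body["top_k"] = top_k
    return body


def attach_scores(texts: List[str], output: dict) -> List[Tuple[str, float]]:
    """Map each {id, score} item back to its context text, best score first."""
    ranked = [(texts[item["id"]], item["score"]) for item in output["response"]]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


texts = ["Paris is in France.", "Cats sleep a lot.", "The Eiffel Tower is in Paris."]
body = build_rerank_input("Where is the Eiffel Tower?", texts, top_k=2)

# Illustrative only: invented scores in the shape of the output schema.
sample_output = {"response": [{"id": 0, "score": 0.12}, {"id": 2, "score": 0.91}]}
for text, score in attach_scores(texts, sample_output):
    print(f"{score:.2f}  {text}")
```

The `id` lookup is why the schema notes that the order of `contexts` matters: the response refers to contexts by their position in the request, so the caller has to keep the original list around.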
diff --git a/src/content/workers-ai-models/melotts.json b/src/content/workers-ai-models/melotts.json
@@ -0,0 +1,53 @@
+{
+    "id": "c837b2ac-4d9b-4d37-8811-34de60f0c44f",
+    "source": 1,
+    "name": "@cf/myshell-ai/melotts",
+    "description": "MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.",
+    "task": {
+        "id": "b52660a1-9a95-4ab2-8b1d-f232be34604a",
+        "name": "Text-to-Speech",
+        "description": "Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages."
+    },
+    "tags": [],
+    "properties": [],
+    "schema": {
+        "input": {
+            "type": "object",
+            "properties": {
+                "prompt": {
+                    "type": "string",
+                    "minLength": 1,
+                    "description": "The text you want to convert to speech"
+                },
+                "lang": {
+                    "type": "string",
+                    "default": "en",
+                    "description": "The speech language (e.g., 'en' for English, 'fr' for French). Defaults to 'en' if not specified"
+                }
+            },
+            "required": [
+                "prompt"
+            ]
+        },
+        "output": {
+            "oneOf": [
+                {
+                    "type": "object",
+                    "contentType": "application/json",
+                    "properties": {
+                        "audio": {
+                            "type": "string",
+                            "description": "The generated audio in MP3 format, base64-encoded"
+                        }
+                    }
+                },
+                {
+                    "type": "string",
+                    "contentType": "audio/mpeg",
+                    "format": "binary",
+                    "description": "The generated audio in MP3 format"
+                }
+            ]
+        }
+    }
+}
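Reviewer note: the output schema allows two shapes, a JSON object with base64-encoded audio or a raw `audio/mpeg` body. A minimal sketch of handling both on the client side follows. The helper names `build_tts_input` and `audio_bytes` are hypothetical, and the sample bytes are a placeholder, not real MP3 data.

```python
import base64
from typing import Union


def build_tts_input(prompt: str, lang: str = "en") -> dict:
    """Build a request body for the input schema; lang defaults to 'en'."""
    if len(prompt) < 1:
        raise ValueError("prompt must be non-empty (minLength 1)")
    return {"prompt": prompt, "lang": lang}


def audio_bytes(output: Union[dict, bytes, bytearray]) -> bytes:
    """Return raw MP3 bytes from either output shape in the schema."""
    if isinstance(output, (bytes, bytearray)):
        # Binary variant: the response body already is the MP3 data.
        return bytes(output)
    # JSON variant: the 'audio' property holds base64-encoded MP3 data.
    return base64.b64decode(output["audio"])


# Placeholder payload standing in for MP3 data (illustrative only).
fake_mp3 = b"placeholder-audio"
json_variant = {"audio": base64.b64encode(fake_mp3).decode("ascii")}

print(build_tts_input("Hello from Workers AI"))
print(audio_bytes(json_variant) == audio_bytes(fake_mp3))
```

Normalizing both variants to bytes in one place means the rest of the caller's code, for example writing an `.mp3` file, does not need to know which content type came back.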