Commit 9d5e77a

[WAI] New models for AI Week (#24720)
* model pages * added pricing to page * changelogs * code example * trailing comma * svg logos * adding pipecat * prettier
1 parent 69a99c1 commit 9d5e77a

11 files changed

+805
-5
lines changed

Lines changed: 1 addition & 0 deletions

src/assets/images/workers-ai/leonardo.svg

Lines changed: 1 addition & 0 deletions

src/components/models/data.ts

Lines changed: 10 additions & 0 deletions
@@ -8,6 +8,8 @@ import google from "../../assets/images/workers-ai/google.svg";
 import deepseek from "../../assets/images/workers-ai/deepseek.svg";
 import qwen from "../../assets/images/workers-ai/qwen.svg";
 import blackforestlabs from "../../assets/images/workers-ai/blackforestlabs.svg";
+import deepgram from "../../assets/images/workers-ai/deepgram.svg";
+import leonardo from "../../assets/images/workers-ai/leonardo.svg";
 
 export const authorData: Record<string, { name: string; logo: string }> = {
   openai: {
@@ -54,4 +56,12 @@ export const authorData: Record<string, { name: string; logo: string }> = {
     name: "Black Forest Labs",
     logo: blackforestlabs.src,
   },
+  deepgram: {
+    name: "Deepgram",
+    logo: deepgram.src,
+  },
+  leonardo: {
+    name: "Leonardo",
+    logo: leonardo.src,
+  },
 };
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+---
+title: Deepgram and Leonardo partner models now available on Workers AI
+description: State-of-the-art TTS, STT and image generation models, hosted on Workers AI infrastructure
+products:
+  - workers-ai
+date: 2025-08-27
+---
+
+New state-of-the-art models have landed on Workers AI! This time, we're introducing new **partner models** trained by our friends at [Deepgram](https://deepgram.com) and [Leonardo](https://leonardo.ai), hosted on Workers AI infrastructure.
+
+We're also introducing a new turn detection model that detects when someone is done speaking, which is useful for building voice agents!
+
+Read the [blog](https://blog.cloudflare.com/workers-ai-partner-models) for more details and check out some of the new models on our platform:
+- [`@cf/deepgram/aura-1`](/workers-ai/models/aura-1) is a text-to-speech model that allows you to input text and have it come to life in a customizable voice
+- [`@cf/deepgram/nova-3`](/workers-ai/models/nova-3) is a speech-to-text model that transcribes multilingual audio at a blazingly fast speed
+- [`@cf/pipecat-ai/smart-turn-v2`](/workers-ai/models/smart-turn-v2) helps you detect when someone is done speaking
+- [`@cf/leonardo/lucid-origin`](/workers-ai/models/lucid-origin) is a text-to-image model that generates images with sharp graphic design, stunning full-HD renders, or highly specific creative direction
+- [`@cf/leonardo/phoenix-1.0`](/workers-ai/models/phoenix-1.0) is a text-to-image model with exceptional prompt adherence and coherent text
+
+You can filter for the new partner models with the `Partner` capability on our [Models](/workers-ai/models) page.
+
+We're also introducing WebSocket support for some of our audio models, which you can filter through the `Realtime` capability on our [Models](/workers-ai/models) page. WebSockets allow you to create a bi-directional connection to our inference server with low latency, which is ideal for building voice agents.
+
+An example Python snippet showing how to use WebSockets with our new Aura model:
+
+```python
+import json
+import os
+import asyncio
+import websockets
+# ACCOUNT_ID is a placeholder: define it with your Cloudflare account ID before running.
+uri = f"wss://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/aura-1"
+
+lines = [
+    "Line one, out of three lines that will be provided to the aura model.",
+    "Line two, out of three lines that will be provided to the aura model.",
+    "Line three, out of three lines that will be provided to the aura model. This is a last line.",
+]
+
+
+async def text_to_speech():
+    async with websockets.connect(uri, additional_headers={"Authorization": os.getenv("CF_TOKEN")}) as websocket:
+        print("connection established")
+        for line in lines:
+            print(f"sending `{line}`")
+            await websocket.send(json.dumps({"type": "Speak", "text": line}))
+
+            print("line was sent, flushing")
+            await websocket.send(json.dumps({"type": "Flush"}))
+            print("flushed, recving")
+            resp = await websocket.recv()
+            print(f"response received {resp}")
+
+
+if __name__ == "__main__":
+    asyncio.run(text_to_speech())
+```
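
To make the message protocol in the changelog snippet easier to test without a network connection, the frame construction can be pulled out of the WebSocket loop. The sketch below is only an illustration: `build_aura_frames` is a hypothetical helper name, not part of any Cloudflare or Deepgram SDK, and it assumes each line is followed by its own `Flush` message. The `Speak` and `Flush` message shapes are taken from the snippet above.

```python
import json
from typing import Iterable, List


def build_aura_frames(lines: Iterable[str]) -> List[str]:
    """Return the JSON text frames to send, in order, for the given lines.

    Assumption: every "Speak" frame is immediately followed by a "Flush" frame.
    """
    frames: List[str] = []
    for line in lines:
        frames.append(json.dumps({"type": "Speak", "text": line}))
        frames.append(json.dumps({"type": "Flush"}))
    return frames


if __name__ == "__main__":
    # Print the frames that would be sent for two short lines.
    for frame in build_aura_frames(["Hello there.", "Goodbye."]):
        print(frame)
```

Each returned string could then be passed to `websocket.send(...)` inside the existing `async with` block.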

src/content/docs/workers-ai/platform/pricing.mdx

Lines changed: 20 additions & 5 deletions
@@ -66,15 +66,30 @@ The Price in Tokens column is equivalent to the Price in Neurons column - the di
 | @cf/baai/bge-large-en-v1.5 | $0.204 per M input tokens | 18582 neurons per M input tokens |
 | @cf/baai/bge-m3 | $0.012 per M input tokens | 1075 neurons per M input tokens |
 
-## Other model pricing
+## Image model pricing
 
 | Model | Price in Tokens | Price in Neurons |
 | ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
 | @cf/black-forest-labs/flux-1-schnell | $0.0000528 per 512x512 tile <br/> $0.0001056 per step | 4.80 neurons per 512x512 tile <br/> 9.60 neurons per step |
-| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens | 2394 neurons per M input tokens |
-| @cf/baai/bge-reranker-base | $0.003 per M input tokens | 283 neurons per M input tokens |
-| @cf/meta/m2m100-1.2b | $0.342 per M input tokens <br/> $0.342 per M output tokens | 31050 neurons per M input tokens <br/> 31050 neurons per M output tokens |
-| @cf/microsoft/resnet-50 | $2.51 per M images | 228055 neurons per M images |
+| @cf/leonardo/lucid-origin | $0.006996 per 512x512 tile <br/> $0.000132 per step | 636.00 neurons per 512x512 tile <br/> 12.00 neurons per step |
+| @cf/leonardo/phoenix-1.0 | $0.005830 per 512x512 tile <br/> $0.000110 per step | 530.00 neurons per 512x512 tile <br/> 10.00 neurons per step |
+
+## Audio model pricing
+
+| Model | Price in Tokens | Price in Neurons |
+| ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
 | @cf/openai/whisper | $0.0005 per audio minute | 41.14 neurons per audio minute |
 | @cf/openai/whisper-large-v3-turbo | $0.0005 per audio minute | 46.63 neurons per audio minute |
 | @cf/myshell-ai/melotts | $0.0002 per audio minute | 18.63 neurons per audio minute |
+| @cf/deepgram/aura-1 | $0.015 per 1k characters input <br/> | 1.36 neurons per 1k characters input <br/> |
+| @cf/deepgram/nova-3 | $0.0052 per audio minute output <br/> | 7.88 neurons per audio minute output <br/> |
+| @cf/pipecat-ai/smart-turn-v2 | $0.00033795 per audio minute input <br/> | 0.51 neurons per audio minute output <br/> |
+
+## Other model pricing
+
+| Model | Price in Tokens | Price in Neurons |
+| ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
+| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens | 2394 neurons per M input tokens |
+| @cf/baai/bge-reranker-base | $0.003 per M input tokens | 283 neurons per M input tokens |
+| @cf/meta/m2m100-1.2b | $0.342 per M input tokens <br/> $0.342 per M output tokens | 31050 neurons per M input tokens <br/> 31050 neurons per M output tokens |
+| @cf/microsoft/resnet-50 | $2.51 per M images | 228055 neurons per M images |
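
As a rough illustration of how the per-tile and per-step image prices in the table above might combine, the sketch below estimates the cost of one generated image. Several things here are assumptions rather than documented billing rules: that tile and step charges are summed, that a partially covered 512x512 tile is billed as a full tile, and the helper name `estimate_image_cost_usd`. The dollar-per-neuron ratio is simply the one implied by the image rows (636.00 neurons alongside $0.006996).

```python
import math

# Ratio implied by the image rows of the table: $0.006996 / 636.00 neurons.
USD_PER_NEURON = 0.000011

# Neuron prices copied from the "Image model pricing" table.
IMAGE_PRICING_NEURONS = {
    "@cf/black-forest-labs/flux-1-schnell": {"tile": 4.80, "step": 9.60},
    "@cf/leonardo/lucid-origin": {"tile": 636.00, "step": 12.00},
    "@cf/leonardo/phoenix-1.0": {"tile": 530.00, "step": 10.00},
}


def estimate_image_cost_usd(model: str, width: int, height: int, num_steps: int) -> float:
    """Estimate the USD cost of one image (hypothetical helper, see assumptions)."""
    rates = IMAGE_PRICING_NEURONS[model]
    # ASSUMPTION: a partially covered 512x512 tile counts as a whole tile.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    # ASSUMPTION: tile and step charges are added together.
    neurons = tiles * rates["tile"] + num_steps * rates["step"]
    return neurons * USD_PER_NEURON


if __name__ == "__main__":
    cost = estimate_image_cost_usd("@cf/leonardo/lucid-origin", 1120, 1120, 4)
    print(f"estimated cost: ${cost:.6f}")
```

Under these assumptions, a 1120x1120 image spans a 3 by 3 grid of tiles, so most of the estimate comes from the tile charge rather than the step charge.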

src/content/release-notes/workers-ai.yaml

Lines changed: 15 additions & 0 deletions
@@ -3,6 +3,21 @@ link: "/workers-ai/changelog/"
 productName: Workers AI
 productLink: "/workers-ai/"
 entries:
+  - publish_date: "2025-08-27"
+    title: Introducing Partner models to the Workers AI catalog
+    description: |-
+      - Read the [blog](https://blog.cloudflare.com/workers-ai-partner-models) for more details
+      - [`@cf/deepgram/aura-1`](/workers-ai/models/aura-1) is a text-to-speech model that allows you to input text and have it come to life in a customizable voice
+      - [`@cf/deepgram/nova-3`](/workers-ai/models/nova-3) is a speech-to-text model that transcribes multilingual audio at a blazingly fast speed
+      - [`@cf/pipecat-ai/smart-turn-v2`](/workers-ai/models/smart-turn-v2) helps you detect when someone is done speaking
+      - [`@cf/leonardo/lucid-origin`](/workers-ai/models/lucid-origin) is a text-to-image model that generates images with sharp graphic design, stunning full-HD renders, or highly specific creative direction
+      - [`@cf/leonardo/phoenix-1.0`](/workers-ai/models/phoenix-1.0) is a text-to-image model with exceptional prompt adherence and coherent text
+      - WebSocket support added for audio models like `@cf/deepgram/aura-1`, `@cf/deepgram/nova-3`, `@cf/pipecat-ai/smart-turn-v2`
+  - publish_date: "2025-08-05"
+    title: Adding gpt-oss models to our catalog
+    description: |-
+      - Check out the [blog](https://blog.cloudflare.com/openai-gpt-oss-on-workers-ai) for more details about the new models
+      - Take a look at the [`gpt-oss-120b`](/workers-ai/models/gpt-oss-120b) and [`gpt-oss-20b`](/workers-ai/models/gpt-oss-20b) model pages for more information about schemas, pricing, and context windows
   - publish_date: "2025-04-09"
     title: Pricing correction for @cf/myshell-ai/melotts
     description: |-
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+{
+  "id": "1f55679f-009e-4456-aa4f-049a62b4b6a0",
+  "source": 1,
+  "name": "@cf/deepgram/aura-1",
+  "description": "Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.",
+  "task": {
+    "id": "b52660a1-9a95-4ab2-8b1d-f232be34604a",
+    "name": "Text-to-Speech",
+    "description": "Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages."
+  },
+  "created_at": "2025-08-27 01:18:18.880",
+  "tags": [],
+  "properties": [
+    {
+      "property_id": "async_queue",
+      "value": "true"
+    },
+    {
+      "property_id": "partner",
+      "value": "true"
+    },
+    {
+      "property_id": "realtime",
+      "value": "true"
+    },
+    {
+      "property_id": "price",
+      "value": [
+        {
+          "unit": "per 1k characters",
+          "price": 0.0150,
+          "currency": "USD"
+        }
+      ]
+    }
+  ],
+  "schema": {
+    "input": {
+      "type": "object",
+      "properties": {
+        "speaker": {
+          "type": "string",
+          "enum": [
+            "angus",
+            "asteria",
+            "arcas",
+            "orion",
+            "orpheus",
+            "athena",
+            "luna",
+            "zeus",
+            "perseus",
+            "helios",
+            "hera",
+            "stella"
+          ],
+          "default": "angus",
+          "description": "Speaker used to produce the audio."
+        },
+        "encoding": {
+          "type": "string",
+          "enum": [
+            "linear16",
+            "flac",
+            "mulaw",
+            "alaw",
+            "mp3",
+            "opus",
+            "aac"
+          ],
+          "description": "Encoding of the output audio."
+        },
+        "container": {
+          "type": "string",
+          "enum": [
+            "none",
+            "wav",
+            "ogg"
+          ],
+          "description": "Container specifies the file format wrapper for the output audio. The available options depend on the encoding type."
+        },
+        "text": {
+          "type": "string",
+          "description": "The text content to be converted to speech"
+        },
+        "sample_rate": {
+          "type": "number",
+          "description": "Sample Rate specifies the sample rate for the output audio. Based on the encoding, different sample rates are supported. For some encodings, the sample rate is not configurable"
+        },
+        "bit_rate": {
+          "type": "number",
+          "description": "The bitrate of the audio in bits per second. Choose from predefined ranges or specific values based on the encoding type."
+        }
+      },
+      "required": [
+        "text"
+      ]
+    },
+    "output": {
+      "type": "string",
+      "contentType": "audio/mpeg",
+      "format": "binary",
+      "description": "The generated audio in MP3 format"
+    }
+  }
+}
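
The input schema above can be mirrored in a small client-side check before a request is sent. This is a minimal sketch, not an official client: `build_aura_input` is a hypothetical helper name, and the enumerations are copied from the schema. It does not check which `container`, `sample_rate`, or `bit_rate` values are valid for a given `encoding`, since the schema only says that those depend on the encoding.

```python
from typing import Any, Dict, Optional

# Enumerations copied from the aura-1 input schema.
AURA_SPEAKERS = (
    "angus", "asteria", "arcas", "orion", "orpheus", "athena",
    "luna", "zeus", "perseus", "helios", "hera", "stella",
)
AURA_ENCODINGS = ("linear16", "flac", "mulaw", "alaw", "mp3", "opus", "aac")
AURA_CONTAINERS = ("none", "wav", "ogg")


def build_aura_input(
    text: str,
    speaker: str = "angus",
    encoding: Optional[str] = None,
    container: Optional[str] = None,
    sample_rate: Optional[float] = None,
    bit_rate: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return an input object."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if speaker not in AURA_SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if encoding is not None and encoding not in AURA_ENCODINGS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    if container is not None and container not in AURA_CONTAINERS:
        raise ValueError(f"unknown container: {container!r}")

    payload: Dict[str, Any] = {"text": text, "speaker": speaker}
    optional = {
        "encoding": encoding,
        "container": container,
        "sample_rate": sample_rate,
        "bit_rate": bit_rate,
    }
    # Only include optional fields the caller actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    print(build_aura_input("Hello from Workers AI.", speaker="luna", encoding="mp3"))
```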
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+{
+  "id": "0e372c11-8720-46c9-a02d-666188a22dae",
+  "source": 1,
+  "name": "@cf/leonardo/lucid-origin",
+  "description": "Lucid Origin from Leonardo.AI is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.\n",
+  "task": {
+    "id": "3d6e1f35-341b-4915-a6c8-9a7142a9033a",
+    "name": "Text-to-Image",
+    "description": "Generates images from input text. These models can be used to generate and modify images based on text prompts."
+  },
+  "created_at": "2025-08-25 19:21:28.770",
+  "tags": [],
+  "properties": [
+    {
+      "property_id": "partner",
+      "value": "true"
+    },
+    {
+      "property_id": "price",
+      "value": [
+        {
+          "unit": "per 512 by 512 tile",
+          "price": 0.007,
+          "currency": "USD"
+        },
+        {
+          "unit": "per step",
+          "price": 0.00013,
+          "currency": "USD"
+        }
+      ]
+    }
+  ],
+  "schema": {
+    "input": {
+      "type": "object",
+      "properties": {
+        "prompt": {
+          "type": "string",
+          "minLength": 1,
+          "description": "A text description of the image you want to generate."
+        },
+        "guidance": {
+          "type": "number",
+          "default": 4.5,
+          "minimum": 0,
+          "maximum": 10,
+          "description": "Controls how closely the generated image should adhere to the prompt; higher values make the image more aligned with the prompt"
+        },
+        "seed": {
+          "type": "integer",
+          "minimum": 0,
+          "description": "Random seed for reproducibility of the image generation"
+        },
+        "height": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 2500,
+          "default": 1120,
+          "description": "The height of the generated image in pixels"
+        },
+        "width": {
+          "type": "integer",
+          "minimum": 0,
+          "maximum": 2500,
+          "default": 1120,
+          "description": "The width of the generated image in pixels"
+        },
+        "num_steps": {
+          "type": "integer",
+          "default": 4,
+          "minimum": 1,
+          "maximum": 40,
+          "description": "The number of diffusion steps; higher values can improve quality but take longer"
+        }
+      },
+      "required": [
+        "prompt"
+      ]
+    },
+    "output": {
+      "type": "object",
+      "contentType": "application/json",
+      "properties": {
+        "image": {
+          "type": "string",
+          "description": "The generated image in Base64 format."
+        }
+      }
+    }
+  }
+}
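
The schema above defines numeric bounds for the inputs and a Base64-encoded `image` field in the output, so a client might validate its arguments and decode the result along these lines. This is a sketch under stated assumptions: `build_lucid_origin_input` and `decode_image` are hypothetical helper names, and `decode_image` assumes it is handed an object shaped exactly like the output schema (an HTTP API response may wrap that object in an outer envelope).

```python
import base64
from typing import Any, Dict, Optional


def build_lucid_origin_input(
    prompt: str,
    guidance: float = 4.5,
    height: int = 1120,
    width: int = 1120,
    num_steps: int = 4,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: enforce the schema's bounds and return an input object."""
    if not isinstance(prompt, str) or len(prompt) < 1:
        raise ValueError("prompt must contain at least one character")
    if not 0 <= guidance <= 10:
        raise ValueError("guidance must be between 0 and 10")
    for name, value in (("height", height), ("width", width)):
        if not 0 <= value <= 2500:
            raise ValueError(f"{name} must be between 0 and 2500")
    if not 1 <= num_steps <= 40:
        raise ValueError("num_steps must be between 1 and 40")
    if seed is not None and seed < 0:
        raise ValueError("seed must be non-negative")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "guidance": guidance,
        "height": height,
        "width": width,
        "num_steps": num_steps,
    }
    if seed is not None:
        payload["seed"] = seed
    return payload


def decode_image(output: Dict[str, Any]) -> bytes:
    """Decode the Base64 `image` field of an output object into raw bytes."""
    return base64.b64decode(output["image"])


if __name__ == "__main__":
    print(build_lucid_origin_input("a lighthouse at dusk", num_steps=8, seed=42))
```

The bytes returned by `decode_image` could then be written to a file opened in binary mode.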
