Commit 1588947

mchenco and markdembo authored and committed
Workers AI Birthday Week Updates (#17119)
* pricing and changelog
* Fix escaping
* Missed one
* Fix

---------

Co-authored-by: Mark Dembo <[email protected]>
1 parent fa13938 commit 1588947

2 files changed, +65 -86 lines changed

src/content/changelogs/workers-ai.yaml

Lines changed: 12 additions & 0 deletions
@@ -5,6 +5,18 @@ productLink: "/workers-ai/"
55
productArea: Developer platform
66
productAreaLink: /workers/platform/changelog/platform/
77
entries:
8+
- publish_date: "2024-09-26"
9+
title: Workers AI Birthday Week 2024 announcements
10+
description: |-
11+
- Meta Llama 3.2 1B, 3B, and 11B vision is now available on Workers AI
12+
- `@cf/black-forest-labs/flux-1-schnell` is now available on Workers AI
13+
- Workers AI is fast! Powered by new GPUs and optimizations, you can expect faster inference on Llama 3.1, Llama 3.2, and FLUX models.
14+
- No more neurons. Workers AI is moving towards [unit-based pricing](/workers-ai/platform/pricing)
15+
- Model pages get a refresh with better documentation on parameters, pricing, and model capabilities
16+
- Closed beta for our Run Any* Model feature, [sign up here](https://forms.gle/h7FcaTF4Zo5dzNb68)
17+
- Check out the [product announcements blog post](https://blog.cloudflare.com/workers-ai) for more information
18+
- And the [technical blog post](https://blog.cloudflare.com/workers-ai/making-workers-ai-faster) if you want to learn about how we made Workers AI fast
19+
820
- publish_date: "2024-07-23"
921
title: Meta Llama 3.1 now available on Workers AI
1022
description: |-

src/content/docs/workers-ai/platform/pricing.mdx

Lines changed: 53 additions & 86 deletions
@@ -9,114 +9,81 @@ sidebar:
99
:::note
1010

1111

12-
Workers AI will begin billing for usage on non-beta models after April 1, 2024.
12+
Workers AI has deprecated the usage of neurons in favor of unit-based pricing. The Cloudflare dashboards will be migrated to this unit-based pricing soon so you can track your usage. Individual model pages will soon document the price for each model. We also made pricing cheaper!
1313

14+
We will begin billing for all models under this new pricing structure beginning November 1, 2024.
1415

15-
:::
16-
17-
Workers AI is included in both the [Free and Paid Workers plans](/workers/platform/pricing/) and is priced at **$0.011 / 1,000 Regular Twitch Neurons** (also known as Neurons).
18-
19-
Our free allocation allows anyone to use a total of **10,000 Neurons per day at no charge on our [non-beta models](#non-beta-models)**. You can still enjoy unlimited usage on the beta models in the catalog until they graduate out of beta.
20-
21-
To use more than 10,000 Neurons per day for non-beta models, you need to sign up for the [Workers Paid plan](/workers/platform/pricing/#workers). On Workers Paid, you will be charged at $0.011 / 1,000 Neurons for any usage above the free allocation of 10,000 Neurons per day for the non-beta models.
22-
23-
You can monitor your Neuron usage in the [Cloudflare Workers AI dashboard](https://dash.cloudflare.com/?to=/:account/ai/workers-ai). To estimate Neurons and costs, use the [pricing calculator](https://ai.cloudflare.com/#pricing-calculator).
24-
25-
| | Free <br/> allocation | Overage<br/>pricing |
26-
| ------------ | ---------------------- | ----------------------------- |
27-
| Workers Free | 10,000 Neurons per day | N/A - Upgrade to Workers Paid |
28-
| Workers Paid | 10,000 Neurons per day | $0.011 / 1,000 Neurons |
29-
30-
All limits reset daily at 00:00 UTC. If you exceed any one of the above limits, further operations will fail with an error.
31-
32-
## What are Neurons?
33-
34-
Neurons are our way of measuring AI outputs across different models. To give you a sense of what you can accomplish with 10,000 Neurons, you can: generate 100-200 LLM responses, 500 translations, 500 seconds of speech-to-text audio, 10,000 text classifications, or 1,500 - 15,000 embeddings depending on which models you use. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.
3516

36-
To estimate how many Neurons your requests will consume, use the [pricing calculator](https://ai.cloudflare.com/#pricing-calculator).
37-
38-
![Workers AI Pricing Calculator](~/assets/images/workers-ai/pricing-calculator.png)
39-
40-
## Non-beta models
41-
42-
Beginning April 1, 2024, Cloudflare will begin charging $0.011/1,000 Neurons for all usage exceeding 10,000 Neurons per day for the following models:
43-
44-
* [bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
45-
* [bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
46-
* [bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
47-
* [distilbert-sst-2-int8](/workers-ai/models/distilbert-sst-2-int8/)
48-
* [llama-2-7b-chat-int8](/workers-ai/models/llama-2-7b-chat-int8/)
49-
* [llama-2-7b-chat-fp16](/workers-ai/models/llama-2-7b-chat-fp16/)
50-
* [mistral-7b-instruct-v0.1](/workers-ai/models/mistral-7b-instruct-v0.1/)
51-
* [m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)
52-
* [resnet-50](/workers-ai/models/resnet-50/)
53-
* [whisper](/workers-ai/models/whisper/)
54-
55-
Cloudflare will continue to add Neuron calculations for the other models in the catalog and graduate them out of beta in the future.
56-
57-
## Pricing comparison
17+
:::
5818

59-
Cloudflare uses Neurons to measure and bill for inference on Workers AI. This may differ from the input-based pricing you might see from other providers. We’ve prepared the below tables to help you understand and evaluate the estimated cost of Neurons and usage on Workers AI compared with the inputs used for the models available in our catalog.
19+
Workers AI is included in both the [Free and Paid Workers plans](/workers/platform/pricing/) and is priced based on model task, model size, and units.
6020

61-
**Please note that the below is provided for informational purposes only.** All conversions are based on Cloudflare’s public fees as of March 1, 2024, and do not include taxes and any other fees.
21+
Individual model pages will have the pricing listed on them, but the general pricing structure across our models is laid out below.
6222

63-
### Automatic Speech Recognition
23+
These docs will be updated as we add new pricing for new task types in our model catalog.
6424

65-
| Model | Price per <br/> minute of audio |
66-
| --------- | ------------------------------- |
67-
| `whisper` | $0.0022 |
25+
## Pricing Structure
6826

69-
### Image Classification
27+
### Text Generation LLMs (incl Vision models)
28+
Model size is measured in parameters.
29+
Pricing is based on blended tokens (input + output).
30+
Vision models will convert the image input into tokens for billing. Depending on size and aspect ratio, images will be charged for between 1,601 and 6,404 tokens. Most images that are more than 224 pixels wide or tall will be charged as 6,404 tokens each.
7031

71-
| Model | Price per image |
72-
| ----------- | --------------- |
73-
| `Resnet-50` | $0.0000025 |
32+
| Model Size | Pricing |
33+
| ---------------- | ------------------------ |
34+
| \<= 3B | $0.10 per Million Tokens |
35+
| 3.1B - 8B | $0.15 per Million Tokens |
36+
| 8.1B - 20B | $0.20 per Million Tokens |
37+
| 20.1B - 40B | $0.50 per Million Tokens |
38+
| 40.1B+ | $0.75 per Million Tokens |
7439

75-
### Text Classification
40+
### Embeddings
41+
Model size is measured in parameters.
42+
Pricing is based on input tokens.
7643

77-
| Model | Price per 1M <br/> input tokens |
78-
| ----------------------- | ------------------------------- |
79-
| `distilbert-sst-2-int8` | $0.33 |
44+
| Model Size | Pricing |
45+
| ------------------ | ------------------------- |
46+
| \<= 150M parameters | $0.008 per Million Tokens |
47+
| 151M+ parameters | $0.015 per Million Tokens |
8048

81-
### Text Embeddings
49+
## Image Generation
50+
Standard models are large image models such as `@cf/stabilityai/stable-diffusion-xl-base-1.0`
51+
Fast models are usually smaller image models that require fewer steps to generate an image, such as `@cf/black-forest-labs/flux-1-schnell` and `@cf/bytedance/stable-diffusion-xl-lightning`
52+
We take the maximum of the image height and width to calculate pricing. For example, an image of 1024x768 would fall under 1024x1024 pricing.
8253

83-
| Model | Price per 1M <br/> input tokens |
84-
| ------------------- | ------------------------------- |
85-
| `bge-small-en-v1.5` | $0.003 |
86-
| `bge-base-en-v1.5` | $0.014 |
87-
| `bge-large-en-v1.5` | $0.022 |
54+
| Image Size | Model Type | Price |
55+
| ----------- | ---------- | --------------------- |
56+
| \<=256x256 | Standard | $0.00125 per 25 steps |
57+
| \<=256x256 | Fast | $0.00025 per 5 steps |
58+
| \<=512x512 | Standard | $0.0025 per 25 steps |
59+
| \<=512x512 | Fast | $0.0005 per 5 steps |
60+
| \<=1024x1024 | Standard | $0.005 per 25 steps |
61+
| \<=1024x1024 | Fast | $0.001 per 5 steps |
62+
| \<=2048x2048 | Standard | $0.01 per 25 steps |
63+
| \<=2048x2048 | Fast | $0.002 per 5 steps |
8864

89-
### Text Generation
65+
## Speech-to-text
66+
Speech-to-text models like `@cf/openai/whisper` are billed on minutes of audio input.
9067

91-
On April 2, 2024, we updated pricing for our `mistral-7b-instruct` models to be 17x cheaper and `llama-2-7b-chat-int8` to be 7x cheaper. The pricing table below reflects the new pricing, but you can take a look at the [archived pricing](/workers-ai/platform/pricing/#archived-pricing) to see how pricing has changed.
68+
| Price |
69+
| $0.0039 per minute of audio input|
9270

93-
| Model | Price per 1M <br/> input tokens | Price per 1M <br/> output tokens |
94-
| ---------------------- | ------------------------------- | -------------------------------- |
95-
| `llama-2-7b-chat-fp16` | $0.56 | $6.66 |
96-
| `llama-2-7b-chat-int8` | $0.16 | $0.24 |
97-
| `mistral-7b-instruct` | $0.11 | $0.19 |
9871

99-
### Translation
72+
## Free Allocation
10073

101-
| Model | Price per 1M <br/> input tokens | Price per 1M <br/> output tokens |
102-
| ------------- | ------------------------------- | -------------------------------- |
103-
| `m2m100-1.2b` | $0.13 | $0.70 |
74+
Our free allocation allows anyone to use Workers AI up to a certain limit per day. To use more than the free allocation, upgrade to the Workers Paid plan, where you will be charged on any usage above the free tier based on the pricing structure above.
10475

105-
## Pricing Example
10676

107-
All users receive free allocation of 10k Neurons a day (totaling to 300k Neurons a month).
77+
| Model | Free tier size |
78+
| --------------------- | -------------------------------------------- |
79+
| Text Generation - LLM | 10,000 tokens a day across any model size |
80+
| Embeddings | 10,000 tokens a day across any model size |
81+
| Images | Sum of 250 steps, up to 1024x1024 resolution |
82+
| Speech-to-text | 10 minutes of audio a day |
10883

109-
If a user uses 50k Neurons per day, every day of the month, the Workers AI usage charge will be $13.20.
84+
All limits reset daily at 00:00 UTC. If you exceed any one of the above limits, further operations will fail with an error.
11085

111-
`(50k Neurons - 10k included daily Neurons) * 30 days * $0.011 / 1k Neurons = $13.20`
11286

11387
## Archived Pricing
11488

115-
As we find optimizations for our inference platform, we pass on these optimizations to our customers. You can refer to the archived pricing below to see how pricing has changed.
116-
117-
Before April 2, 2024:
118-
119-
| Model | Price per 1M <br/> input tokens | Price per 1M <br/> output tokens |
120-
| ---------------------- | ------------------------------- | -------------------------------- |
121-
| `llama-2-7b-chat-int8` | $0.28 | $1.72 |
122-
| `mistral-7b-instruct` | $0.28 | $3.33 |
89+
Workers AI was previously metered by Neurons. We deprecated this in favor of unit-based pricing on September 26, 2024. We wanted to make it simple for people to compare and contrast Workers AI with other providers, and we also generally updated pricing to be cheaper with these new units.
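
Reviewer note: the unit-based tables added in `pricing.mdx` can be sanity-checked with a small calculation. The sketch below is not part of the commit and is not an official calculator. It copies the Text Generation and Image Generation prices from the tables in this diff, and it makes three assumptions that the docs do not state: model sizes that fall between tiers (for example 3.05B) are billed at the next tier up, image step counts are prorated linearly against the 25-step and 5-step units, and the daily free allocation is ignored. The function names `llm_cost` and `image_cost` are hypothetical.

```python
from decimal import Decimal

# (upper bound in billions of parameters, USD per million blended tokens).
# Values copied from the "Text Generation LLMs" table; None means no upper bound.
LLM_TIERS = [
    (Decimal("3"), Decimal("0.10")),
    (Decimal("8"), Decimal("0.15")),
    (Decimal("20"), Decimal("0.20")),
    (Decimal("40"), Decimal("0.50")),
    (None, Decimal("0.75")),
]

# (max side in pixels, USD per 25 steps for Standard, USD per 5 steps for Fast).
# Values copied from the "Image Generation" table.
IMAGE_TIERS = [
    (256, Decimal("0.00125"), Decimal("0.00025")),
    (512, Decimal("0.0025"), Decimal("0.0005")),
    (1024, Decimal("0.005"), Decimal("0.001")),
    (2048, Decimal("0.01"), Decimal("0.002")),
]


def llm_price_per_million(params_billions):
    """Return the USD price per million tokens for a model of the given size."""
    size = Decimal(str(params_billions))
    for upper, price in LLM_TIERS:
        # Assumption: sizes between listed tiers round up to the next tier.
        if upper is None or size <= upper:
            return price


def llm_cost(params_billions, input_tokens, output_tokens):
    """Estimate cost from blended tokens (input + output), before any free tier."""
    tokens = input_tokens + output_tokens
    return llm_price_per_million(params_billions) * tokens / Decimal(1_000_000)


def image_cost(width, height, steps, fast):
    """Estimate image cost; the tier is picked by the larger of width and height."""
    side = max(width, height)
    for max_side, standard_price, fast_price in IMAGE_TIERS:
        if side <= max_side:
            # Assumption: steps are prorated linearly against the listed unit.
            if fast:
                return fast_price * steps / 5
            return standard_price * steps / 25
    raise ValueError("image larger than the largest listed tier (2048x2048)")


if __name__ == "__main__":
    # An 8B model with 600k input and 400k output tokens: one million blended tokens.
    print(llm_cost(8, 600_000, 400_000))
    # A 1024x768 Standard image at 25 steps falls under 1024x1024 pricing.
    print(image_cost(1024, 768, 25, fast=False))
```

`Decimal` is used instead of floats so that small per-step prices such as $0.00025 do not pick up rounding noise when summed over many requests.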
