WAI pricing update (#20118)

mchenco · rita3ko · web-flow · commit 66ceb1ddacb2 · 2025-02-20T17:04:29.000-05:00
* update

* fix table breaks

* wording

* blog wording

* changelog + verbiage

---------

Co-authored-by: Rita Kozlov <2414910+rita3ko@users.noreply.github.com>
diff --git a/src/content/changelog/workers-ai/2025-02-20-updated-pricing-docs.mdx b/src/content/changelog/workers-ai/2025-02-20-updated-pricing-docs.mdx
@@ -0,0 +1,11 @@
+---
+title: Workers AI updated pricing
+description: Granular pricing in units and neurons
+date: 2025-02-20T11:00:00Z
+---
+ 
+Updating Workers AI pricing page to reflect the latest models and pricing. Pricing is presented in units (tokens, audio seconds, etc.) but will continue to be charged in neurons. The price per neuron remains the same as it has always been, at $0.011 per 1,000 neurons.
+
+Having per-model pricing (instead of buckets) allows us to be more flexible in how each model is priced. As we optimize a model, we can pass on the savings for that model.
+
+Going forward, models will be launched in GA with pricing. Dashboard changes are coming to reflect usage in units and neurons. Docs redesign is incoming to show pricing directly on the respective model pages.
diff --git a/src/content/docs/workers-ai/platform/pricing.mdx b/src/content/docs/workers-ai/platform/pricing.mdx
@@ -6,81 +6,52 @@ sidebar:
 ---
 
 :::note
-
-Workers AI has deprecated the usage of neurons in favor of unit-based pricing. The Cloudflare dashboards will be migrated this unit-based pricing soon so you can track your usage. Individual model pages will soon document the price for each model. We also made pricing cheaper!
-
-We will begin billing for all models under this new pricing structure beginning November 1, 2024.
-
+Workers AI pricing is now more granular: prices are presented to customers per model in units (such as tokens), while billing continues in neurons on the back end.
 :::
 
-Workers AI is included in both the [Free and Paid Workers plans](/workers/platform/pricing/) and is priced based on model task, model size, and units.
-
-Individual model pages will have the pricing listed on them, but the general pricing structure across our models is laid out below.
-
-These docs will be updated as we add new pricing for new task types in our model catalog.
-
-## Pricing Structure
-
-Some models may have specific pricing. For specific details, check the page of the [specific model](/workers-ai/models/).
-
-### Text Generation LLMs (incl Vision models)
-
-Model size is measured in parameters.
-Pricing is based on blended tokens (input + output).
-Vision models will convert the image input into tokens for billing. Depending on size and aspect ratio, images will be charged for between 1,601 and 6,404 tokens. Most images that are more that 224 pixels wide or tall will be charged as 6,404 tokens each.
-
-| Model Size  | Pricing                  |
-| ----------- | ------------------------ |
-| \<= 3B      | $0.10 per Million Tokens |
-| 3.1B - 8B   | $0.15 per Million Tokens |
-| 8.1B - 20B  | $0.20 per Million Tokens |
-| 20.1B - 40B | $0.50 per Million Tokens |
-| 40.1B+      | $0.75 per Million Tokens |
-
-### Embeddings
+Workers AI is included in both the [Free and Paid Workers plans](/workers/platform/pricing/) and is priced at **$0.011 per 1,000 Neurons**.
 
-Model size is measured in parameters.
-Pricing is based on input tokens.
+Our free allocation allows anyone to use a total of **10,000 Neurons per day at no charge**. To use more than 10,000 Neurons per day, you need to sign up for the [Workers Paid plan](/workers/platform/pricing/#workers). On Workers Paid, you will be charged at $0.011 / 1,000 Neurons for any usage above the free allocation of 10,000 Neurons per day.
 
-| Model Size          | Pricing                   |
-| ------------------- | ------------------------- |
-| \<= 150M parameters | $0.008 per Million Tokens |
-| 151M+ parameters    | $0.015 per Million Tokens |
-
-## Image Generation
-
-Standard models are large image models such as `@cf/stabilityai/stable-diffusion-xl-base-1.0`
-Fast models are usually smaller image models that require fewer steps to generate an image, such as `@cf/black-forest-labs/flux-1-schnell` and `@cf/bytedance/stable-diffusion-xl-lightning`
-We take the maximum of the image height and width to calculate pricing. For example, an image of 1024x768 would fall under 1024x1024 pricing.
-
-| Image Size   | Price                |
-| ------------ | -------------------- |
-| \<=256x256   | $0.00025 per 5 steps |
-| \<=512x512   | $0.0005 per 5 steps  |
-| \<=1024x1024 | $0.001 per 5 steps   |
-| \<=2048x2048 | $0.002 per 5 steps   |
-
-## Speech-to-text
-
-Speech-to-text models like `@cf/openai/whisper` are billed on minutes of audio input.
-
-| Price                             |
-| --------------------------------- |
-| $0.0039 per minute of audio input |
-
-## Free Allocation
-
-Our free allocation allows anyone to use Workers AI up to a certain limit per day. To use more than the free allocation, upgrade to the Workers Paid plan, where you will be charged on any usage above the free tier based on the pricing structure above.
-
-| Model                 | Free tier size                               |
-| --------------------- | -------------------------------------------- |
-| Text Generation - LLM | 10,000 tokens a day across any model size    |
-| Embeddings            | 10,000 tokens a day across any model size    |
-| Images                | Sum of 250 steps, up to 1024x1024 resolution |
-| Speech-to-text        | 10 minutes of audio a day                    |
+You can monitor your Neuron usage in the [Cloudflare Workers AI dashboard](https://dash.cloudflare.com/?to=/:account/ai/workers-ai).
 
 All limits reset daily at 00:00 UTC. If you exceed any one of the above limits, further operations will fail with an error.
 
-## Archived Pricing
-
-Workers AI was previously metered by Neurons. We deprecated this in favor of unit-based pricing on September 26, 2024. We wanted to make it simple for people to compare and contrast Workers AI with other providers, and we also generally updated pricing to be cheaper with these new units.
+|              | Free <br/> allocation  | Overage<br/>pricing           |
+| ------------ | ---------------------- | ----------------------------- |
+| Workers Free | 10,000 Neurons per day | N/A - Upgrade to Workers Paid |
+| Workers Paid | 10,000 Neurons per day | $0.011 / 1,000 Neurons        |
+
+## What are Neurons?
+
+Neurons are our way of measuring AI outputs across different models, representing the GPU compute needed to perform your request. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.
+
+## LLM model pricing
+| Model                                        | Price in Tokens                                            | Price in Neurons                                                          |
+| -------------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------- |
+| @cf/meta/llama-3.2-1b-instruct               | $0.027 per M input tokens <br/> $0.201 per M output tokens | 2457 neurons per M input tokens <br/> 18252 neurons per M output tokens   |
+| @cf/meta/llama-3.2-3b-instruct               | $0.051 per M input tokens <br/> $0.335 per M output tokens | 4625 neurons per M input tokens <br/> 30475 neurons per M output tokens   |
+| @cf/meta/llama-3.1-8b-instruct-fp8-fast      | $0.045 per M input tokens <br/> $0.384 per M output tokens | 4119 neurons per M input tokens <br/> 34868 neurons per M output tokens   |
+| @cf/meta/llama-3.2-11b-vision-instruct       | $0.049 per M input tokens <br/> $0.676 per M output tokens | 4410 neurons per M input tokens <br/> 61493 neurons per M output tokens   |
+| @cf/meta/llama-3.1-70b-instruct-fp8-fast     | $0.293 per M input tokens <br/> $2.253 per M output tokens | 26668 neurons per M input tokens <br/> 204805 neurons per M output tokens |
+| @cf/meta/llama-3.3-70b-instruct-fp8-fast     | $0.293 per M input tokens <br/> $2.253 per M output tokens | 26668 neurons per M input tokens <br/> 204805 neurons per M output tokens |
+| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | $0.497 per M input tokens <br/> $4.881 per M output tokens | 45170 neurons per M input tokens <br/> 443756 neurons per M output tokens |
+| @cf/mistral/mistral-7b-instruct-v0.1         | $0.110 per M input tokens <br/> $0.190 per M output tokens | 10000 neurons per M input tokens <br/> 17300 neurons per M output tokens  |
+| @cf/meta/llama-3.1-8b-instruct               | $0.282 per M input tokens <br/> $0.827 per M output tokens | 25608 neurons per M input tokens <br/> 75147 neurons per M output tokens  |
+| @cf/meta/llama-3.1-8b-instruct-fp8           | $0.152 per M input tokens <br/> $0.287 per M output tokens | 13778 neurons per M input tokens <br/> 26128 neurons per M output tokens  |
+| @cf/meta/llama-3.1-8b-instruct-awq           | $0.123 per M input tokens <br/> $0.266 per M output tokens | 11161 neurons per M input tokens <br/> 24215 neurons per M output tokens  |
+| @cf/meta/llama-3-8b-instruct                 | $0.282 per M input tokens <br/> $0.827 per M output tokens | 25608 neurons per M input tokens <br/> 75147 neurons per M output tokens  |
+| @cf/meta/llama-3-8b-instruct-awq             | $0.123 per M input tokens <br/> $0.266 per M output tokens | 11161 neurons per M input tokens <br/> 24215 neurons per M output tokens  |
+| @cf/meta/llama-2-7b-chat-fp16                | $0.556 per M input tokens <br/> $6.667 per M output tokens | 50505 neurons per M input tokens <br/> 606061 neurons per M output tokens |
+
+## Other model pricing
+| Model                                 | Price in Units                                             | Price in Neurons                                                         |
+| ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
+| @cf/black-forest-labs/flux-1-schnell  | $0.0000528 per 512x512 tile <br/> $0.0001056 per step      | 4.80 neurons per 512x512 tile <br/> 9.60 neurons per step                |
+| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens                                  | 2394 neurons per M input tokens                                          |
+| @cf/baai/bge-small-en-v1.5            | $0.020 per M input tokens                                  | 1841 neurons per M input tokens                                          |
+| @cf/baai/bge-base-en-v1.5             | $0.067 per M input tokens                                  | 6058 neurons per M input tokens                                          |
+| @cf/baai/bge-large-en-v1.5            | $0.204 per M input tokens                                  | 18582 neurons per M input tokens                                         |
+| @cf/meta/m2m100-1.2b                  | $0.342 per M input tokens <br/> $0.342 per M output tokens | 31050 neurons per M input tokens <br/> 31050 neurons per M output tokens |
+| @cf/microsoft/resnet-50               | $2.509 per M images                                        | 0.23 neurons per image                                                   |
+| @cf/openai/whisper                    | $0.0005 per audio minute                                   | 41.14 neurons per audio minute                                           |
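
As a rough illustration of how the unit and Neuron columns in the tables above relate, the following Python sketch converts token counts into Neurons and then into dollars. The function and constant names are hypothetical, the rates are copied from the tables in this diff, and treating the free allocation as a plain daily subtraction with no rounding is an assumption for illustration, not a description of how billing is actually implemented.

```python
from decimal import Decimal

# Rate stated on the pricing page: $0.011 per 1,000 Neurons.
USD_PER_1000_NEURONS = Decimal("0.011")
# Free allocation stated on the pricing page: 10,000 Neurons per day.
FREE_NEURONS_PER_DAY = 10_000

# Neurons per million tokens for @cf/meta/llama-3.2-1b-instruct,
# copied from the LLM pricing table (hypothetical variable name).
LLAMA_3_2_1B = {"input": 2457, "output": 18252}


def neurons_used(input_tokens, output_tokens, rates):
    """Convert token counts into Neurons using per-million-token rates."""
    total = Decimal(input_tokens) * rates["input"] + Decimal(output_tokens) * rates["output"]
    return total / Decimal(1_000_000)


def neurons_to_usd(neurons):
    """Price a Neuron count at the flat rate, ignoring the free allocation."""
    return Decimal(neurons) * USD_PER_1000_NEURONS / 1000


def daily_overage_usd(neurons):
    """Assumed model: only usage above the daily free allocation is charged."""
    billable = max(Decimal(neurons) - FREE_NEURONS_PER_DAY, Decimal(0))
    return neurons_to_usd(billable)
```

For example, `neurons_to_usd(neurons_used(1_000_000, 0, LLAMA_3_2_1B))` evaluates to `Decimal("0.027027")`, which rounds to the $0.027 per M input tokens listed for that model. `Decimal` is used rather than floats so the small per-unit prices are not distorted by binary rounding.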