From 4f303369a8ebe3223380196ca25a21cef3c48fd0 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Mon, 3 Feb 2025 19:04:28 +0100
Subject: [PATCH 1/5] quickfix re. pricing system

---
 docs/api-inference/getting-started.md  |  2 +-
 docs/api-inference/index.md            |  4 ++--
 docs/api-inference/rate-limits.md      | 19 +++++++++++++------
 docs/api-inference/supported-models.md |  4 ++--
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/docs/api-inference/getting-started.md b/docs/api-inference/getting-started.md
index a13474783..fd13c2453 100644
--- a/docs/api-inference/getting-started.md
+++ b/docs/api-inference/getting-started.md
@@ -1,6 +1,6 @@
 # Getting Started
 
-The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) to make it even easier.
+The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) and JavaScript SDK (`huggingface.js`) to make it even easier.
 
 We'll do a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). Please visit task-specific parameters and further documentation in our [API Reference](./parameters).
diff --git a/docs/api-inference/index.md b/docs/api-inference/index.md
index eaca18f5d..a68c5324a 100644
--- a/docs/api-inference/index.md
+++ b/docs/api-inference/index.md
@@ -8,14 +8,14 @@
 Explore the most popular models for text, image, speech, and more — all with a simple API.
 
 ## Why use the Inference API?
 
-The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
+The Serverless Inference API offers a fast and simple way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
 
 * **Text Generation:** Including large language models and tool-calling prompts, generate and experiment with high-quality responses.
 * **Image Generation:** Easily create customized images, including LoRAs for your own styles.
 * **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
 * **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.
 
-⚡ **Fast and Free to Get Started**: The Inference API is free with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
+⚡ **Fast and Free to Get Started**: The Inference API is free to try out and offers additional included credits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
 
 ---
diff --git a/docs/api-inference/rate-limits.md b/docs/api-inference/rate-limits.md
index f3f819d9b..c65ec3dd7 100644
--- a/docs/api-inference/rate-limits.md
+++ b/docs/api-inference/rate-limits.md
@@ -1,13 +1,20 @@
 # Rate Limits
 
-The Inference API has rate limits based on the number of requests. These rate limits are subject to change in the future to be compute-based or token-based.
+As a HF user, you get monthly credits to run the HF Inference API. The amount of credits you get depends on your type of account (Free or PRO or Enterprise Hub), see table below.
+You get charged for every inference request, based on the compute time x price of the underlying hardware.
 
-Serverless API is not meant to be used for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to have dedicated resources.
+For instance, a request to [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
+
+When your monthly included credits are depleted:
+- if you're a Free user, you won't be able to query the Inference API anymore,
+- if you're a PRO or Enterprise Hub user, you will get charged for the requests on top of your subscription. You can monitor your spending on your billing page.
+
+Note that serverless API is not meant to be used for heavy production applications. If you need to handle large numbers of requests, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to have dedicated resources.
 
 You need to be authenticated (passing a token or through your browser) to use the Inference API.
 
-| User Tier | Rate Limit |
-|---------------------|---------------------------|
-| Signed-up Users | 1,000 requests per day |
-| PRO and Enterprise Users | 20,000 requests per day |
\ No newline at end of file
+| User Tier                | Included monthly credits           |
+|--------------------------|------------------------------------|
+| Free Users               | subject to change, less than $0.10 |
+| PRO and Enterprise Users | $2.00                              |
\ No newline at end of file
diff --git a/docs/api-inference/supported-models.md b/docs/api-inference/supported-models.md
index 8edaa2215..6b720c4c5 100644
--- a/docs/api-inference/supported-models.md
+++ b/docs/api-inference/supported-models.md
@@ -10,7 +10,7 @@ You can find:
 
 ## What do I get with a PRO subscription?
 
-In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [rate limits](./rate-limits) and free access to the following models:
+In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [included credits](./rate-limits) and access to the following models:
 
@@ -27,4 +27,4 @@ This list is not exhaustive and might be updated in the future.
 
 ## Running Private Models
 
-The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.
+The Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.
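The billing rule that patch 1 documents (billed amount = compute time in seconds x hardware price per second) can be sketched as a quick sanity check. This is an illustrative calculation only; `request_cost` is a hypothetical helper, not part of any Hugging Face library:

```python
# Illustrative sketch of the billing rule described in the pricing docs:
# billed amount (USD) = compute time (seconds) x hardware price (USD/second).

def request_cost(compute_time_s: float, price_per_second: float) -> float:
    """Billed amount in USD for a single inference request."""
    return compute_time_s * price_per_second

# The docs' example: a 10-second request on a GPU costing $0.00012 per second.
cost = request_cost(10, 0.00012)
print(f"${cost:.4f}")  # $0.0012
```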
From 6ef710eb7aa96ed6cc9c21bb35a29509fac2ed79 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Tue, 4 Feb 2025 13:23:25 +0100
Subject: [PATCH 2/5] rename doc page

---
 docs/api-inference/_redirects.yml                 | 1 +
 docs/api-inference/_toctree.yml                   | 4 ++--
 docs/api-inference/{rate-limits.md => pricing.md} | 0
 docs/api-inference/supported-models.md            | 2 +-
 4 files changed, 4 insertions(+), 3 deletions(-)
 rename docs/api-inference/{rate-limits.md => pricing.md} (100%)

diff --git a/docs/api-inference/_redirects.yml b/docs/api-inference/_redirects.yml
index aab354ba5..74f738f5f 100644
--- a/docs/api-inference/_redirects.yml
+++ b/docs/api-inference/_redirects.yml
@@ -3,3 +3,4 @@ detailed_parameters: parameters
 parallelism: getting_started
 usage: getting_started
 faq: index
+rate-limits: pricing
diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml
index f19c04503..91a936748 100644
--- a/docs/api-inference/_toctree.yml
+++ b/docs/api-inference/_toctree.yml
@@ -5,8 +5,8 @@
     title: Getting Started
   - local: supported-models
     title: Supported Models
-  - local: rate-limits
-    title: Rate Limits
+  - local: pricing
+    title: Pricing and Rate limits
   - local: security
     title: Security
   title: Getting Started
diff --git a/docs/api-inference/rate-limits.md b/docs/api-inference/pricing.md
similarity index 100%
rename from docs/api-inference/rate-limits.md
rename to docs/api-inference/pricing.md
diff --git a/docs/api-inference/supported-models.md b/docs/api-inference/supported-models.md
index 6b720c4c5..e58a11778 100644
--- a/docs/api-inference/supported-models.md
+++ b/docs/api-inference/supported-models.md
@@ -10,7 +10,7 @@ You can find:
 
 ## What do I get with a PRO subscription?
 
-In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [included credits](./rate-limits) and access to the following models:
+In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [included credits](./pricing) and access to the following models:

From b3087901a5944e7b08cf0cdc255e8744f5f4fc71 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Tue, 4 Feb 2025 14:22:47 +0100
Subject: [PATCH 3/5] Update docs/api-inference/pricing.md

Co-authored-by: Lucain
---
 docs/api-inference/pricing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api-inference/pricing.md b/docs/api-inference/pricing.md
index c65ec3dd7..67d177af4 100644
--- a/docs/api-inference/pricing.md
+++ b/docs/api-inference/pricing.md
@@ -1,4 +1,4 @@
-# Rate Limits
+# Pricing and Rate limits
 
 As a HF user, you get monthly credits to run the HF Inference API. The amount of credits you get depends on your type of account (Free or PRO or Enterprise Hub), see table below.
 You get charged for every inference request, based on the compute time x price of the underlying hardware.

From 6a017f5310661108bfb3be0fbbd73a6cb99a4c05 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Tue, 4 Feb 2025 14:23:01 +0100
Subject: [PATCH 4/5] Update docs/api-inference/pricing.md

Co-authored-by: vb
---
 docs/api-inference/pricing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api-inference/pricing.md b/docs/api-inference/pricing.md
index 67d177af4..abaa98380 100644
--- a/docs/api-inference/pricing.md
+++ b/docs/api-inference/pricing.md
@@ -3,7 +3,7 @@ As a HF user, you get monthly credits to run the HF Inference API. The amount of credits you get depends on your type of account (Free or PRO or Enterprise Hub), see table below.
 You get charged for every inference request, based on the compute time x price of the underlying hardware.
 
-For instance, a request to [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
+For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
 
 When your monthly included credits are depleted:
 - if you're a Free user, you won't be able to query the Inference API anymore,

From 770f9ef1f4949c573308c419c6ac35055a8d2d2a Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Tue, 4 Feb 2025 14:23:28 +0100
Subject: [PATCH 5/5] Update docs/api-inference/index.md

Co-authored-by: vb
---
 docs/api-inference/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api-inference/index.md b/docs/api-inference/index.md
index a68c5324a..69926ce63 100644
--- a/docs/api-inference/index.md
+++ b/docs/api-inference/index.md
@@ -15,7 +15,7 @@ The Serverless Inference API offers a fast and simple way to explore thousands o
 * **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
 * **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.
 
-⚡ **Fast and Free to Get Started**: The Inference API is free to try out and offers additional included credits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
+⚡ **Fast and Free to Get Started**: The Inference API is free to try out and comes with additional included credits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
 
 ---
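To make the included-credits figures in this patch series concrete, here is a hedged back-of-the-envelope estimate of how many requests the $2.00 of PRO/Enterprise credits would cover at the docs' example cost of $0.0012 per request. It assumes every request costs the same, which real workloads won't, and `requests_within_credits` is a hypothetical helper for illustration only:

```python
import math

# Rough estimate: how many whole requests fit inside the included
# monthly credits, assuming a uniform per-request cost.

def requests_within_credits(monthly_credits: float, cost_per_request: float) -> int:
    """Number of whole requests covered by the included monthly credits."""
    return math.floor(monthly_credits / cost_per_request)

# PRO / Enterprise: $2.00 included, at the docs' example $0.0012 per request.
print(requests_within_credits(2.00, 0.0012))  # 1666
```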