
Commit a1a1f6f (1 parent: c8c370b)

    mention the current focus of hf-inference

2 files changed (+3, -0 lines)

docs/inference-providers/pricing.md (1 addition, 0 deletions)

@@ -79,6 +79,7 @@ As you may have noticed, you can select to work with `"hf-inference"` provider.
 
 For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
 
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
 
 ## Billing for Team and Enterprise organizations
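The pricing rule quoted in this hunk is a simple product of request duration and per-second hardware cost. A minimal sketch of that arithmetic, assuming a hypothetical helper name (`estimate_cost` is not part of any Hugging Face API):

```python
def estimate_cost(duration_seconds: float, price_per_second: float) -> float:
    """Billed amount for one request: duration x per-second hardware price.

    Illustrative only; mirrors the example in pricing.md, not a real API.
    """
    return duration_seconds * price_per_second


# Example from the doc: 10 s on a GPU priced at $0.00012/s is billed about $0.0012.
cost = estimate_cost(10, 0.00012)
print(f"${cost:.4f}")
```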

scripts/inference-providers/templates/providers/hf-inference.handlebars (2 additions, 0 deletions)

@@ -13,4 +13,6 @@ All supported HF Inference models can be found [here](https://huggingface.co/mod
 HF Inference is the serverless Inference API powered by Hugging Face. This service used to be called "Inference API (serverless)" prior to Inference Providers.
 If you are interested in deploying models to a dedicated and autoscaling infrastructure managed by Hugging Face, check out [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) instead.
 
+As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs that have historical importance like BERT or GPT-2).
+
 {{{tasksSection}}}
