
Commit 4697036

Wauplin and julien-c authored
Some tweaks in wording between inference providers / hf inference API (#1689)
* Some tweaks in wording between inference providers / hf inference API
* add link
* Update docs/hub/models-inference.md

Co-authored-by: Julien Chaumond <[email protected]>
1 parent 01baa08 commit 4697036

5 files changed (+19, -26 lines)


docs/hub/academia-hub.md

Lines changed: 0 additions & 1 deletion
@@ -25,7 +25,6 @@ Key Features of Academia Hub:
 - **Spaces Hosting:** Create ZeroGPU Spaces with A100 hardware.
 - **Spaces Dev Mode:** Fast iterations via SSH/VS Code for Spaces.
 - **Inference Providers:** Get monthly included credits across all Inference Providers.
-- **HF Inference API:** Get x20 higher rate limits on HF Serverless API.
 - **Dataset Viewer:** Activate it on private datasets.
 - **Blog Articles:** Publish articles to the Hugging Face blog.
 - **Social Posts:** Share short updates with the community.

docs/hub/index.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ The Hub offers **versioning, commit history, diffs, branches, and over a dozen l
 
 ## Models
 
-You can discover and use dozens of thousands of open-source ML models shared by the community. To promote responsible model usage and development, model repos are equipped with [Model Cards](./model-cards) to inform users of each model's limitations and biases. Additional [metadata](./model-cards#model-card-metadata) about info such as their tasks, languages, and evaluation results can be included, with training metrics charts even added if the repository contains [TensorBoard traces](./tensorboard). It's also easy to add an [**inference widget**](./models-widgets) to your model, allowing anyone to play with the model directly in the browser! For programmatic access, a serverless API is provided to [**instantly serve your model**](./models-inference).
+You can discover and use dozens of thousands of open-source ML models shared by the community. To promote responsible model usage and development, model repos are equipped with [Model Cards](./model-cards) to inform users of each model's limitations and biases. Additional [metadata](./model-cards#model-card-metadata) about info such as their tasks, languages, and evaluation results can be included, with training metrics charts even added if the repository contains [TensorBoard traces](./tensorboard). It's also easy to add an [**inference widget**](./models-widgets) to your model, allowing anyone to play with the model directly in the browser! For programmatic access, a serverless API is provided by [**Inference Providers**](./models-inference).
 
 To upload models to the Hub, or download models and integrate them into your work, explore the [**Models documentation**](./models). You can also choose from [**over a dozen libraries**](./models-libraries) such as 🤗 Transformers, Asteroid, and ESPnet that support the Hub.

docs/hub/models-inference.md

Lines changed: 14 additions & 20 deletions
@@ -2,35 +2,29 @@
 
 Please refer to the [Inference Providers Documentation](https://huggingface.co/docs/inference-providers) for detailed information.
 
+## What is HF-Inference API?
 
-## What technology do you use to power the HF-Inference API?
-
-For 🤗 Transformers models, [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) power the API.
+HF-Inference API is one of the many providers available on the Hugging Face Hub.
+It is deployed by Hugging Face ourselves, using text-generation-inference for LLMs for instance. This service used to be called “Inference API (serverless)” prior to Inference Providers.
 
-On top of `Pipelines` and depending on the model type, there are several production optimizations like:
-- compiling models to optimized intermediary representations (e.g. [ONNX](https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333)),
-- maintaining a Least Recently Used cache, ensuring that the most popular models are always loaded,
-- scaling the underlying compute infrastructure on the fly depending on the load constraints.
+For more details about the HF-Inference API, check out its [dedicated page](https://huggingface.co/docs/inference-providers/providers/hf-inference).
 
-For models from [other libraries](./models-libraries), the API uses [Starlette](https://www.starlette.io) and runs in [Docker containers](https://github.com/huggingface/api-inference-community/tree/main/docker_images). Each library defines the implementation of [different pipelines](https://github.com/huggingface/api-inference-community/tree/main/docker_images/sentence_transformers/app/pipelines).
-
-## How can I turn off the HF-Inference API for my model?
+## What technology do you use to power the HF-Inference API?
 
-Specify `inference: false` in your model card's metadata.
+The HF-Inference API is powered by [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) under the hood.
 
 ## Why don't I see an inference widget, or why can't I use the API?
 
-For some tasks, there might not be support in the HF-Inference API, and, hence, there is no widget.
-For all libraries (except 🤗 Transformers), there is a [library-to-tasks.ts file](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/library-to-tasks.ts) of supported tasks in the API. When a model repository has a task that is not supported by the repository library, the repository has `inference: false` by default.
-
-## Can I send large volumes of requests? Can I get accelerated APIs?
-
-If you are interested in accelerated inference, higher volumes of requests, or an SLA, please contact us at `api-enterprise at huggingface.co`.
+For some tasks, there might not be support by any Inference Provider, and hence, there is no widget.
 
 ## How can I see my usage?
 
-You can check your usage in the [Inference Dashboard](https://ui.endpoints.huggingface.co/endpoints). The dashboard shows both your serverless and dedicated endpoints usage.
+To check usage across all providers, check out your [billing page](https://huggingface.co/settings/billing).
+
+To check your HF-Inference usage specifically, check out the [Inference Dashboard](https://ui.endpoints.huggingface.co/endpoints). The dashboard shows both your serverless and dedicated endpoints usage.
 
-## Is there programmatic access to the HF-Inference API?
+## Is there programmatic access to Inference Providers?
 
-Yes, the `huggingface_hub` library has a client wrapper documented [here](https://huggingface.co/docs/huggingface_hub/guides/inference).
+Yes! We provide client wrappers in both JS and Python:
+- [JS (`@huggingface/inference`)](https://huggingface.co/docs/huggingface.js/inference/classes/InferenceClient)
+- [Python (`huggingface_hub`)](https://huggingface.co/docs/huggingface_hub/guides/inference)
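
For reference, a minimal sketch of what the Python wrapper mentioned in the added lines can look like in practice. The model name and prompt are illustrative, not part of this commit; it assumes you are authenticated via `huggingface-cli login` or an `HF_TOKEN` environment variable:

```python
# Minimal sketch using huggingface_hub's InferenceClient (the Python wrapper
# linked above). Model name and prompt are illustrative examples.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up your Hugging Face token automatically

response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```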

docs/hub/spaces-sdks-docker-langfuse.md

Lines changed: 3 additions & 3 deletions
@@ -77,9 +77,9 @@ Langfuse is model agnostic and can be used to trace any application. Follow the
 
 Langfuse maintains native integrations with many popular LLM frameworks, including [Langchain](https://langfuse.com/docs/integrations/langchain/tracing), [LlamaIndex](https://langfuse.com/docs/integrations/llama-index/get-started) and [OpenAI](https://langfuse.com/docs/integrations/openai/python/get-started) and offers Python and JS/TS SDKs to instrument your code. Langfuse also offers various API endpoints to ingest data and has been integrated by other open source projects such as [Langflow](https://langfuse.com/docs/integrations/langflow), [Dify](https://langfuse.com/docs/integrations/dify) and [Haystack](https://langfuse.com/docs/integrations/haystack/get-started).
 
-### Example 1: Trace Calls to HF Serverless API
+### Example 1: Trace Calls to Inference Providers
 
-As a simple example, here's how to trace LLM calls to the [HF Serverless API](https://huggingface.co/docs/inference-providers/en/index) using the Langfuse Python SDK.
+As a simple example, here's how to trace LLM calls to [Inference Providers](https://huggingface.co/docs/inference-providers/en/index) using the Langfuse Python SDK.
 
 Be sure to first configure your `LANGFUSE_HOST`, `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables, and make sure you've [authenticated with your Hugging Face account](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication).

@@ -88,7 +88,7 @@ from langfuse.openai import openai
 from huggingface_hub import get_token
 
 client = openai.OpenAI(
-    base_url="https://api-inference.huggingface.co/v1/",
+    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1",
     api_key=get_token(),
 )
 
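For context, a hedged end-to-end version of the updated snippet: the `base_url` mirrors the new line above, while the `chat.completions.create` call and prompt are illustrative additions, not part of the commit. It assumes the `LANGFUSE_*` environment variables and Hugging Face authentication described in the doc are already configured:

```python
# Sketch of the updated example end-to-end, under the assumptions stated above.
from langfuse.openai import openai  # Langfuse's drop-in OpenAI client wrapper
from huggingface_hub import get_token

client = openai.OpenAI(
    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1",
    api_key=get_token(),  # your Hugging Face access token
)

# Calls through the wrapped client are traced to Langfuse automatically.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(completion.choices[0].message.content)
```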

docs/inference-providers/index.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/Inference-providers-banner-dark.png"/>
 </div>
 
-Hugging Face Inference Providers simplify and unify how developers access and run machine learning models by offering a unified, flexible interface to multiple serverless inference providers. This new approach extends our previous Serverless Inference API, providing more models, increased performances and better reliability thanks to our inference partners.
+Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers.
 
 To learn more about the launch of Inference Providers, check out our [announcement blog post](https://huggingface.co/blog/inference-providers).
 
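To make the "unified interface" idea in the rewritten paragraph concrete, a small sketch assuming the `provider` argument of `huggingface_hub`'s `InferenceClient`; the provider and model names are examples, not taken from this commit:

```python
# Illustrative sketch: the same call shape can target different serverless
# providers. Provider and model names here are assumed examples.
from huggingface_hub import InferenceClient

for provider in ("hf-inference", "together"):
    client = InferenceClient(provider=provider)
    out = client.chat_completion(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Why do unified APIs help developers?"}],
        max_tokens=60,
    )
    print(f"{provider}: {out.choices[0].message.content}")
```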
