[Docs] Remove Inference API references in docs (huggingface#3197)

hanouticelina · mintyleaf · commit cc6522c8b473 · 2025-07-11T23:25:46.000+04:00
* remove inference api references

* better

* better wording
diff --git a/README.md b/README.md
@@ -146,7 +146,6 @@ The advantages are:
 
 - Free model or dataset hosting for libraries and their users.
 - Built-in file versioning, even with very large files, thanks to a git-based approach.
-- Serverless inference API for all models publicly available.
 - In-browser widgets to play with the uploaded models.
 - Anyone can upload a new model for your library, they just need to add the corresponding tag for the model to be discoverable.
 - Fast downloads! We use Cloudfront (a CDN) to geo-replicate downloads so they're blazing fast from anywhere on the globe.
diff --git a/docs/source/de/guides/integrations.md b/docs/source/de/guides/integrations.md
@@ -11,8 +11,7 @@ Es gibt vier Hauptwege, eine Bibliothek mit dem Hub zu integrieren:
    Dies beinhaltet das Modellgewicht sowie [die Modellkarte](https://huggingface.co/docs/huggingface_hub/how-to-model-cards) und alle anderen relevanten Informationen oder Daten, die für den Betrieb des Modells erforderlich sind (zum Beispiel Trainingsprotokolle). Diese Methode wird oft `push_to_hub()` genannt.
 2. **Download from Hub**: Implementieren Sie eine Methode, um ein Modell vom Hub zu laden.
    Die Methode sollte die Modellkonfiguration/-gewichte herunterladen und das Modell laden. Diese Methode wird oft `from_pretrained` oder `load_from_hub()` genannt.
-3. **Inference API**: Nutzen Sie unsere Server, um Inferenz auf von Ihrer Bibliothek unterstützten Modellen kostenlos auszuführen.
-4. **Widgets**: Zeigen Sie ein Widget auf der Landing Page Ihrer Modelle auf dem Hub an.
+3. **Widgets**: Zeigen Sie ein Widget auf der Landing Page Ihrer Modelle auf dem Hub an.
    Dies ermöglicht es Benutzern, ein Modell schnell aus dem Browser heraus auszuprobieren.
 
 In diesem Leitfaden konzentrieren wir uns auf die ersten beiden Themen. Wir werden die beiden Hauptansätze vorstellen, die Sie zur Integration einer Bibliothek verwenden können, mit ihren Vor- und Nachteilen. Am Ende des Leitfadens ist alles zusammengefasst, um Ihnen bei der Auswahl zwischen den beiden zu helfen. Bitte beachten Sie, dass dies nur Richtlinien sind, die Sie an Ihre Anforderungen anpassen können.
diff --git a/docs/source/en/guides/integrations.md b/docs/source/en/guides/integrations.md
@@ -15,8 +15,7 @@ There are four main ways to integrate a library with the Hub:
    or data necessary to run the model (for example, training logs). This method is often called `push_to_hub()`.
 2. **Download from Hub:** implement a method to load a model from the Hub. The method should download the model
    configuration/weights and load the model. This method is often called `from_pretrained` or `load_from_hub()`.
-3. **Inference API:** use our servers to run inference on models supported by your library for free.
-4. **Widgets:** display a widget on the landing page of your models on the Hub. It allows users to quickly try a model
+3. **Widgets:** display a widget on the landing page of your models on the Hub. It allows users to quickly try a model
    from the browser.
 
 In this guide, we will focus on the first two topics. We will present the two main approaches you can use to integrate
diff --git a/docs/source/en/guides/overview.md b/docs/source/en/guides/overview.md
@@ -60,7 +60,7 @@ Take a look at these guides to learn how to use huggingface_hub to solve real-wo
       <div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
         Inference
       </div><p class="text-gray-700">
-        How to make predictions using the HF Inference API and other Inference Providers?
+        How to make predictions using Hugging Face Inference Providers?
       </p>
     </a>
 
diff --git a/docs/source/en/index.md b/docs/source/en/index.md
@@ -14,8 +14,7 @@ do all these things with Python.
 Read the [quick start guide](quick-start) to get up and running with the
 `huggingface_hub` library. You will learn how to download files from the Hub, create a
 repository, and upload files to the Hub. Keep reading to learn more about how to manage
-your repositories on the 🤗 Hub, how to interact in discussions or even how to access
-the Inference API.
+your repositories on the 🤗 Hub, how to interact in discussions or even how to run inference.
 
 <div class="mt-10">
   <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
diff --git a/docs/source/en/package_reference/inference_client.md b/docs/source/en/package_reference/inference_client.md
@@ -4,11 +4,12 @@ rendered properly in your Markdown viewer.
 
 # Inference
 
-Inference is the process of using a trained model to make predictions on new data. Because this process can be compute-intensive, running on a dedicated or external service can be an interesting option.  
-The `huggingface_hub`  library provides a unified interface to run inference across multiple services for models hosted on the Hugging Face Hub:
-1.  [Inference API](https://huggingface.co/docs/api-inference/index): a serverless solution that allows you to run accelerated inference on Hugging Face's infrastructure for free. This service is a fast way to get started, test different models, and prototype AI products.
-2. Third-party providers: various serverless solution provided by external providers (Together, Sambanova, etc.). These providers offer production-ready APIs on a pay-a-you-go model. This is the fastest way to integrate AI in your products with a maintenance-free and scalable solution. Refer to the [Supported providers and tasks](../guides/inference#supported-providers-and-tasks) section for a list of supported providers.      
-3. [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index): a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.
+Inference is the process of using a trained model to make predictions on new data. Because this process can be compute-intensive, running on a dedicated or external service can be an interesting option.
+The `huggingface_hub` library provides a unified interface to run inference across multiple services for models hosted on the Hugging Face Hub:
+
+1.  [Inference Providers](https://huggingface.co/docs/inference-providers/index): streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers. Refer to the [documentation](https://huggingface.co/docs/inference-providers/index#partners) for a list of supported providers.
+2.  [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index): a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.
+3.  Local endpoints: you can also run inference with local inference servers like [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/), [vLLM](https://github.com/vllm-project/vllm), [LiteLLM](https://docs.litellm.ai/docs/simple_proxy), or [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) by connecting the client to these local endpoints.
 
 These services can be called with the [`InferenceClient`] object. Please refer to [this guide](../guides/inference)
 for more information on how to use it.
diff --git a/docs/source/en/quick-start.md b/docs/source/en/quick-start.md
@@ -197,4 +197,4 @@ Hub, we recommend reading our [how-to guides](./guides/overview) to:
 - [Download](./guides/download) files from the Hub.
 - [Upload](./guides/upload) files to the Hub.
 - [Search the Hub](./guides/search) for your desired model or dataset.
-- [Access the Inference API](./guides/inference) for fast inference.
+- [Run Inference](./guides/inference) across multiple services for models hosted on the Hugging Face Hub.
diff --git a/docs/source/fr/guides/integrations.md b/docs/source/fr/guides/integrations.md
@@ -10,12 +10,11 @@ Des [dizaines de librairies](https://huggingface.co/docs/hub/models-libraries) s
 Il existe quatre façons principales d'intégrer une bibliothèque au Hub :
 1. **Push to Hub**  implémente une méthode pour upload un modèle sur le Hub. Cela inclut les paramètres du modèle, sa fiche descriptive (appelée [Model Card](https://huggingface.co/docs/huggingface_hub/how-to-model-cards)) et toute autre information pertinente liée au modèle (par exemple, les logs d'entraînement). Cette méthode est souvent appelée `push_to_hub()`.
 2. **Download from Hub** implémente une méthode pour charger un modèle depuis le Hub. La méthode doit télécharger la configuration et les poids du modèle puis instancier celui-ci. Cette méthode est souvent appelée `from_pretrained` ou `load_from_hub()`.
-3. **Inference API** utilise nos serveurs pour faire de l'inférence gratuitement sur des modèles supportés par votre librairie.
-4. **Widgets** affiche un widget sur la page d'accueil de votre modèle dans le Hub. Les widgets permettent aux utilisateurs de rapidement tester un modèle depuis le navigateur.
+3. **Widgets** affiche un widget sur la page d'accueil de votre modèle dans le Hub. Les widgets permettent aux utilisateurs de rapidement tester un modèle depuis le navigateur.
 
 Dans ce guide, nous nous concentrerons sur les deux premiers sujets. Nous présenterons les deux approches principales que vous pouvez utiliser pour intégrer une librairie, avec leurs avantages et leurs inconvénients. Tout est résumé à la fin du guide pour vous aider à choisir entre les deux. Veuillez garder à l'esprit que ce ne sont que des conseils, et vous êtes libres de les adapter à votre cas d'usage.
 
-Si l'Inference API et les Widgets vous intéressent, vous pouvez suivre [ce guide](https://huggingface.co/docs/hub/models-adding-libraries#set-up-the-inference-api). Dans les deux cas, vous pouvez nous contacter si vous intégrez une librairie au Hub et que vous voulez être listé [dans la documentation officielle](https://huggingface.co/docs/hub/models-libraries).
+Si les Widgets vous intéressent, vous pouvez suivre [ce guide](https://huggingface.co/docs/hub/models-adding-libraries#set-up-the-inference-api). Dans les deux cas, vous pouvez nous contacter si vous intégrez une librairie au Hub et que vous voulez être listé [dans la documentation officielle](https://huggingface.co/docs/hub/models-libraries).
 
 ## Une approche flexible: les helpers