Revamp Inference Providers doc #1652
Merged

Commits (20)
- 69b16f7 first draft (Wauplin)
- 4e6ccd4 Hub Integration page (Wauplin)
- b18eddd rename Inference Provider API to Inference Providers (Wauplin)
- f4d6689 Update docs/api-inference/pricing.md (Wauplin)
- 51f6468 Update docs/api-inference/pricing.md (Wauplin)
- 2b16f3e Update docs/api-inference/security.md (Wauplin)
- 74833f8 feedback (Wauplin)
- be30296 Merge branch 'refacto-for-inference-providers-doc' of github.com:hugg… (Wauplin)
- fa8efb6 move text generation to not popular (Wauplin)
- 353a7e4 Update docs/api-inference/pricing.md (Wauplin)
- bf55fb1 Hub API page (Wauplin)
- 668df05 Merge branch 'refacto-for-inference-providers-doc' of github.com:h… (Wauplin)
- 2e99c0d python example where possible (Wauplin)
- 4d7fd47 remove TODO page (Wauplin)
- 95f0171 Apply suggestions from code review (Wauplin)
- 61b3717 Apply suggestions from code review (Wauplin)
- 63034eb fix: screenshots display (SBrandeis)
- 6a2660e Update docs/api-inference/index.md (Wauplin)
- 751ac5a add titles to toc (Wauplin)
- 63a795d Update docs/api-inference/index.md (Wauplin)

```diff
@@ -1,6 +1,6 @@
 quicktour: index
-detailed_parameters: parameters
-parallelism: getting_started
-usage: getting_started
+detailed_parameters: tasks/index
+parallelism: index
+usage: index
 faq: index
 rate-limits: pricing
```

This file was deleted.

# Hub API

The Hub provides several APIs for working with Inference Providers. Here is a list of them.

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fireworks-ai" | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
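
The same listing is available from Python. As a minimal sketch, assuming a recent `huggingface_hub` release in which `list_models` accepts an `inference_provider` filter:

```py
>>> from huggingface_hub import list_models

>>> # Iterate over models served by Fireworks AI
>>> # (assumes this version of `list_models` supports `inference_provider`)
>>> for model in list_models(inference_provider="fireworks-ai"):
...     print(model.id)
deepseek-ai/DeepSeek-V3-0324
deepseek-ai/DeepSeek-R1
...
```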

It can be combined with other filters, e.g. to select only text-to-image models:

```sh
# List text-to-image models served by Fal AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list to select models from multiple providers:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```
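
This endpoint can also be called from Python without any Hugging Face client, for example with `requests` (a sketch; only the public Hub API URL shown above is assumed):

```py
import requests

# Query the Hub API for image-text-to-text models served by Novita or Sambanova
params = {"inference_provider": "sambanova,novita", "pipeline_tag": "image-text-to-text"}
response = requests.get("https://huggingface.co/api/models", params=params)
response.raise_for_status()

for model in response.json():
    print(model["id"])
```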

Finally, you can select all models served by at least one inference provider:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
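
In `huggingface_hub`, the same query would look like this (again a sketch, assuming `list_models` supports both the `inference_provider` and `pipeline_tag` filters):

```py
>>> from huggingface_hub import list_models

>>> # Text-to-video models served by at least one provider
>>> for model in list_models(inference_provider="all", pipeline_tag="text-to-video"):
...     print(model.id)
Wan-AI/Wan2.1-T2V-14B
Lightricks/LTX-Video
...
```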

## Get model status

If you are interested in a specific model and want to check whether at least one provider serves it, you can request the `inference` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inference": "warm"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

Inference status is either "warm" or undefined:

<inferencesnippet>

<curl>

```sh
# Get inference status (not warm)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
    "_id": "67d3b141d8b6e20c6d009c8b",
    "id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> print(info.inference)
None
```

</python>

</inferencesnippet>

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, you can request the `inferenceProviderMapping` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inferenceProviderMapping": {
        "hf-inference": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it",
            "task": "conversational"
        },
        "nebius": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it-fast",
            "task": "conversational"
        }
    }
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
    'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
    'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>

For each provider, you get the status (`staging` or `live`), the related task (here, `conversational`), and the `providerId`. In practice, this information is mostly relevant for the JS and Python clients; the key takeaway is that the listed providers are the ones currently serving the model.
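
For example, continuing from the `info` object in the previous snippet, a client can read the provider-specific model ID directly from the mapping:

```py
>>> # ID under which Nebius serves this model (from the mapping shown above)
>>> info.inference_provider_mapping["nebius"].provider_id
'google/gemma-3-27b-it-fast'
```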