6 changes: 3 additions & 3 deletions docs/api-inference/_redirects.yml

```diff
@@ -1,6 +1,6 @@
 quicktour: index
-detailed_parameters: parameters
-parallelism: getting_started
-usage: getting_started
+detailed_parameters: tasks/index
+parallelism: index
+usage: index
 faq: index
 rate-limits: pricing
```
46 changes: 23 additions & 23 deletions docs/api-inference/_toctree.yml
```diff
@@ -1,27 +1,33 @@
-- sections:
+- title: Get Started
+  sections:
   - local: index
-    title: Serverless Inference API
-  - local: getting-started
-    title: Getting Started
-  - local: supported-models
-    title: Supported Models
+    title: Inference Providers
   - local: pricing
-    title: Pricing and Rate limits
+    title: Pricing and Billing
   - local: hub-integration
     title: Hub integration
   - local: security
     title: Security
-  title: Getting Started
-- sections:
-  - local: parameters
-    title: Parameters
-  - sections:
-    - local: tasks/audio-classification
-      title: Audio Classification
-    - local: tasks/automatic-speech-recognition
-      title: Automatic Speech Recognition
+- title: API Reference
+  sections:
+  - local: tasks/index
+    title: Index
+  - local: hub-api
+    title: Hub API
+- title: Popular Tasks
+  sections:
+  - local: tasks/chat-completion
+    title: Chat Completion
+  - local: tasks/feature-extraction
+    title: Feature Extraction
+  - local: tasks/text-to-image
+    title: Text to Image
+- title: Other Tasks
+  sections:
+  - local: tasks/audio-classification
+    title: Audio Classification
+  - local: tasks/automatic-speech-recognition
+    title: Automatic Speech Recognition
   - local: tasks/fill-mask
     title: Fill Mask
   - local: tasks/image-classification
@@ -30,8 +36,6 @@
     title: Image Segmentation
   - local: tasks/image-to-image
     title: Image to Image
-  - local: tasks/image-text-to-text
-    title: Image-Text to Text
   - local: tasks/object-detection
     title: Object Detection
   - local: tasks/question-answering
@@ -44,13 +48,9 @@
     title: Text Classification
   - local: tasks/text-generation
     title: Text Generation
-  - local: tasks/text-to-image
-    title: Text to Image
   - local: tasks/token-classification
     title: Token Classification
   - local: tasks/translation
     title: Translation
-  - local: tasks/zero-shot-classification
-    title: Zero Shot Classification
-    title: Detailed Task Parameters
-  title: API Reference
+  - local: tasks/zero-shot-classification
+    title: Zero Shot Classification
```
95 changes: 0 additions & 95 deletions docs/api-inference/getting-started.md

This file was deleted.

173 changes: 173 additions & 0 deletions docs/api-inference/hub-api.md
@@ -0,0 +1,173 @@
# Hub API

The Hub exposes several APIs for working with Inference Providers. Here is a list of them.

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fireworks-ai" | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
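
If you prefer Python, recent versions of `huggingface_hub` expose the same filter on `list_models`. This is a sketch assuming a client version where the `inference_provider` argument is available:

```py
from huggingface_hub import list_models

# Iterate over all models served by Fireworks AI
# (equivalent to the curl call above).
for model in list_models(inference_provider="fireworks-ai"):
    print(model.id)
```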

The filter can be combined with others, e.g. to select only text-to-image models:

```sh
# List text-to-image models served by Fal AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list to select from multiple providers:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```

Finally, you can select all models served by at least one inference provider:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
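
The listing endpoint can also be called directly over HTTP from Python. A minimal sketch using `requests`, combining the provider and task filters shown above:

```py
import requests

# Same query as the curl examples: filter by provider and pipeline tag.
response = requests.get(
    "https://huggingface.co/api/models",
    params={"inference_provider": "fal-ai", "pipeline_tag": "text-to-image"},
)
response.raise_for_status()

for model in response.json():
    print(model["id"])
```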

## Get model status

If you are interested in a specific model and want to check whether at least one provider serves it, you can request the `inference` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inference": "warm"
}
```
</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

Inference status is either "warm" or undefined:

<inferencesnippet>

<curl>

```sh
# Get inference status (not warm)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
"_id": "67d3b141d8b6e20c6d009c8b",
"id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> print(info.inference)
None
```

</python>

</inferencesnippet>
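
Putting the two cases together, a small helper can check which repos are currently served. This is only a sketch built on the `expand="inference"` behavior shown above, not an official utility:

```py
from huggingface_hub import model_info

def is_warm(repo_id: str) -> bool:
    """Return True if at least one provider currently serves the model."""
    # `inference` is "warm" when the model is served, and undefined otherwise.
    return model_info(repo_id, expand="inference").inference == "warm"

for repo_id in ["google/gemma-3-27b-it", "manycore-research/SpatialLM-Llama-1B"]:
    print(repo_id, "->", "warm" if is_warm(repo_id) else "not served")
```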

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, you can request the `inferenceProviderMapping` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inferenceProviderMapping": {
"hf-inference": {
"status": "live",
"providerId": "google/gemma-3-27b-it",
"task": "conversational"
},
"nebius": {
"status": "live",
"providerId": "google/gemma-3-27b-it-fast",
"task": "conversational"
}
}
}
```
</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>


For each provider, you get the status (`staging` or `live`), the related task (here, `conversational`), and the `providerId`. In practice, this information is mostly relevant to the JS and Python clients. The key takeaway is that the listed providers are the ones currently serving the model.
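
To illustrate how a client might consume this mapping, here is a hypothetical helper that picks the first `live` provider for a model. It relies only on the `status` and `provider_id` fields shown above; the exact return type of `inference_provider_mapping` may vary across `huggingface_hub` versions:

```py
from typing import Optional, Tuple

from huggingface_hub import model_info

def first_live_provider(repo_id: str) -> Optional[Tuple[str, str]]:
    """Return (provider_name, provider_id) for the first live provider, if any."""
    info = model_info(repo_id, expand="inferenceProviderMapping")
    mapping = info.inference_provider_mapping or {}
    for provider, details in mapping.items():
        if details.status == "live":
            return provider, details.provider_id
    return None

# e.g. ('hf-inference', 'google/gemma-3-27b-it')
print(first_live_provider("google/gemma-3-27b-it"))
```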