Commit d290556

Wauplin, julien-c, burtenshaw, Pierrci, and SBrandeis authored
Revamp Inference Providers doc (#1652)
* first draft
* Hub Integration page
* rename Inference Provider API to Inference Providers
* Update docs/api-inference/pricing.md
  Co-authored-by: Julien Chaumond <[email protected]>
* Update docs/api-inference/pricing.md
  Co-authored-by: Julien Chaumond <[email protected]>
* Update docs/api-inference/security.md
  Co-authored-by: Julien Chaumond <[email protected]>
* feedback
* move text generation to not popular
* Update docs/api-inference/pricing.md
  Co-authored-by: Julien Chaumond <[email protected]>
* Hub API page
* python example where possible
* remove TODO page
* Apply suggestions from code review
  Co-authored-by: burtenshaw <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Pierric Cistac <[email protected]>
* fix: screenshots display
* Update docs/api-inference/index.md
  Co-authored-by: Victor Muštar <[email protected]>
* add titles to toc
* Update docs/api-inference/index.md
  Co-authored-by: Victor Muštar <[email protected]>

---------

Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: burtenshaw <[email protected]>
Co-authored-by: Pierric Cistac <[email protected]>
Co-authored-by: SBrandeis <[email protected]>
Co-authored-by: Victor Muštar <[email protected]>
1 parent 2254001 commit d290556

File tree

12 files changed: +672 additions, -490 deletions


docs/api-inference/_redirects.yml

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,6 +1,6 @@
 quicktour: index
-detailed_parameters: parameters
-parallelism: getting_started
-usage: getting_started
+detailed_parameters: tasks/index
+parallelism: index
+usage: index
 faq: index
 rate-limits: pricing
```

docs/api-inference/_toctree.yml

Lines changed: 23 additions & 23 deletions
```diff
@@ -1,27 +1,33 @@
-- sections:
+- title: Get Started
+  sections:
   - local: index
-    title: Serverless Inference API
-  - local: getting-started
-    title: Getting Started
-  - local: supported-models
-    title: Supported Models
+    title: Inference Providers
   - local: pricing
-    title: Pricing and Rate limits
+    title: Pricing and Billing
+  - local: hub-integration
+    title: Hub integration
   - local: security
     title: Security
-  title: Getting Started
-- sections:
-  - local: parameters
-    title: Parameters
-  - sections:
-    - local: tasks/audio-classification
-      title: Audio Classification
-    - local: tasks/automatic-speech-recognition
-      title: Automatic Speech Recognition
+- title: API Reference
+  sections:
+  - local: tasks/index
+    title: Index
+  - local: hub-api
+    title: Hub API
+  - title: Popular Tasks
+    sections:
     - local: tasks/chat-completion
       title: Chat Completion
     - local: tasks/feature-extraction
       title: Feature Extraction
+    - local: tasks/text-to-image
+      title: Text to Image
+  - title: Other Tasks
+    sections:
+    - local: tasks/audio-classification
+      title: Audio Classification
+    - local: tasks/automatic-speech-recognition
+      title: Automatic Speech Recognition
     - local: tasks/fill-mask
       title: Fill Mask
     - local: tasks/image-classification
@@ -30,8 +36,6 @@
       title: Image Segmentation
     - local: tasks/image-to-image
       title: Image to Image
-    - local: tasks/image-text-to-text
-      title: Image-Text to Text
     - local: tasks/object-detection
       title: Object Detection
     - local: tasks/question-answering
@@ -44,13 +48,9 @@
       title: Text Classification
     - local: tasks/text-generation
       title: Text Generation
-    - local: tasks/text-to-image
-      title: Text to Image
     - local: tasks/token-classification
       title: Token Classification
     - local: tasks/translation
       title: Translation
     - local: tasks/zero-shot-classification
-      title: Zero Shot Classification
-    title: Detailed Task Parameters
-  title: API Reference
+      title: Zero Shot Classification
```

docs/api-inference/getting-started.md

Lines changed: 0 additions & 95 deletions
This file was deleted.

docs/api-inference/hub-api.md

Lines changed: 173 additions & 0 deletions
# Hub API

The Hub provides a few APIs to interact with Inference Providers:

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fireworks-ai" | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```

It can be combined with other filters, e.g. to select only `text-to-image` models:

```sh
# List text-to-image models served by Fal AI
# (quote the URL so the shell does not interpret `&`)
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list of providers to filter on several of them at once:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```

Finally, you can select all models served by at least one inference provider with `inference_provider=all`:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
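
The same listing can be done from Python. Here is a minimal sketch that queries the same endpoint with the `requests` library (any HTTP client works the same way):

```py
import requests

# Same query as the Fal AI example above:
# text-to-image models served by the fal-ai provider
params = {"inference_provider": "fal-ai", "pipeline_tag": "text-to-image"}
response = requests.get("https://huggingface.co/api/models", params=params)
response.raise_for_status()

# The endpoint returns a JSON list of model objects; print their ids
for model in response.json():
    print(model["id"])
```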

## Get model status

To find an inference provider for a specific model, request the `inference` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inference": "warm"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

The inference status is either "warm" or undefined:

<inferencesnippet>

<curl>

```sh
# Get inference status (no inference)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
    "_id": "67d3b141d8b6e20c6d009c8b",
    "id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> info.inference
None
```

</python>

</inferencesnippet>
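
If you only need a boolean check, you can wrap this in a small helper. A minimal sketch (the `is_warm` function name is ours, not part of `huggingface_hub`):

```py
from huggingface_hub import model_info

def is_warm(repo_id: str) -> bool:
    """Return True if at least one inference provider currently serves the model."""
    # `inference` expands to "warm" when the model is served, and stays unset otherwise
    return model_info(repo_id, expand="inference").inference == "warm"

print(is_warm("google/gemma-3-27b-it"))                 # True at the time of writing
print(is_warm("manycore-research/SpatialLM-Llama-1B"))  # False
```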

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, request the `inferenceProviderMapping` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inferenceProviderMapping": {
        "hf-inference": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it",
            "task": "conversational"
        },
        "nebius": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it-fast",
            "task": "conversational"
        }
    }
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
    'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
    'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>

Each provider serving the model reports a status (`staging` or `live`), the related task (here, `conversational`), and its `providerId`. In practice, this information is mostly relevant for the JS and Python clients.
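
For example, to keep only the providers with a `live` status for a given model, you could filter the mapping shown above. A minimal sketch building on the same `model_info` call:

```py
from huggingface_hub import model_info

info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")

# Keep only providers currently serving the model in production
# ("live" status, as opposed to "staging")
live_providers = [
    provider
    for provider, mapping in info.inference_provider_mapping.items()
    if mapping.status == "live"
]
print(live_providers)  # e.g. ['hf-inference', 'nebius']
```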
