Revamp Inference Providers doc #1652
Merged

Commits (20)
- 69b16f7 first draft (Wauplin)
- 4e6ccd4 Hub Integration page (Wauplin)
- b18eddd rename Inference Provider API to Inference Providers (Wauplin)
- f4d6689 Update docs/api-inference/pricing.md (Wauplin)
- 51f6468 Update docs/api-inference/pricing.md (Wauplin)
- 2b16f3e Update docs/api-inference/security.md (Wauplin)
- 74833f8 feedback (Wauplin)
- be30296 Merge branch 'refacto-for-inference-providers-doc' of github.com:hugg… (Wauplin)
- fa8efb6 move text generation to not popular (Wauplin)
- 353a7e4 Update docs/api-inference/pricing.md (Wauplin)
- bf55fb1 Hub API page (Wauplin)
- 668df05 Merge branch 'refacto-for-inference-providers-doc' of github.com:h… (Wauplin)
- 2e99c0d python example where possible (Wauplin)
- 4d7fd47 remove TODO page (Wauplin)
- 95f0171 Apply suggestions from code review (Wauplin)
- 61b3717 Apply suggestions from code review (Wauplin)
- 63034eb fix: screenshots display (SBrandeis)
- 6a2660e Update docs/api-inference/index.md (Wauplin)
- 751ac5a add titles to toc (Wauplin)
- 63a795d Update docs/api-inference/index.md (Wauplin)

```diff
@@ -1,6 +1,6 @@
 quicktour: index
-detailed_parameters: parameters
-parallelism: getting_started
-usage: getting_started
+detailed_parameters: tasks/index
+parallelism: index
+usage: index
 faq: index
 rate-limits: pricing
```

This file was deleted.

# Hub API

The Hub provides several APIs for working with Inference Providers. Here is a list of them.

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fireworks-ai" | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
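
The same listing is available from Python. As a minimal sketch, assuming a recent `huggingface_hub` release in which `list_models` accepts an `inference_provider` filter:

```py
>>> from huggingface_hub import list_models

>>> # Iterate over models served by Fireworks AI
>>> # (assumes this version of `list_models` supports `inference_provider`)
>>> for model in list_models(inference_provider="fireworks-ai"):
...     print(model.id)
deepseek-ai/DeepSeek-V3-0324
deepseek-ai/DeepSeek-R1
...
```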

It can be combined with other filters, e.g. to select only text-to-image models:

```sh
# List text-to-image models served by Fal AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list to select models from multiple providers:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```
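
This endpoint can also be called from Python without any Hugging Face client, for example with `requests` (a sketch; only the public Hub API URL shown above is assumed):

```py
import requests

# Query the Hub API for image-text-to-text models served by Novita or Sambanova
params = {"inference_provider": "sambanova,novita", "pipeline_tag": "image-text-to-text"}
response = requests.get("https://huggingface.co/api/models", params=params)
response.raise_for_status()

for model in response.json():
    print(model["id"])
```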

Finally, you can select all models served by at least one inference provider:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
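
In `huggingface_hub`, the same query would look like this (again a sketch, assuming `list_models` supports both the `inference_provider` and `pipeline_tag` filters):

```py
>>> from huggingface_hub import list_models

>>> # Text-to-video models served by at least one provider
>>> for model in list_models(inference_provider="all", pipeline_tag="text-to-video"):
...     print(model.id)
Wan-AI/Wan2.1-T2V-14B
Lightricks/LTX-Video
...
```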

## Get model status

If you are interested in a specific model and want to check whether at least one provider serves it, you can request the `inference` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inference": "warm"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

Inference status is either "warm" or undefined:

<inferencesnippet>

<curl>

```sh
# Get inference status (not warm)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
    "_id": "67d3b141d8b6e20c6d009c8b",
    "id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> print(info.inference)
None
```

</python>

</inferencesnippet>

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, you can request the `inferenceProviderMapping` attribute in the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
    "_id": "67c35b9bb236f0d365bf29d3",
    "id": "google/gemma-3-27b-it",
    "inferenceProviderMapping": {
        "hf-inference": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it",
            "task": "conversational"
        },
        "nebius": {
            "status": "live",
            "providerId": "google/gemma-3-27b-it-fast",
            "task": "conversational"
        }
    }
}
```

</curl>

<python>

In `huggingface_hub`, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
    'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
    'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>

For each provider, you get the status (`staging` or `live`), the related task (here, `conversational`), and the `providerId`. In practice, this information is mostly relevant for the JS and Python clients; the key takeaway is that the listed providers are the ones currently serving the model.
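
For example, continuing from the `info` object in the previous snippet, a client can read the provider-specific model ID directly from the mapping:

```py
>>> # ID under which Nebius serves this model (from the mapping shown above)
>>> info.inference_provider_mapping["nebius"].provider_id
'google/gemma-3-27b-it-fast'
```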