`docs/inference-providers/index.md`

To learn more about the launch of Inference Providers, check out our announcement blog post.

## Partners

Our platform integrates with leading AI infrastructure providers, giving you access to their specialized capabilities through a single, consistent API. Here's what each partner supports:

| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video |
|---|---|---|---|---|---|
| [SambaNova](./providers/sambanova) | ✅ | | ✅ | | |
| [Together](./providers/together) | ✅ | ✅ | | ✅ | |

## Why Choose Inference Providers?

If you're building AI-powered applications, you've likely experienced the pain points of managing multiple provider APIs, comparing model performance, and dealing with varying reliability. Inference Providers solves these challenges by offering:

**Instant Access to Cutting-Edge Models**: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you'll find them here.

**Zero Vendor Lock-in**: Unlike being tied to a single provider's model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more, all through one consistent interface.

**Production-Ready Performance**: Built for enterprise workloads with automatic failover, intelligent routing, and the reliability your applications demand.

Here's what you can build:

- **Text Generation**: Use large language models with tool-calling capabilities for chatbots, content generation, and code assistance
- **Image and Video Generation**: Create custom images and videos, including support for LoRAs and style customization
- **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and recommendation engines
- **Traditional ML Tasks**: Ready-to-use models for classification, NER, summarization, and speech recognition

⚡ **Get Started for Free**: Inference Providers includes a generous free tier, with additional credits for [PRO users](https://hf.co/subscribe/pro) and [Enterprise Hub organizations](https://huggingface.co/enterprise).

## Key Features

- **👷 Easy to integrate**: Drop-in replacement for the OpenAI chat completions API.
- **💰 Cost-Effective**: No extra markup on provider rates.

## Getting Started

Inference Providers works with your existing development workflow. Whether you prefer Python, JavaScript, or direct HTTP calls, we provide native SDKs and OpenAI-compatible APIs to get you up and running quickly.
57
-
In this section, we will demonstrate a simple example using [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), a conversational Large Language Model.
65
+
We'll walk through a practical example using [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), a state-of-the-art open-weights conversational model.
58
66
59
67
### Inference Playground

Before diving into integration, explore models interactively with our [Inference Playground](https://huggingface.co/playground). Test different [chat completion models](http://huggingface.co/models?inference_provider=all&sort=trending&other=conversational) with your prompts and compare responses to find the perfect fit for your use case.

### Authentication

You'll need a Hugging Face token to authenticate your requests. Create one by visiting your [token settings](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained) and generating a `fine-grained` token with `Make calls to Inference Providers` permissions.

For complete token management details, see our [security tokens guide](https://huggingface.co/docs/hub/en/security-tokens).
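
The code examples below assume your token is exposed as an environment variable; the variable name `HF_TOKEN` is a common convention, not a requirement:

```bash
# Store the fine-grained token in an environment variable
# instead of hardcoding it in your source files.
export HF_TOKEN="hf_***"
```
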
### Quick Start - LLM
72
80
73
81
TODO : add blurb explaining what we're doing here (quick inference with LLM and chat completions)
74
82
75
83
#### Python

Here are three ways to integrate Inference Providers into your Python applications, from high-level convenience to low-level control:

<hfoptions id="python-clients">

<hfoption id="huggingface_hub">

For convenience, the `huggingface_hub` library provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that automatically handles provider selection and request routing.

Make sure to install it with `pip install huggingface_hub`.
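
A minimal sketch of a chat completion call; the model ID comes from the example above, and `HF_TOKEN` is assumed to be set in your environment:

```python
import os

from huggingface_hub import InferenceClient

# The client authenticates with your token and automatically
# selects a provider that serves the requested model.
client = InferenceClient(api_key=os.environ["HF_TOKEN"])

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(completion.choices[0].message.content)
```
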

**Drop-in OpenAI Replacement**: Already using OpenAI's Python client? Just change the base URL to `https://router.huggingface.co/v1` to instantly access hundreds of additional open-weights models through our provider network. Our system automatically routes your request to the optimal provider for the specified model.
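
For example, a sketch using the OpenAI Python client (`pip install openai`), again assuming `HF_TOKEN` is set:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the Inference Providers router.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(completion.choices[0].message.content)
```
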

#### JavaScript

Integrate Inference Providers into your JavaScript applications with these flexible approaches:

<hfoptions id="javascript-clients">

<hfoption id="huggingface.js">

Our JavaScript SDK provides a convenient interface with automatic provider selection and TypeScript support. Install it with `npm install @huggingface/inference`.
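
A minimal sketch, assuming Node 18+ (for top-level `await`) and `HF_TOKEN` set in your environment:

```javascript
import { InferenceClient } from "@huggingface/inference";

// The client authenticates with your token and automatically
// selects a provider that serves the requested model.
const client = new InferenceClient(process.env.HF_TOKEN);

const completion = await client.chatCompletion({
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(completion.choices[0].message.content);
```
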

**OpenAI JavaScript Client Compatible**: Migrate your existing OpenAI integration seamlessly by updating just the base URL to `https://router.huggingface.co/v1`.
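
For instance, a sketch with the official `openai` npm package (`npm install openai`), assuming `HF_TOKEN` is set:

```javascript
import OpenAI from "openai";

// Same client as before; only the base URL changes, so requests
// are routed through the Inference Providers network.
const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(completion.choices[0].message.content);
```
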

#### HTTP / cURL

For testing, debugging, or integrating with any HTTP client, here's the raw REST API format. Our intelligent routing automatically selects the optimal provider for your requested model.
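
A sketch of the raw request; the `/v1/chat/completions` path follows the OpenAI chat completions convention on the router base URL used above:

```bash
curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [
          {"role": "user", "content": "What is the capital of France?"}
        ]
      }'
```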