2 changes: 1 addition & 1 deletion docs/hub/_toctree.yml
@@ -127,7 +127,7 @@
 - local: models-widgets-examples
   title: Widget Examples
 - local: models-inference
-  title: Inference API docs
+  title: Model Inference
 - local: models-download-stats
   title: Models Download Stats
 - local: models-faq
144 changes: 128 additions & 16 deletions docs/hub/models-inference.md
@@ -1,30 +1,142 @@
# Inference Providers

Hugging Face's model pages offer pay-as-you-go inference for thousands of models, so you can try them all out right in the browser. The service is powered by Inference Providers and includes a free tier.

Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by the best serverless inference partners. 👉 **For complete documentation, visit the [Inference Providers Documentation](https://huggingface.co/docs/inference-providers)**.

## Inference Providers on the Hub

Inference Providers is deeply integrated with the Hugging Face Hub, and you can use it in a few different ways:

- **Interactive Widgets** - Test models directly on model pages with interactive widgets that use Inference Providers under the hood. Check out the [DeepSeek-R1-0528 model page](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) for an example.
- **Inference Playground** - Test and compare chat completion models against your own prompts. Check out the [Inference Playground](https://huggingface.co/playground) to get started.
- **Search** - Filter models by inference provider on the [models page](https://huggingface.co/models?inference_provider=all) to find models available through specific providers (see the sketch after this list).
- **Data Studio** - Use AI to explore datasets on the Hub. Try [Data Studio](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts/viewer?views%5B%5D=train) on your favorite dataset.
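
The provider filter is also available from code. Here's a minimal sketch, assuming a recent `huggingface_hub` release in which `list_models()` accepts an `inference_provider` filter (mirroring the `?inference_provider=` query on the models page); if your version lacks it, the web filter above works the same way.

```python
from huggingface_hub import list_models

# List a few text-generation models served by at least one inference provider.
# `inference_provider` mirrors the models-page query parameter: "all" matches
# any provider, and a specific name (e.g. "together") narrows the results.
for model in list_models(
    inference_provider="all",
    pipeline_tag="text-generation",
    limit=5,
):
    print(model.id)
```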

## Build with Inference Providers

You can integrate Inference Providers into your own applications using our SDKs or HTTP clients. Here's a quick start with Python and JavaScript; for more details, check out the [Inference Providers Documentation](https://huggingface.co/docs/inference-providers).

<hfoptions id="inference-providers-quick-start">

<hfoption id="python">

You can use our Python SDK to interact with Inference Providers.

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
    provider="auto",  # Automatically selects the best provider
)

# Chat completion
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "A story about hiking in the mountains"}],
)

# Image generation
image = client.text_to_image(
    prompt="A serene lake surrounded by mountains at sunset, photorealistic style",
    model="black-forest-labs/FLUX.1-dev",
)
```
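
You don't have to rely on automatic selection: the same client accepts a specific provider name. A minimal sketch; `"together"` below is just an example, and the call only works if the model is actually served by that provider.

```python
import os

from huggingface_hub import InferenceClient

# Pin a specific provider instead of letting "auto" choose one.
# "together" is an example provider name; the model must be deployed there.
client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
    provider="together",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```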

Or, you can just use the OpenAI API compatible client.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "A story about hiking in the mountains"}],
)
```

<Tip warning={true}>

The OpenAI-compatible client does not support image generation.

</Tip>

</hfoption>

<hfoption id="javascript">

You can use our JavaScript SDK to interact with Inference Providers.

```javascript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
  provider: "auto", // Automatically selects the best provider
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [{ role: "user", content: "Hello!" }],
});

const imageBlob = await client.textToImage({
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "A serene lake surrounded by mountains at sunset, photorealistic style",
});
```

Or, you can just use the OpenAI API compatible client.

```javascript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "A story about hiking in the mountains" }],
});
```

<Tip warning={true}>

The OpenAI-compatible client does not support image generation.

</Tip>

</hfoption>

</hfoptions>

You'll need a Hugging Face token with inference permissions. Create one at [Settings > Tokens](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).
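
If a request fails with an authentication error, a quick way to confirm your token is set up is `whoami()`. A minimal sketch; note that it validates the token itself, not the inference permission specifically.

```python
import os

from huggingface_hub import whoami

# Raises if HF_TOKEN is missing or rejected; otherwise prints the account name.
# This checks the token itself, not the fine-grained inference permission.
info = whoami(token=os.environ["HF_TOKEN"])
print(f"Authenticated as: {info['name']}")
```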

### How Inference Providers works

To dive deeper into Inference Providers, check out the [Inference Providers Documentation](https://huggingface.co/docs/inference-providers). Here are some key resources:

- **[Quick Start](https://huggingface.co/docs/inference-providers)**
- **[Pricing & Billing Guide](https://huggingface.co/docs/inference-providers/pricing)**
- **[Hub Integration Details](https://huggingface.co/docs/inference-providers/hub-integration)**

### What was the HF-Inference API?

The HF-Inference API is one of the providers available through Inference Providers. It was previously called “Inference API (serverless)” and is powered by [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) under the hood.

For more details about the HF-Inference provider specifically, check out its [dedicated page](https://huggingface.co/docs/inference-providers/providers/hf-inference).
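
Because HF-Inference is just another provider, you can select it explicitly instead of using `"auto"`. A minimal sketch; the model below is an illustrative example and must be available on HF-Inference for the call to succeed.

```python
import os

from huggingface_hub import InferenceClient

# Route requests to the HF-Inference provider explicitly.
client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
    provider="hf-inference",
)

# Example task; the model is an illustrative choice, not a guaranteed deployment.
result = client.text_classification(
    "I love hiking in the mountains!",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(result)
```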