21 changes: 6 additions & 15 deletions docs/inference-providers/guides/building-first-app.md
@@ -55,11 +55,8 @@ export HF_TOKEN="your_token_here"
const HF_TOKEN = process.env.HF_TOKEN;
```

<Tip warning={true}>

When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.

</Tip>
> [!WARNING]
> When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.

</hfoption>
</hfoptions>
@@ -179,11 +176,8 @@ We'll also need to implement the `transcribe` and `summarize` functions.

Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

<Tip>

We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

</Tip>
> [!TIP]
> We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

```python
def transcribe_audio(audio_file_path):
@@ -205,11 +199,8 @@ def transcribe_audio(audio_file_path):

Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

<Tip>

We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

</Tip>
> [!TIP]
> We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

```javascript
import { InferenceClient } from 'https://esm.sh/@huggingface/inference';
21 changes: 6 additions & 15 deletions docs/inference-providers/guides/first-api-call.md
@@ -11,11 +11,8 @@ Many developers avoid using open source AI models because they assume deployment

We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>
> [!TIP]
> This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

## Step 1: Find a Model on the Hub

@@ -39,11 +36,8 @@ Here, you can test the model directly in the browser from any of the available p

This widget uses the same endpoint you're about to implement in code.

<Tip warning={true}>

You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.

</Tip>
> [!WARNING]
> You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.

## Step 3: From Clicks to Code

@@ -59,11 +53,8 @@ Set your token as an environment variable:
export HF_TOKEN="your_token_here"
```

<Tip>

You can add this line to your `.bash_profile` or a similar file so the token is automatically available in all your terminal sessions.

</Tip>
> [!TIP]
> You can add this line to your `.bash_profile` or a similar file so the token is automatically available in all your terminal sessions.

The Python or TypeScript code snippet will use the token from the environment variable.
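
For reference, here is a minimal Python sketch of what such a snippet can look like with `huggingface_hub` (the prompt and output filename are illustrative):

```python
import os
from huggingface_hub import InferenceClient

# Authenticates with the HF_TOKEN environment variable set above
client = InferenceClient(provider="auto", api_key=os.environ["HF_TOKEN"])

# Generate an image with FLUX.1-schnell and save it locally
image = client.text_to_image(
    "Astronaut riding a horse on the moon",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("output.png")
```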

49 changes: 14 additions & 35 deletions docs/inference-providers/guides/function-calling.md
@@ -4,11 +4,8 @@ Function calling enables language models to interact with external tools and API

When you provide a language model that has been fine-tuned to use tools with function descriptions, it can decide when to call these functions based on user requests, execute them, and incorporate the results into natural language responses. For example, you can build an assistant that fetches real-time weather data to provide accurate responses.

<Tip>

This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).

</Tip>
> [!TIP]
> This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).

## Defining Functions
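
As a rough sketch, a weather-lookup function can be described to the model with a JSON schema like the following (the name and parameters here are illustrative):

```python
# A tool definition the model can choose to call. The schema follows the
# OpenAI-style function-calling format used throughout this guide.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'Paris, France'",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```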

@@ -126,11 +123,8 @@ response = client.chat.completions.create(
response_message = response.choices[0].message
```

<Tip>

The `tool_choice` parameter is used to control when the model calls functions. In this case, we're using `auto`, which means the model will decide when to call functions (0 or more times). Below we'll expand on `tool_choice` and other parameters.

</Tip>
> [!TIP]
> The `tool_choice` parameter is used to control when the model calls functions. In this case, we're using `auto`, which means the model will decide when to call functions (0 or more times). Below we'll expand on `tool_choice` and other parameters.
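
As a sketch, a request with tools enabled looks roughly like this (the model name and messages are placeholders):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,          # the function definitions shown earlier
    tool_choice="auto",   # let the model decide whether to call a function
)
response_message = response.choices[0].message
```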

Next, we need to check the model response to see whether the model decided to call any functions. If it did, we need to execute the function and add the result to the conversation before sending the final response to the user.

@@ -172,11 +166,8 @@ else:

The workflow is straightforward: make an initial API call with your tools, check if the model wants to call functions, execute them if needed, add the results to the conversation, and get the final response for the user.
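
A condensed sketch of that loop, assuming a local `get_current_weather` Python function and the `tools` list defined earlier:

```python
import json

# Map tool names to the local Python implementations
available_functions = {"get_current_weather": get_current_weather}

if response_message.tool_calls:
    # Keep the assistant's tool-call message in the conversation history
    messages.append(response_message)
    for tool_call in response_message.tool_calls:
        function = available_functions[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        result = function(**args)
        # Feed the function result back to the model
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })
    # Second request: the model turns the tool results into a final answer
    final_response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=messages,
    )
    print(final_response.choices[0].message.content)
else:
    print(response_message.content)
```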

<Tip warning={true}>

We have handled the case where the model wants to call a function and the function actually exists. However, models might try to call functions that don’t exist, so we need to account for that as well. We can also deal with this using `strict` mode, which we'll cover later.

</Tip>
> [!WARNING]
> We have handled the case where the model wants to call a function and the function actually exists. However, models might try to call functions that don’t exist, so we need to account for that as well. We can also deal with this using `strict` mode, which we'll cover later.

## Multiple Functions

@@ -341,11 +332,8 @@ client = InferenceClient(

By switching provider, you can see the model's response change because each provider uses a different configuration of the model.
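
For example, switching is a one-argument change (the provider names below are just two of the available options):

```python
import os
from huggingface_hub import InferenceClient

# Same request code, different provider: only the `provider` argument changes
client = InferenceClient(provider="groq", api_key=os.environ["HF_TOKEN"])
# client = InferenceClient(provider="together", api_key=os.environ["HF_TOKEN"])
```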

<Tip warning={true}>

Each inference provider has different capabilities and performance characteristics. You can find more information about each provider in the [Inference Providers](/inference-providers/index#partners) section.

</Tip>
> [!WARNING]
> Each inference provider has different capabilities and performance characteristics. You can find more information about each provider in the [Inference Providers](/inference-providers/index#partners) section.

### Tool Choice Options

@@ -402,11 +390,8 @@ Here, we're forcing the model to call the `get_current_weather` function, and no

<hfoption id="huggingface_hub">

<Tip warning={true}>

Currently, `huggingface_hub.InferenceClient` does not support `tool_choice` values that specify which function to call.

</Tip>
> [!WARNING]
> Currently, `huggingface_hub.InferenceClient` does not support `tool_choice` values that specify which function to call.

</hfoption>

@@ -440,11 +425,8 @@ tools = [

Strict mode ensures that function arguments match your schema exactly: no additional properties are allowed, all required parameters must be provided, and data types are strictly enforced.
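
A sketch of a strict tool definition (following the OpenAI-style schema; the field values are illustrative):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given location",
            "strict": True,  # enforce the schema exactly
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                },
                "required": ["location"],
                "additionalProperties": False,  # no extra arguments allowed
            },
        },
    }
]
```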

<Tip warning={true}>

Strict mode is not supported by all providers. You can check the provider's documentation to see if it supports strict mode.

</Tip>
> [!WARNING]
> Strict mode is not supported by all providers. You can check the provider's documentation to see if it supports strict mode.

### Streaming Responses

@@ -473,11 +455,8 @@ for chunk in stream:

Streaming allows you to process responses as they arrive, show real-time progress to users, and handle long-running function calls more efficiently.
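
A minimal streaming sketch (the model name is a placeholder; accumulating streamed tool-call arguments would need extra logic):

```python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
    tools=tools,
    stream=True,  # receive the response incrementally
)

for chunk in stream:
    # Print text deltas as they arrive
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```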

<Tip warning={true}>

Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).

</Tip>
> [!WARNING]
> Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).

## Next Steps

14 changes: 4 additions & 10 deletions docs/inference-providers/guides/gpt-oss.md
@@ -16,11 +16,8 @@ Both models are supported on Inference Providers and can be accessed through eit
export HF_TOKEN="your_token_here"
```

<Tip>

💡 Pro tip: The free tier gives you monthly inference credits to start building and experimenting. Upgrade to [Hugging Face PRO](https://huggingface.co/pro) for even more flexibility: $2 in monthly credits plus pay‑as‑you‑go access to all providers!

</Tip>
> [!TIP]
> 💡 Pro tip: The free tier gives you monthly inference credits to start building and experimenting. Upgrade to [Hugging Face PRO](https://huggingface.co/pro) for even more flexibility: $2 in monthly credits plus pay‑as‑you‑go access to all providers!

2. Install the official OpenAI SDK.

@@ -293,11 +290,8 @@ Key Advantages:
- Stateful, Event-Driven Architecture: Instead of resending the entire text on every update, it streams semantic events that describe only the precise change (the "delta"). This eliminates the need for manual state tracking (see the sketch after this list).
- Simplified Development for Complex Logic: The event-driven model makes it easier to build reliable applications with multi-step logic. Your code simply listens for specific events, leading to cleaner and more robust integrations.
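
As an illustrative sketch of that event model, assuming the OpenAI SDK is pointed at the Hugging Face router as set up earlier in this guide (the prompt is a placeholder):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

stream = client.responses.create(
    model="openai/gpt-oss-120b",
    input="Explain why event-driven streaming simplifies client code.",
    stream=True,
)

for event in stream:
    # Each event describes one precise change; text deltas carry only new tokens
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```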

<Tip>

The implementation is based on the open-source [huggingface/responses.js](https://github.com/huggingface/responses.js) project.

</Tip>
> [!TIP]
> The implementation is based on the open-source [huggingface/responses.js](https://github.com/huggingface/responses.js) project.

### Stream responses

49 changes: 17 additions & 32 deletions docs/inference-providers/guides/image-editor.md
@@ -9,11 +9,8 @@ Our app will:
3. **Transform images** using Qwen Image Edit or FLUX.1 Kontext
4. **Display results** in a Gradio interface

<Tip>

TL;DR - this guide will show you how to build an AI image editor with Gradio and Inference Providers, just like [this one](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit).

</Tip>
> [!TIP]
> TL;DR - this guide will show you how to build an AI image editor with Gradio and Inference Providers, just like [this one](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit).

## Step 1: Set Up Authentication

@@ -24,11 +21,8 @@ Before we start coding, authenticate with Hugging Face using your token:
export HF_TOKEN="your_token_here"
```

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>
> [!TIP]
> This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

When you set this environment variable, it handles authentication automatically for all your inference calls. You can generate a token from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).

@@ -50,11 +44,8 @@ uv add huggingface-hub>=0.34.4 gradio>=5.0.0 pillow>=11.3.0

The dependencies are now installed and ready to use! Also, `uv` will maintain the `pyproject.toml` file for you as you add dependencies.

<Tip>

We're using `uv` because it's a fast Python package manager that handles dependency resolution and virtual environment management automatically. It's much faster than pip and provides better dependency resolution. If you're not familiar with `uv`, check it out [here](https://docs.astral.sh/uv/).

</Tip>
> [!TIP]
> We're using `uv` because it's a fast Python package manager that handles dependency resolution and virtual environment management automatically. It's much faster than pip and provides better dependency resolution. If you're not familiar with `uv`, check it out [here](https://docs.astral.sh/uv/).

## Step 3: Build the Core Image Editing Function

@@ -115,18 +106,15 @@ def edit_image(input_image, prompt):
return input_image
```
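
A rough sketch of the core editing call with `InferenceClient` (a simplified stand-in for the full function above; error handling omitted):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="fal-ai", api_key=os.environ["HF_TOKEN"])

def edit_image_sketch(input_image, prompt):
    # Send the uploaded image plus the edit instruction to the model
    edited = client.image_to_image(
        input_image,
        prompt=prompt,
        model="Qwen/Qwen-Image-Edit",
    )
    return edited  # a PIL image, ready for Gradio to display
```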

<Tip>

We're using the `fal-ai` provider with the `Qwen/Qwen-Image-Edit` model. The fal-ai provider offers fast inference times, perfect for interactive applications.

However, you can experiment with different providers for various performance characteristics:

```python
client = InferenceClient(provider="replicate", api_key=os.environ["HF_TOKEN"])
client = InferenceClient(provider="auto", api_key=os.environ["HF_TOKEN"]) # Automatic selection
```

</Tip>
> [!TIP]
> We're using the `fal-ai` provider with the `Qwen/Qwen-Image-Edit` model. The fal-ai provider offers fast inference times, perfect for interactive applications.
>
> However, you can experiment with different providers for various performance characteristics:
>
> ```python
> client = InferenceClient(provider="replicate", api_key=os.environ["HF_TOKEN"])
> client = InferenceClient(provider="auto", api_key=os.environ["HF_TOKEN"]) # Automatic selection
> ```

## Step 4: Create the Gradio Interface

@@ -322,11 +310,8 @@ uv export --format requirements-txt --output-file requirements.txt

This creates a `requirements.txt` file with all your project dependencies and their exact versions from the lockfile.

<Tip>

The `uv export` command ensures that your Space will use the exact same dependency versions that you tested locally, preventing deployment issues caused by version mismatches.

</Tip>
> [!TIP]
> The `uv export` command ensures that your Space will use the exact same dependency versions that you tested locally, preventing deployment issues caused by version mismatches.

Now you can deploy to Spaces:

14 changes: 4 additions & 10 deletions docs/inference-providers/guides/structured-output.md
@@ -4,11 +4,8 @@ In this guide, we'll show you how to use Inference Providers to generate structu

Structured outputs guarantee a model returns a response that matches your exact schema every time. This eliminates the need for complex parsing logic and makes your applications more robust.

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>
> [!TIP]
> This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).
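
As a quick preview of what this looks like in practice, a chat completion with a JSON schema response format might look roughly like this (the schema and model are illustrative; the rest of the guide builds a complete example):

```python
import os
from openai import OpenAI

# OpenAI-compatible client pointed at the Inference Providers router
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# Describe the exact shape the response must have
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "paper_metadata",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "authors": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "authors"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Extract the title and authors from this abstract: ..."}],
    response_format=response_format,
)
print(completion.choices[0].message.content)  # valid JSON matching the schema
```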

## What Are Structured Outputs?

@@ -113,11 +110,8 @@ client = OpenAI(

</hfoptions>

<Tip>

Structured outputs are a good use case for selecting a specific provider and model because you want to avoid incompatibility issues between the model, provider and the schema.

</Tip>
> [!TIP]
> Structured outputs are a good use case for selecting a specific provider and model because you want to avoid incompatibility issues between the model, provider and the schema.

## Step 3: Generate structured output

7 changes: 2 additions & 5 deletions docs/inference-providers/guides/vscode.md
@@ -13,11 +13,8 @@ Use frontier open LLMs like Kimi K2, DeepSeek V3.1, GLM 4.5 and more in VS Code
5. Enter your Hugging Face Token. You can get one from your [settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).
6. Choose the models you want to add to the model picker. 🥳

<Tip>

VS Code 1.104.0+ is required to install the HF Copilot Chat extension. If "Hugging Face" doesn't appear in the Copilot provider list, update VS Code, then reload.

</Tip>
> [!TIP]
> VS Code 1.104.0+ is required to install the HF Copilot Chat extension. If "Hugging Face" doesn't appear in the Copilot provider list, update VS Code, then reload.

## ✨ Why use the Hugging Face provider in Copilot

21 changes: 6 additions & 15 deletions docs/inference-providers/pricing.md
@@ -12,11 +12,8 @@ Every Hugging Face user receives monthly credits to experiment with Inference Pr
| PRO Users | $2.00 | yes |
| Team or Enterprise Organizations | $2.00 per seat | yes |

<Tip>

Your monthly credits automatically apply when you route requests through Hugging Face. For Team or Enterprise organizations, credits are shared among all members.

</Tip>
> [!TIP]
> Your monthly credits automatically apply when you route requests through Hugging Face. For Team or Enterprise organizations, credits are shared among all members.

## How Billing Works: Choose Your Approach

@@ -44,11 +41,8 @@ See the [Organization Billing section](#organization-billing) below for more det
**PRO users and Enterprise Hub organizations** can continue using the API after exhausting their monthly credits. This ensures uninterrupted access to models for production workloads.


<Tip>

Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.

</Tip>
> [!TIP]
> Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.

You can track your spending anytime on your [billing page](https://huggingface.co/settings/billing).

@@ -67,11 +61,8 @@ Here is a table that sums up what we've seen so far:
| **Routed Requests** | Yes | Hugging Face | Yes | Only for PRO users and for integrated providers | SDKs, Playground, widgets, Data AI Studio |
| **Custom Provider Key** | Yes | Provider | No | Yes | SDKs, Playground, widgets, Data AI Studio |

<Tip>

You can set your custom provider key in the [settings page](https://huggingface.co/settings/inference-providers) on the Hub, or in the `InferenceClient` when using the JavaScript or Python SDKs. When making a routed request with a custom key, your code remains unchanged—you can still pass your Hugging Face User Access Token. Hugging Face will automatically swap the authentication when routing the request.

</Tip>
> [!TIP]
> You can set your custom provider key in the [settings page](https://huggingface.co/settings/inference-providers) on the Hub, or in the `InferenceClient` when using the JavaScript or Python SDKs. When making a routed request with a custom key, your code remains unchanged—you can still pass your Hugging Face User Access Token. Hugging Face will automatically swap the authentication when routing the request.
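
For example, with the Python SDK (the provider and environment variable names are illustrative):

```python
import os
from huggingface_hub import InferenceClient

# Routed request: pass your HF token, billed through Hugging Face
client = InferenceClient(provider="together", api_key=os.environ["HF_TOKEN"])

# Custom provider key: pass the provider's own key, billed by the provider
client = InferenceClient(provider="together", api_key=os.environ["TOGETHER_API_KEY"])
```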

## HF-Inference cost

7 changes: 2 additions & 5 deletions docs/inference-providers/providers/cerebras.md
@@ -19,11 +19,8 @@ For more details, check out the `generate.ts` script: https://github.com/hugging

# Cerebras

<Tip>

All supported Cerebras models can be found [here](https://huggingface.co/models?inference_provider=cerebras&sort=trending)

</Tip>
> [!TIP]
> All supported Cerebras models can be found [here](https://huggingface.co/models?inference_provider=cerebras&sort=trending)

<div class="flex justify-center">
<a href="https://www.cerebras.ai/" target="_blank">