````diff
@@ ... @@
-<Tip warning={true}>
-
-When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.
-
-</Tip>
+> [!WARNING]
+> When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.

 </hfoption>

 </hfoptions>
````
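A note for readers following the guide: a Space secret is exposed to the running app as an environment variable, so the code reads it at runtime instead of hard-coding it. A minimal sketch, assuming the secret is named `HF_TOKEN`:

```python
import os

from huggingface_hub import InferenceClient

# The token comes from the environment (a Space secret or a local
# `export HF_TOKEN=...`), never from the source code itself.
# InferenceClient() with no key also falls back to HF_TOKEN automatically.
client = InferenceClient(api_key=os.environ["HF_TOKEN"])
```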
````diff
@@ -179,11 +176,8 @@ We'll also need to implement the `transcribe` and `summarize` functions.

 Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

-<Tip>
-
-We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.
-
-</Tip>
+> [!TIP]
+> We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.
````

````diff
@@ ... @@
 Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

-<Tip>
-
-We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.
-
-</Tip>
+> [!TIP]
+> We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.
````
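Both converted tips describe the same call, one per code tab. As a minimal sketch of that transcription step with `huggingface_hub`, with the audio path as a placeholder:

```python
from huggingface_hub import InferenceClient

# provider="auto" selects the first available provider for the model,
# honoring the priority list from your Inference Providers settings.
client = InferenceClient(provider="auto")

def transcribe(audio_path: str) -> str:
    # The Hub id for whisper-large-v3 is openai/whisper-large-v3.
    result = client.automatic_speech_recognition(
        audio_path, model="openai/whisper-large-v3"
    )
    return result.text

# Example: transcribe("meeting.wav")  # placeholder file name
```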
**docs/inference-providers/guides/function-calling.md** (+14 −35)
````diff
@@ -4,11 +4,8 @@ Function calling enables language models to interact with external tools and API

 When you provide a language model that has been fine-tuned to use tools with function descriptions, it can decide when to call these functions based on user requests, execute them, and incorporate the results into natural language responses. For example, you can build an assistant that fetches real-time weather data to provide accurate responses.

-<Tip>
-
-This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).
-
-</Tip>
+> [!TIP]
+> This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).
````
````diff
@@ ... @@
-<Tip>
-
-The `tool_choice` parameter is used to control when the model calls functions. In this case, we're using `auto`, which means the model will decide when to call functions (0 or more times). Below we'll expand on `tool_choice` and other parameters.
-
-</Tip>
+> [!TIP]
+> The `tool_choice` parameter is used to control when the model calls functions. In this case, we're using `auto`, which means the model will decide when to call functions (0 or more times). Below we'll expand on `tool_choice` and other parameters.

 Next, we need to check in the model response where the model decided to call any functions. If it did, we need to execute the function and add the result to the conversation, before we send the final response to the user.
````
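For context, a request with `tool_choice="auto"` looks roughly like this in Python; the model id and the weather tool are illustrative, not taken from the guide:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # reads HF_TOKEN from the environment

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat_completion(
    messages=messages,
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative model
    tools=tools,
    tool_choice="auto",  # the model decides whether (and when) to call tools
)
```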
````diff
@@ -172,11 +166,8 @@ else:

 The workflow is straightforward: make an initial API call with your tools, check if the model wants to call functions, execute them if needed, add the results to the conversation, and get the final response for the user.

-<Tip warning={true}>
-
-We have handled the case where the model wants to call a function and that the function actually exists. However, models might try to call functions that don’t exist, so we need to account for that as well. We can also deal with this using `strict` mode, which we'll cover later.
-
-</Tip>
+> [!WARNING]
+> We have handled the case where the model wants to call a function and that the function actually exists. However, models might try to call functions that don’t exist, so we need to account for that as well. We can also deal with this using `strict` mode, which we'll cover later.

 ## Multiple Functions
````
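The converted warning is worth grounding: a name-to-callable lookup handles unknown tool names gracefully. A sketch continuing from the request above, assuming `get_current_weather` is a plain Python function:

```python
import json

# Map tool names to real implementations; anything else is rejected.
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

for tool_call in response.choices[0].message.tool_calls or []:
    func = AVAILABLE_TOOLS.get(tool_call.function.name)
    if func is None:
        # The model asked for a tool we never defined: report it, don't crash.
        result = f"Error: unknown tool '{tool_call.function.name}'"
    else:
        args = tool_call.function.arguments
        if isinstance(args, str):  # arguments may arrive as a JSON string
            args = json.loads(args)
        result = func(**args)
    messages.append(
        {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}
    )

# With the tool results appended, a second chat_completion call on
# `messages` produces the final natural-language answer.
```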
````diff
@@ -341,11 +332,8 @@ client = InferenceClient(

 By switching provider, you can see the model's response change because each provider uses a different configuration of the model.

-<Tip warning={true}>
-
-Each inference provider has different capabilities and performance characteristics. You can find more information about each provider in the [Inference Providers](/inference-providers/index#partners) section.
-
-</Tip>
+> [!WARNING]
+> Each inference provider has different capabilities and performance characteristics. You can find more information about each provider in the [Inference Providers](/inference-providers/index#partners) section.

 ### Tool Choice Options
````
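The provider switch in this hunk's context is a one-argument change. A sketch with illustrative provider names:

```python
from huggingface_hub import InferenceClient

# Same model, same request; only the provider changes. Each provider
# serves its own configuration of the model, so outputs can differ.
for provider in ("together", "groq"):  # illustrative provider names
    client = InferenceClient(provider=provider)
    response = client.chat_completion(
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        model="meta-llama/Llama-3.3-70B-Instruct",
        tools=tools,  # the tool list from the earlier sketch
    )
    print(provider, "->", response.choices[0].message)
```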
````diff
@@ -402,11 +390,8 @@ Here, we're forcing the model to call the `get_current_weather` function, and no

 <hfoption id="huggingface_hub">

-<Tip warning={true}>
-
-Currently, `huggingface_hub.InferenceClient` does not support the `tool_choice` parameters that specify which function to call.
-
-</Tip>
+> [!WARNING]
+> Currently, `huggingface_hub.InferenceClient` does not support the `tool_choice` parameters that specify which function to call.

 </hfoption>
````
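Given the `InferenceClient` limitation in the warning, forcing a specific function is typically done through the OpenAI-compatible endpoint instead. A sketch reusing the illustrative `tools` list from earlier:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,  # the tool list from the earlier sketch
    # Force this exact function instead of letting the model decide:
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
```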
````diff
@@ -440,11 +425,8 @@ tools = [

 Strict mode ensures that function arguments match your schema exactly: no additional properties are allowed, all required parameters must be provided, and data types are strictly enforced.

-<Tip warning={true}>
-
-Strict mode is not supported by all providers. You can check the provider's documentation to see if it supports strict mode.
-
-</Tip>
+> [!WARNING]
+> Strict mode is not supported by all providers. You can check the provider's documentation to see if it supports strict mode.

 ### Streaming Responses
````
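A strict tool definition pairs `"strict": True` with a fully closed schema, roughly like this (exact requirements vary by provider):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "strict": True,  # arguments must match the schema exactly
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                },
                "required": ["location"],
                "additionalProperties": False,  # no extra keys allowed
            },
        },
    }
]
```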
````diff
@@ -473,11 +455,8 @@ for chunk in stream:

 Streaming allows you to process responses as they arrive, show real-time progress to users, and handle long-running function calls more efficiently.

-<Tip warning={true}>
-
-Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).
-
-</Tip>
+> [!WARNING]
+> Streaming is not supported by all providers. You can check the provider's documentation to see if it supports streaming, or you can refer to this [dynamic model compatibility table](https://huggingface.co/inference-providers/models).
````
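A streaming sketch to go with this hunk, using the OpenAI-compatible client shown earlier: tool-call arguments arrive as JSON fragments spread across chunks and must be accumulated before parsing:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,  # the illustrative tool list from the earlier sketch
    stream=True,
)

arguments = ""
for chunk in stream:
    if not chunk.choices:
        continue  # some providers send housekeeping chunks without choices
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # Tool-call arguments stream in as JSON fragments; accumulate them.
        arguments += delta.tool_calls[0].function.arguments or ""
    elif delta.content:
        print(delta.content, end="", flush=True)
```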
**docs/inference-providers/guides/gpt-oss.md** (+4 −10)
````diff
@@ -16,11 +16,8 @@ Both models are supported on Inference Providers and can be accessed through eit
 export HF_TOKEN="your_token_here"
 ```

-<Tip>
-
-💡 Pro tip: The free tier gives you monthly inference credits to start building and experimenting. Upgrade to [Hugging Face PRO](https://huggingface.co/pro) for even more flexibility, $2 in monthly credits plus pay‑as‑you‑go access to all providers!
-
-</Tip>
+> [!TIP]
+> 💡 Pro tip: The free tier gives you monthly inference credits to start building and experimenting. Upgrade to [Hugging Face PRO](https://huggingface.co/pro) for even more flexibility, $2 in monthly credits plus pay‑as‑you‑go access to all providers!

 2. Install the official OpenAI SDK.
````
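Once the SDK is installed, pointing it at the router is all that's needed to call the gpt-oss models. A minimal sketch, with the model id as published on the Hub:

```python
import os

from openai import OpenAI

# The stock OpenAI SDK works against the Hugging Face router;
# only the base URL and the API key differ from an OpenAI setup.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```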
````diff
@@ -293,11 +290,8 @@ Key Advantages:
 - Stateful, Event-Driven Architecture: Features a stateful, event-driven architecture. Instead of resending the entire text on every update, it streams semantic events that describe only the precise change (the "delta"). This eliminates the need for manual state tracking.
 - Simplified Development for Complex Logic: The event-driven model makes it easier to build reliable applications with multi-step logic. Your code simply listens for specific events, leading to cleaner and more robust integrations.

-<Tip>
-
-The implementation is based on the open-source [huggingface/responses.js](https://github.com/huggingface/responses.js) project.
-
-</Tip>
+> [!TIP]
+> The implementation is based on the open-source [huggingface/responses.js](https://github.com/huggingface/responses.js) project.
````
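A sketch of the event-driven streaming the bullets describe, using the OpenAI SDK's Responses API against the same router client as above; event names follow the OpenAI Responses spec, and provider support may vary:

```python
# `client` is the router-backed OpenAI client from the previous sketch.
stream = client.responses.create(
    model="openai/gpt-oss-120b",
    input="Write a one-line greeting.",
    stream=True,
)

for event in stream:
    # Each event describes one precise change (the "delta"), not the full text.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()  # the response is finished
```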
**docs/inference-providers/guides/image-editor.md** (+17 −32)
````diff
@@ -9,11 +9,8 @@ Our app will:
 3. **Transform images** using Qwen Image Edit or FLUX.1 Kontext
 4. **Display results** in a Gradio interface

-<Tip>
-
-TL;DR - this guide will show you how to build an AI image editor with Gradio and Inference Providers, just like [this one](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit).
-
-</Tip>
+> [!TIP]
+> TL;DR - this guide will show you how to build an AI image editor with Gradio and Inference Providers, just like [this one](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit).

 ## Step 1: Set Up Authentication
````
````diff
@@ -24,11 +21,8 @@ Before we start coding, authenticate with Hugging Face using your token:
 export HF_TOKEN="your_token_here"
 ```

-<Tip>
-
-This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).
-
-</Tip>
+> [!TIP]
+> This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

 When you set this environment variable, it handles authentication automatically for all your inference calls. You can generate a token from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).
````
````diff
@@ ... @@
 The dependencies are now installed and ready to use! Also, `uv` will maintain the `pyproject.toml` file for you as you add dependencies.

-<Tip>
-
-We're using `uv` because it's a fast Python package manager that handles dependency resolution and virtual environment management automatically. It's much faster than pip and provides better dependency resolution. If you're not familiar with `uv`, check it out [here](https://docs.astral.sh/uv/).
-
-</Tip>
+> [!TIP]
+> We're using `uv` because it's a fast Python package manager that handles dependency resolution and virtual environment management automatically. It's much faster than pip and provides better dependency resolution. If you're not familiar with `uv`, check it out [here](https://docs.astral.sh/uv/).
````
````diff
@@ ... @@
-<Tip>
-
-We're using the `fal-ai` provider with the `Qwen/Qwen-Image-Edit` model. The fal-ai provider offers fast inference times, perfect for interactive applications.
-
-However, you can experiment with different providers for various performance characteristics:
-
-</Tip>
+> [!TIP]
+> We're using the `fal-ai` provider with the `Qwen/Qwen-Image-Edit` model. The fal-ai provider offers fast inference times, perfect for interactive applications.
+>
+> However, you can experiment with different providers for various performance characteristics:
````
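The editing call the tip refers to can be sketched with `InferenceClient.image_to_image`; the file names and prompt are placeholders:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="fal-ai")

# Send the source image plus an edit instruction; the call returns a PIL image.
edited = client.image_to_image(
    "input.png",  # placeholder path; raw bytes or a PIL image also work
    prompt="Replace the background with a beach at sunset",
    model="Qwen/Qwen-Image-Edit",
)
edited.save("edited.png")
```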
````diff
@@ ... @@
 This creates a `requirements.txt` file with all your project dependencies and their exact versions from the lockfile.

-<Tip>
-
-The `uv export` command ensures that your Space will use the exact same dependency versions that you tested locally, preventing deployment issues caused by version mismatches.
-
-</Tip>
+> [!TIP]
+> The `uv export` command ensures that your Space will use the exact same dependency versions that you tested locally, preventing deployment issues caused by version mismatches.
````
**docs/inference-providers/guides/structured-output.md** (+4 −10)
````diff
@@ -4,11 +4,8 @@ In this guide, we'll show you how to use Inference Providers to generate structu

 Structured outputs guarantee a model returns a response that matches your exact schema every time. This eliminates the need for complex parsing logic and makes your applications more robust.

-<Tip>
-
-This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).
-
-</Tip>
+> [!TIP]
+> This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

 ## What Are Structured Outputs?
````
````diff
@@ -113,11 +110,8 @@ client = OpenAI(

 </hfoptions>

-<Tip>
-
-Structured outputs are a good use case for selecting a specific provider and model because you want to avoid incompatibility issues between the model, provider and the schema.
-
-</Tip>
+> [!TIP]
+> Structured outputs are a good use case for selecting a specific provider and model because you want to avoid incompatibility issues between the model, provider and the schema.
````
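A sketch of what selecting a specific provider and model looks like in practice, using the router's `model:provider` suffix; the schema, model, and provider are illustrative:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    # The ":provider" suffix pins a specific provider on the router.
    model="Qwen/Qwen2.5-72B-Instruct:nebius",  # illustrative pairing
    messages=[{"role": "user", "content": "Alice is 30 years old."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON matching the schema
```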
**docs/inference-providers/guides/vscode.md** (+2 −5)
````diff
@@ -13,11 +13,8 @@ Use frontier open LLMs like Kimi K2, DeepSeek V3.1, GLM 4.5 and more in VS Code
 5. Enter your Hugging Face Token. You can get one from your [settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained).
 6. Choose the models you want to add to the model picker. 🥳

-<Tip>
-
-VS Code 1.104.0+ is required to install the HF Copilot Chat extension. If "Hugging Face" doesn't appear in the Copilot provider list, update VS Code, then reload.
-
-</Tip>
+> [!TIP]
+> VS Code 1.104.0+ is required to install the HF Copilot Chat extension. If "Hugging Face" doesn't appear in the Copilot provider list, update VS Code, then reload.
````
**docs/inference-providers/pricing.md** (+6 −15)
````diff
@@ -12,11 +12,8 @@ Every Hugging Face user receives monthly credits to experiment with Inference Pr
 | PRO Users | $2.00 | yes |
 | Team or Enterprise Organizations | $2.00 per seat | yes |

-<Tip>
-
-Your monthly credits automatically apply when you route requests through Hugging Face. For Team or Enterprise organizations, credits are shared among all members.
-
-</Tip>
+> [!TIP]
+> Your monthly credits automatically apply when you route requests through Hugging Face. For Team or Enterprise organizations, credits are shared among all members.

 ## How Billing Works: Choose Your Approach
````
````diff
@@ -44,11 +41,8 @@ See the [Organization Billing section](#organization-billing) below for more det
 **PRO users and Enterprise Hub organizations** can continue using the API after exhausting their monthly credits. This ensures uninterrupted access to models for production workloads.

-<Tip>
-
-Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.
-
-</Tip>
+> [!TIP]
+> Hugging Face charges you the same rates as the provider, with no additional fees. We just pass through the provider costs directly.

 You can track your spending anytime on your [billing page](https://huggingface.co/settings/billing).
````
````diff
@@ -67,11 +61,8 @@ Here is a table that sums up what we've seen so far:
 | **Routed Requests** | Yes | Hugging Face | Yes | Only for PRO users and for integrated providers | SDKs, Playground, widgets, Data AI Studio |
 | **Custom Provider Key** | Yes | Provider | No | Yes | SDKs, Playground, widgets, Data AI Studio |

-<Tip>
-
-You can set your custom provider key in the [settings page](https://huggingface.co/settings/inference-providers) on the Hub, or in the `InferenceClient` when using the JavaScript or Python SDKs. When making a routed request with a custom key, your code remains unchanged—you can still pass your Hugging Face User Access Token. Hugging Face will automatically swap the authentication when routing the request.
-
-</Tip>
+> [!TIP]
+> You can set your custom provider key in the [settings page](https://huggingface.co/settings/inference-providers) on the Hub, or in the `InferenceClient` when using the JavaScript or Python SDKs. When making a routed request with a custom key, your code remains unchanged—you can still pass your Hugging Face User Access Token. Hugging Face will automatically swap the authentication when routing the request.
````
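In the Python SDK, the table's two rows differ only in which key you hand to `InferenceClient`. A sketch with placeholder values:

```python
from huggingface_hub import InferenceClient

# Routed request: authenticate with your HF token. If a custom provider key
# is saved in your Hub settings, Hugging Face swaps it in when routing.
routed = InferenceClient(provider="fal-ai")  # uses HF_TOKEN from the env

# Direct request: hand the provider's own key to the client instead,
# in which case the provider bills you directly.
direct = InferenceClient(provider="fal-ai", api_key="<your-fal-key>")  # placeholder
```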