docs/api-inference/hub-api.md (+5 -5)
@@ -1,6 +1,6 @@
 # Hub API
 
-The Hub provides a few API to deal with Inference Providers. Here is a list of them.
+The Hub provides a few APIs to interact with Inference Providers. Here is a list of them:
 
 ## List models
 
@@ -16,7 +16,7 @@ To list models powered by a provider, use the `inference_provider` query parameter
 ...
 ```
 
-It can be combined with other filters to e.g. select only text-to-image models:
+It can be combined with other filters to e.g. select only `text-to-image` models:
 
 ```sh
 # List text-to-image models served by Fal AI
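The curl command in this block is collapsed in the diff view. As an illustration of the query it documents, here is a minimal Python sketch; it assumes the public `https://huggingface.co/api/models` endpoint, the documented `inference_provider` parameter, and the standard `pipeline_tag` task filter (the `fal-ai` slug is inferred from the comment above):

```python
import requests

# Hub API endpoint (assumed): list models filtered by provider and task.
# `inference_provider` is the parameter documented in this section.
response = requests.get(
    "https://huggingface.co/api/models",
    params={"inference_provider": "fal-ai", "pipeline_tag": "text-to-image"},
)
response.raise_for_status()
for model in response.json():
    print(model["id"])
```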
@@ -28,7 +28,7 @@ It can be combined with other filters to e.g. select only text-to-image models:
 ...
 ```
 
-Pass a comma-separated list to select from multiple providers:
+Pass a comma-separated list of providers to select multiple:
 
 ```sh
 # List image-text-to-text models served by Novita or Sambanova
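Similarly, a minimal sketch of the multi-provider variants: the comma-separated list above, and the "all models served by at least one inference provider" case referenced in the next hunk. The provider slugs and the `all` value are assumptions based on the surrounding text:

```python
import requests

# Comma-separated providers: image-text-to-text models served by Novita or Sambanova
multi = requests.get(
    "https://huggingface.co/api/models",
    params={
        "inference_provider": "novita,sambanova",
        "pipeline_tag": "image-text-to-text",
    },
)

# `all`: models served by at least one inference provider
any_provider = requests.get(
    "https://huggingface.co/api/models",
    params={"inference_provider": "all"},
)
print(len(multi.json()), len(any_provider.json()))
```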
@@ -54,7 +54,7 @@ Finally, you can select all models served by at least one inference provider:
 
 ## Get model status
 
-If you are interested by a specific model and want to check if at least 1 provider serves it, you can request the `inference` attribute in the model info endpoint:
+To find an inference provider for a specific model, request the `inference` attribute in the model info endpoint:
 
 <inferencesnippet>
@@ -170,4 +170,4 @@ In the `huggingface_hub`, use `model_info` with the expand parameter:
 </inferencesnippet>
 
 
-For each provider, you get the status (`staging` or `live`), the related task (here, `conversational`) and the providerId. In practice, this information is mostly relevant for the JS and Python clients. The relevant part is to know that the listed providers are the ones serving the model.
+Each provider serving the model shows a status (`staging` or `live`), the related task (here, `conversational`), and the `providerId`. In practice, this information is relevant for the JS and Python clients.
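A minimal sketch of the lookup this section describes, assuming `huggingface_hub.model_info` accepts `"inference"` in its `expand` parameter (as the snippet above suggests) and exposes the expanded value as an attribute of the returned object; the model id is an example:

```python
from huggingface_hub import model_info

# Fetch only the `inference` attribute for one model
info = model_info("deepseek-ai/DeepSeek-V3-0324", expand=["inference"])
print(info.inference)
```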
docs/api-inference/hub-integration.md (+4 -4)
@@ -1,17 +1,17 @@
 # Hub Integration
 
-The Inference Providers is tightly integrated with the Hugging Face Hub. No matter in which service you use it, the usage and billing will be centralized on your Hugging Face account.
+Inference Providers is tightly integrated with the Hugging Face Hub. No matter which provider you use, the usage and billing will be centralized in your Hugging Face account.
 
 ## Model search
 
-When listing models on the Hub, you can filter to select models deployed on the inference provider for your choice. For example, to list all models deployed on Fireworks AI infra: https://huggingface.co/models?inference_provider=fireworks-ai.
+When listing models on the Hub, you can filter to select models deployed on the inference provider of your choice. For example, to list all models deployed on Fireworks AI infra: https://huggingface.co/models?inference_provider=fireworks-ai.
-It is also possible to select multiple providers or even all of them to filter all models that are available on at least 1 provider: https://huggingface.co/models?inference_provider=all.
+It is also possible to select all or multiple providers and filter their available models: https://huggingface.co/models?inference_provider=all.
@@ -20,7 +20,7 @@ It is also possible to select multiple providers or even all of them to filter a
 
 ## Features using Inference Providers
 
-Several Hugging Face features utilize the Inference Providers and count towards your monthly credits. The included monthly credits for PRO and Enterprise should cover moderate usage of these features for most users.
+Several Hugging Face features utilize Inference Providers and count towards your monthly credits. The included monthly credits for PRO and Enterprise should cover moderate usage of these features for most users.
 
 - [Inference Widgets](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324): Interactive widgets available on model pages. This is the entry point to quickly test a model on the Hub.
docs/api-inference/index.md (+3 -3)
@@ -35,7 +35,7 @@ To get started quickly with [Chat Completion models](http://huggingface.co/model
 
 You can call the Inference Providers with your preferred tools, such as Python, JavaScript, or cURL. To simplify integration, we offer both a Python SDK (`huggingface_hub`) and a JavaScript SDK (`huggingface.js`).
 
-In this section, we will demonstrate a simple example using [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), a conversational Large Language Model. For the example, we will use [Novita AI](https://novita.ai/) as Inference Provider with routed requests. You will learn what that means in the next chapters.
+In this section, we will demonstrate a simple example using [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), a conversational Large Language Model. For the example, we will use [Novita AI](https://novita.ai/) as Inference Provider.
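The actual snippets follow this paragraph in the source file but are not part of this diff. As an illustrative sketch of the Python route described here (the `provider` argument and the OpenAI-style `chat.completions.create` call follow the `huggingface_hub` client pattern; the prompt is a placeholder):

```python
import os
from huggingface_hub import InferenceClient

# Routed request: authenticate with your HF token, let the Hub route to Novita AI
client = InferenceClient(provider="novita", api_key=os.environ["HF_TOKEN"])

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```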
docs/api-inference/pricing.md (+5 -5)
@@ -1,6 +1,6 @@
 # Pricing and Billing
 
-The Inference Providers is a production-ready service involving external partners and is therefore a paid-product. However, as an Hugging Face user you get monthly credits to run experiments. The amount of credits you get depends on your type of account:
+Inference Providers is a production-ready service involving external partners and is therefore a paid product. However, as a Hugging Face user you get monthly credits to run experiments. The amount of credits you get depends on your account type:
@@ -11,7 +11,7 @@ The Inference Providers is a production-ready service involving external partner
 
 **PRO and Enterprise Hub users** can continue using the API once their monthly included credits are exhausted. This billing model, known as "Pay-as-you-Go" (PAYG), is charged on top of the monthly subscription. PAYG is only available for providers that are integrated with our billing system. We're actively working to integrate all providers, but in the meantime, any providers that are not yet integrated will be blocked once the free-tier limit is reached.
 
-If you haven't used up your included credits yet, we estimate costs for providers that aren’t fully integrated with our billing system. These estimates are usually higher than the actual cost to prevent abuse, which is why PAYG is currently disabled for those providers.
+If you have remaining credits, we estimate costs for providers that aren’t fully integrated with our billing system. These estimates are usually higher than the actual cost to prevent abuse, which is why PAYG is currently disabled for those providers.
 
 You can track your spending on your [billing page](https://huggingface.co/settings/billing).
 
@@ -25,7 +25,7 @@ Hugging Face charges you the same rates as the provider, with no additional fees
 
 The documentation above assumes you are making routed requests to external providers. In practice, there are 3 different ways to run inference, each with unique billing implications:
 
-- **Routed Request**: This is the default method for using the Inference Providers. Simply use the JavaScript or Python `InferenceClient`, or make raw HTTP requests with your Hugging Face User Access Token. Your request is automatically routed through Hugging Face to the provider's platform. No separate provider account is required, and billing is managed directly by Hugging Face. This approach lets you seamlessly switch between providers without additional setup.
+- **Routed Request**: This is the default method for using Inference Providers. Simply use the JavaScript or Python `InferenceClient`, or make raw HTTP requests with your Hugging Face User Access Token. Your request is automatically routed through Hugging Face to the provider's platform. No separate provider account is required, and billing is managed directly by Hugging Face. This approach lets you seamlessly switch between providers without additional setup.
 
 - **Routed Request with Custom Key**: In your [settings page](https://huggingface.co/settings/inference-providers) on the Hub, you can configure a custom key for each provider. To use this option, you'll need to create an account on the provider's platform, and billing will be handled directly by that provider. Hugging Face won't charge you for the call. This method gives you more control over billing when experimenting with models on the Hub. When making a routed request with a custom key, your code remains unchanged—you'll still pass your Hugging Face User Access Token. Hugging Face will automatically swap the authentication when routing the request.
 
@@ -41,15 +41,15 @@ Here is a table that sums up what we've seen so far:
 
 ## HF-Inference cost
 
-As you may have noticed, you can select to work with `"hf-inference"` provider. This is what used to be the "Inference API (serverless)" prior to the Inference Providers integration. From a user point of view, working with HF Inference is the same as with any other providers. Past the free-tier credits, you get charged for every inference request based on the compute time x price of the underlying hardware.
+As you may have noticed, you can choose to work with the `"hf-inference"` provider. This service used to be the "Inference API (serverless)" prior to Inference Providers. From a user point of view, working with HF Inference is the same as with any other provider. Past the free-tier credits, you get charged for every inference request based on the compute time × price of the underlying hardware.
 
 For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
 
 The `"hf-inference"` provider is currently the default provider when working with the JavaScript and Python SDKs. Note that this default might change in the future.
 
 ## Organization billing
 
-For Enterprise Hub organizations, it is possible to centralize billing for all your users. Each user still use their own User Access Token but the requests are billed to your organization. This can be done by passing `"X-HF-Bill-To: my-org-name"` as header in your HTTP requests.
+For Enterprise Hub organizations, it is possible to centralize billing for all your users. Each user still uses their own User Access Token but the requests are billed to your organization. This can be done by passing `"X-HF-Bill-To: my-org-name"` as a header in your HTTP requests.
 
 If you are using the JavaScript `InferenceClient`, you can set the `billTo` attribute at a client level:
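The JavaScript snippet itself is truncated in this view. As an illustration of the same idea in Python, here is a minimal sketch using the documented `X-HF-Bill-To` header; passing custom headers through the Python `InferenceClient` is an assumption here, and `my-org-name` is a placeholder:

```python
from huggingface_hub import InferenceClient

# Assumption: all routed calls from this client carry the billing header,
# so usage is charged to the organization instead of the individual user.
client = InferenceClient(
    provider="novita",
    headers={"X-HF-Bill-To": "my-org-name"},
)
```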