.github/workflows/api_inference_generate_documentation.yml
Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,6 @@ on:
   schedule:
     - cron: "0 3 * * *" # Every day at 3am
 
-
 concurrency:
   group: api_inference_generate_documentation
   cancel-in-progress: true
@@ -45,6 +44,7 @@ jobs:
 else
   echo "changes_detected=false" >> $GITHUB_ENV
 fi
+rm changed_files.txt
 
 # Skip PR if only certain files are updated
 - name: Skip PR creation if no meaningful changes
@@ -68,10 +68,10 @@ jobs:
 pnpm update @huggingface/tasks@latest
 pnpm run generate
 ```
-
+
 This PR was automatically created by the [Update API Inference Documentation workflow](https://github.com/huggingface/hub-docs/blob/main/.github/workflows/api_inference_generate_documentation.yml).
docs/api-inference/getting-started.md
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Getting Started
 
-The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) to make it even easier.
+The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) and JavaScript SDK (`huggingface.js`) to make it even easier.
 
 We'll do a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). Please visit task-specific parameters and further documentation in our [API Reference](./parameters).
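For readers who want to see what "doing a request with your favorite tools" might look like in plain Python, here is a minimal sketch using only the standard library. The endpoint base URL, the `{"inputs": ...}` payload shape, the placeholder token `hf_xxx`, and the helper name `build_request` are assumptions for illustration and do not appear in this diff; check the current API Reference before relying on them. The request is only constructed, not sent.

```python
import json
import urllib.request

# Assumed base URL for the Serverless Inference API (not stated in this diff).
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text-classification model.

    `build_request` is a hypothetical helper; `token` is a user access token.
    """
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "cardiffnlp/twitter-roberta-base-sentiment-latest",  # model linked in the text
    "hf_xxx",  # placeholder token, replace with your own
    "Today is a great day",
)
# To actually call the API, pass the request to urllib.request.urlopen(req)
# and decode the JSON body of the response.
```

Building the request separately from sending it keeps the sketch runnable offline and makes the URL, headers, and body easy to inspect.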
docs/api-inference/index.md
Lines changed: 2 additions & 2 deletions
@@ -8,14 +8,14 @@ Explore the most popular models for text, image, speech, and more — all with a
 
 ## Why use the Inference API?
 
-The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
+The Serverless Inference API offers a fast and simple way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
 
 * **Text Generation:** Including large language models and tool-calling prompts, generate and experiment with high-quality responses.
 * **Image Generation:** Easily create customized images, including LoRAs for your own styles.
 * **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
 * **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.
 
-⚡ **Fast and Free to Get Started**: The Inference API is free with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
+⚡ **Fast and Free to Get Started**: The Inference API is free to try out and comes with additional included credits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
As a HF user, you get monthly credits to run the HF Inference API. The amount of credits you get depends on your type of account (Free or PRO or Enterprise Hub), see table below.
+You get charged for every inference request, based on the compute time x price of the underlying hardware.
+
+For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
+
+When your monthly included credits are depleted:
+- if you're a Free user, you won't be able to query the Inference API anymore,
+- if you're a PRO or Enterprise Hub user, you will get charged for the requests on top of your subscription. You can monitor your spending on your billing page.
+
+Note that HF Inference API is not meant to be used for heavy production applications. If you need to handle large numbers of requests, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to have dedicated resources or [Inference Providers](https://huggingface.co/blog/inference-providers) for serverless usage.
+
+You need to be authenticated (passing a token or through your browser) to use the Inference API.
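The billing rule in the added pricing text (compute time multiplied by the per-second hardware price) and the behavior once included credits run out can be sketched as follows. This is an illustration of the arithmetic only, not the actual billing implementation; the function names and the account-type strings are hypothetical.

```python
from decimal import Decimal


def request_cost(compute_seconds, price_per_second) -> Decimal:
    """Cost of one request: compute time x price of the underlying hardware.

    Decimal is used so small per-second prices do not pick up float noise.
    """
    return Decimal(str(compute_seconds)) * Decimal(str(price_per_second))


def after_credits_depleted(account_type: str) -> str:
    """What happens to further requests once monthly credits are used up.

    Account labels ("free", "pro", "enterprise") are illustrative names.
    """
    if account_type == "free":
        return "blocked"  # Free users can no longer query the API
    if account_type in ("pro", "enterprise"):
        return "billed"   # charged on top of the subscription
    raise ValueError(f"unknown account type: {account_type!r}")


# The example from the text: 10 seconds on hardware at $0.00012 per second.
cost = request_cost(10, 0.00012)
assert cost == Decimal("0.0012")
```

Under these assumptions, a Free account that has spent its credits gets `"blocked"`, while PRO and Enterprise Hub accounts get `"billed"` and keep accruing `request_cost` per call.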
docs/api-inference/supported-models.md
Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ You can find:
 
 ## What do I get with a PRO subscription?
 
-In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [rate limits](./rate-limits) and free access to the following models:
+In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [included credits](./pricing) and access to the following models:
 
 <!-- Manually maintained hard-coded list based on https://github.com/huggingface-internal/api-inference/blob/main/master-rs/custom_config.yml -->
 
@@ -27,4 +27,4 @@ This list is not exhaustive and might be updated in the future.
 
 ## Running Private Models
 
-The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.
+The Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
docs/api-inference/tasks/chat-completion.md
Lines changed: 10 additions & 8 deletions
@@ -23,13 +23,15 @@ This is a subtask of [`text-generation`](https://huggingface.co/docs/api-inferen
 
 - [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
 - [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
-- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
+- [microsoft/phi-4](https://huggingface.co/microsoft/phi-4): Powerful text generation model by Microsoft.
+- [PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview): A very powerful model with reasoning capabilities.
 - [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): Strong text generation model to follow instructions.
+- [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct): Text generation model used to write code.
 
 #### Conversational Vision-Language Models (VLMs)
 
-- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): Powerful vision language model with great visual understanding and reasoning capabilities.