
Commit 187e683

Merge branch 'huggingface:main' into main
2 parents: 8bc4778 + 276327f

38 files changed: +488 −142 lines

.github/workflows/api_inference_generate_documentation.yml

Lines changed: 4 additions & 4 deletions

@@ -5,7 +5,6 @@ on:
   schedule:
     - cron: "0 3 * * *" # Every day at 3am

-
 concurrency:
   group: api_inference_generate_documentation
   cancel-in-progress: true

@@ -45,6 +44,7 @@ jobs:
           else
             echo "changes_detected=false" >> $GITHUB_ENV
           fi
+          rm changed_files.txt

       # Skip PR if only certain files are updated
       - name: Skip PR creation if no meaningful changes

@@ -68,10 +68,10 @@ jobs:
             pnpm update @huggingface/tasks@latest
             pnpm run generate
             ```
-
+
            This PR was automatically created by the [Update API Inference Documentation workflow](https://github.com/huggingface/hub-docs/blob/main/.github/workflows/api_inference_generate_documentation.yml).
-
+
            Please review the changes before merging.

          reviewers: |
            Wauplin
-            hanouticelina
+            hanouticelina

docs/api-inference/_redirects.yml

Lines changed: 1 addition & 0 deletions

@@ -3,3 +3,4 @@ detailed_parameters: parameters
 parallelism: getting_started
 usage: getting_started
 faq: index
+rate-limits: pricing

docs/api-inference/_toctree.yml

Lines changed: 2 additions & 2 deletions

@@ -5,8 +5,8 @@
     title: Getting Started
   - local: supported-models
     title: Supported Models
-  - local: rate-limits
-    title: Rate Limits
+  - local: pricing
+    title: Pricing and Rate limits
   - local: security
     title: Security
   title: Getting Started

docs/api-inference/getting-started.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Getting Started

-The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) to make it even easier.
+The Serverless Inference API allows you to easily do inference on a wide range of models and tasks. You can do requests with your favorite tools (Python, cURL, etc). We also provide a Python SDK (`huggingface_hub`) and JavaScript SDK (`huggingface.js`) to make it even easier.

 We'll do a minimal example using a [sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest). Please visit task-specific parameters and further documentation in our [API Reference](./parameters).
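The curl and SDK snippets these docs describe all reduce to the same HTTP request shape: a POST to the model's endpoint with a bearer token and a JSON `inputs` field. A minimal sketch for the sentiment model mentioned above — the token is a placeholder and no request is actually sent here:

```python
# Sketch of the raw request behind the SDK calls for the sentiment model
# referenced above. The token is a placeholder; to actually run it, pass
# these values to requests.post(API_URL, headers=headers, json=payload).

MODEL_ID = "cardiffnlp/twitter-roberta-base-sentiment-latest"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"
headers = {"Authorization": "Bearer hf_***"}
payload = {"inputs": "Today is a great day!"}
```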

docs/api-inference/index.md

Lines changed: 2 additions & 2 deletions

@@ -8,14 +8,14 @@ Explore the most popular models for text, image, speech, and more — all with a

 ## Why use the Inference API?

-The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
+The Serverless Inference API offers a fast and simple way to explore thousands of models for a variety of tasks. Whether you're prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

 * **Text Generation:** Including large language models and tool-calling prompts, generate and experiment with high-quality responses.
 * **Image Generation:** Easily create customized images, including LoRAs for your own styles.
 * **Document Embeddings:** Build search and retrieval systems with SOTA embeddings.
 * **Classical AI Tasks:** Ready-to-use models for text classification, image classification, speech recognition, and more.

-**Fast and Free to Get Started**: The Inference API is free with higher rate limits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.
+**Fast and Free to Get Started**: The Inference API is free to try out and comes with additional included credits for PRO users. For production needs, explore [Inference Endpoints](https://ui.endpoints.huggingface.co/) for dedicated resources, autoscaling, advanced security features, and more.

 ---

docs/api-inference/pricing.md

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+# Pricing and Rate limits
+
+As an HF user, you get monthly credits to run the HF Inference API. The amount of credits you get depends on your account type (Free, PRO, or Enterprise Hub); see the table below.
+You are charged for every inference request, based on the compute time × the price of the underlying hardware.
+
+For instance, a request to [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run will be billed $0.0012.
+
+When your monthly included credits are depleted:
+- if you're a Free user, you won't be able to query the Inference API anymore;
+- if you're a PRO or Enterprise Hub user, you will be charged for requests on top of your subscription. You can monitor your spending on your billing page.
+
+Note that the HF Inference API is not meant for heavy production applications. If you need to handle large numbers of requests, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) for dedicated resources, or [Inference Providers](https://huggingface.co/blog/inference-providers) for serverless usage.
+
+You need to be authenticated (with a token or through your browser) to use the Inference API.
+
+
+| User Tier                 | Included monthly credits           |
+|---------------------------|------------------------------------|
+| Free Users                | subject to change, less than $0.10 |
+| PRO and Enterprise Users  | $2.00                              |
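The billing rule the new pricing page describes — cost = compute time × hardware price, with tier-dependent behavior once credits run out — can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and the $0.00012/s rate is just the example figure quoted in the page, not an official price list.

```python
# Illustrative sketch of the billing rule described in pricing.md.
# All names here are hypothetical; the per-second rate is the example
# figure from the page, not an official price.

def request_cost(duration_s: float, price_per_s: float) -> float:
    """Cost of one inference request: compute time x hardware price."""
    return duration_s * price_per_s

def amount_billed(tier: str, credits_left: float, cost: float) -> float:
    """Overage charged for a request once it exceeds remaining credits."""
    overage = max(0.0, cost - credits_left)
    if overage > 0 and tier == "free":
        raise RuntimeError("Free users cannot query the API once credits are depleted")
    return overage  # PRO/Enterprise users pay this on top of their subscription

# The FLUX.1-dev example from the page: 10 s on a $0.00012/s GPU.
cost = request_cost(10, 0.00012)
print(f"${cost:.4f}")  # -> $0.0012
```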

docs/api-inference/rate-limits.md

Lines changed: 0 additions & 13 deletions
This file was deleted.

docs/api-inference/supported-models.md

Lines changed: 2 additions & 2 deletions

@@ -10,7 +10,7 @@ You can find:

 ## What do I get with a PRO subscription?

-In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [rate limits](./rate-limits) and free access to the following models:
+In addition to thousands of public models available in the Hub, PRO and Enterprise users get higher [included credits](./pricing) and access to the following models:

 <!-- Manually maintained hard-coded list based on https://github.com/huggingface-internal/api-inference/blob/main/master-rs/custom_config.yml -->

@@ -27,4 +27,4 @@ This list is not exhaustive and might be updated in the future.

 ## Running Private Models

-The free Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.
+The Serverless API is designed to run popular public models. If you have a private model, you can use [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to deploy it.

docs/api-inference/tasks/automatic-speech-recognition.md

Lines changed: 0 additions & 1 deletion

@@ -30,7 +30,6 @@ For more details about the `automatic-speech-recognition` task, check out its [d
 ### Recommended models

 - [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI.
-- [nvidia/canary-1b](https://huggingface.co/nvidia/canary-1b): A powerful multilingual ASR and Speech Translation model by Nvidia.
 - [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model.

 Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).

docs/api-inference/tasks/chat-completion.md

Lines changed: 10 additions & 8 deletions

@@ -23,13 +23,15 @@ This is a subtask of [`text-generation`](https://huggingface.co/docs/api-inferen

 - [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
 - [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
-- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
+- [microsoft/phi-4](https://huggingface.co/microsoft/phi-4): Powerful text generation model by Microsoft.
+- [PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview): A very powerful model with reasoning capabilities.
 - [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): Strong text generation model to follow instructions.
+- [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct): Text generation model used to write code.

 #### Conversational Vision-Language Models (VLMs)

-- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): Powerful vision language model with great visual understanding and reasoning capabilities.
 - [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct): Strong image-text-to-text model.
+- [Qwen/QVQ-72B-Preview](https://huggingface.co/Qwen/QVQ-72B-Preview): Image-text-to-text model with reasoning capabilities.

 ### API Playground

@@ -208,11 +210,11 @@ To use the JavaScript client, see `huggingface.js`'s [package reference](https:/

 <curl>
 ```bash
-curl 'https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct/v1/chat/completions' \
+curl 'https://api-inference.huggingface.co/models/Qwen/Qwen2-VL-7B-Instruct/v1/chat/completions' \
 -H 'Authorization: Bearer hf_***' \
 -H 'Content-Type: application/json' \
 --data '{
-    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
+    "model": "Qwen/Qwen2-VL-7B-Instruct",
     "messages": [
         {
             "role": "user",

@@ -262,7 +264,7 @@ messages = [
 ]

 stream = client.chat.completions.create(
-    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
+    model="Qwen/Qwen2-VL-7B-Instruct",
     messages=messages,
     max_tokens=500,
     stream=True

@@ -300,7 +302,7 @@ messages = [
 ]

 stream = client.chat.completions.create(
-    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
+    model="Qwen/Qwen2-VL-7B-Instruct",
     messages=messages,
     max_tokens=500,
     stream=True

@@ -323,7 +325,7 @@ const client = new HfInference("hf_***");
 let out = "";

 const stream = client.chatCompletionStream({
-    model: "meta-llama/Llama-3.2-11B-Vision-Instruct",
+    model: "Qwen/Qwen2-VL-7B-Instruct",
     messages: [
         {
             role: "user",

@@ -365,7 +367,7 @@ const client = new OpenAI({
 let out = "";

 const stream = await client.chat.completions.create({
-    model: "meta-llama/Llama-3.2-11B-Vision-Instruct",
+    model: "Qwen/Qwen2-VL-7B-Instruct",
     messages: [
         {
             role: "user",
