
Commit 8852ae0

Merge branch 'main' into gary149-patch-1
2 parents 33a8c32 + 0bcbf66 commit 8852ae0

26 files changed: +408 additions, -86 deletions

docs/api-inference/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@
   title: Image Segmentation
 - local: tasks/image-to-image
   title: Image to Image
+- local: tasks/image-text-to-text
+  title: Image-Text to Text
 - local: tasks/object-detection
   title: Object Detection
 - local: tasks/question-answering

docs/api-inference/tasks/audio-classification.md

Lines changed: 5 additions & 5 deletions
@@ -29,8 +29,9 @@ For more details about the `audio-classification` task, check out its [dedicated
 
 ### Recommended models
 
+- [ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition](https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition): An emotion recognition model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).
 
 ### Using the API
 
@@ -39,19 +40,18 @@ This is only a subset of the supported models. Find the model that suits you bes
 
 <curl>
 ```bash
-curl https://api-inference.huggingface.co/models/<REPO_ID> \
+curl https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition \
     -X POST \
     --data-binary '@sample1.flac' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
 <python>
 ```py
 import requests
 
-API_URL = "https://api-inference.huggingface.co/models/<REPO_ID>"
+API_URL = "https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
 headers = {"Authorization": "Bearer hf_***"}
 
 def query(filename):
@@ -71,7 +71,7 @@ To use the Python client, see `huggingface_hub`'s [package reference](https://hu
 async function query(filename) {
     const data = fs.readFileSync(filename);
     const response = await fetch(
-        "https://api-inference.huggingface.co/models/<REPO_ID>",
+        "https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
         {
             headers: {
                 Authorization: "Bearer hf_***"
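The audio-classification snippets now target `ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition` instead of a `<REPO_ID>` placeholder and POST the raw file bytes. Below is a minimal standard-library sketch of that same request. It assumes the endpoint accepts the bare bytes exactly as `curl --data-binary` sends them. It only builds the request and does not send it. `build_audio_request` is a hypothetical helper, not part of any client library.

```python
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models/"


def build_audio_request(repo_id: str, audio_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes.

    Hypothetical helper mirroring the curl example: the body is the file
    content as-is, and the token goes in a Bearer Authorization header.
    """
    return urllib.request.Request(
        API_BASE + repo_id,
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


req = build_audio_request(
    "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
    b"fLaC\x00\x00",  # placeholder bytes; a real call would use open(path, "rb").read()
    "hf_***",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```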

docs/api-inference/tasks/automatic-speech-recognition.md

Lines changed: 2 additions & 3 deletions
@@ -32,7 +32,7 @@ For more details about the `automatic-speech-recognition` task, check out its [d
 - [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI.
 - [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
 
 ### Using the API
 
@@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/openai/whisper-large-v3 \
     -X POST \
     --data-binary '@sample1.flac' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
@@ -108,7 +107,7 @@ To use the JavaScript client, see `huggingface.js`'s [package reference](https:/
 | **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. |
 | **parameters** | _object_ | Additional inference parameters for Automatic Speech Recognition |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return_timestamps** | _boolean_ | Whether to output corresponding timestamps with the generated text |
-| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generate** | _object_ | Ad-hoc parametrization of the text generation process |
+| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generation_parameters** | _object_ | Ad-hoc parametrization of the text generation process |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;temperature** | _number_ | The value used to modulate the next token probabilities. |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_k** | _integer_ | The number of highest probability vocabulary tokens to keep for top-k-filtering. |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_p** | _number_ | If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. |
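According to the ASR parameter table, raw bytes are only an option when no `parameters` are sent. Once `parameters` are set, the audio goes in as a base64-encoded `inputs` string, and generation options now sit under `generation_parameters` (renamed from `generate` in this commit). The minimal sketch below builds such a body. It assumes the server expects exactly the nesting the table shows. `build_asr_payload` is a hypothetical helper.

```python
import base64
import json


def build_asr_payload(audio_bytes: bytes, return_timestamps: bool = False, **generation_parameters) -> str:
    """Serialize an ASR request body following the parameter table.

    `inputs` is the base64-encoded audio; any generation options
    (temperature, top_k, top_p, ...) are nested under `generation_parameters`.
    """
    parameters = {"return_timestamps": return_timestamps}
    if generation_parameters:
        parameters["generation_parameters"] = generation_parameters
    body = {
        "inputs": base64.b64encode(audio_bytes).decode("ascii"),
        "parameters": parameters,
    }
    return json.dumps(body)


payload = build_asr_payload(b"\x00\x01\x02", return_timestamps=True, temperature=0.2, top_k=50)
```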

docs/api-inference/tasks/chat-completion.md

Lines changed: 97 additions & 9 deletions
@@ -14,19 +14,22 @@ For more details, check out:
 
 ## Chat Completion
 
-Generate a response given a list of messages.
-This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context.
-
-
+Generate a response given a list of messages in a conversational context, supporting both conversational Language Models (LLMs) and conversational Vision-Language Models (VLMs).
+This is a subtask of [`text-generation`](https://huggingface.co/docs/api-inference/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/api-inference/tasks/image-text-to-text).
 
 ### Recommended models
 
+#### Conversational Large Language Models (LLMs)
+
 - [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
 - [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
 - [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
 - [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model.
 - [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model.
+- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): Powerful vision language model with great visual understanding and reasoning capabilities.
+- [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct): Strong image-text-to-text model.
 
+#### Conversational Vision-Language Models (VLMs)
 
 ### API Playground
 
@@ -51,6 +54,8 @@ The API supports:
 * Using grammars, constraints, and tools.
 * Streaming the output
 
+#### Code snippet example for conversational LLMs
+
 
 <inferencesnippet>
 
@@ -73,18 +78,15 @@ curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/c
 ```py
 from huggingface_hub import InferenceClient
 
-client = InferenceClient(
-    "google/gemma-2-2b-it",
-    token="hf_***",
-)
+client = InferenceClient(api_key="hf_***")
 
 for message in client.chat_completion(
+    model="google/gemma-2-2b-it",
     messages=[{"role": "user", "content": "What is the capital of France?"}],
     max_tokens=500,
     stream=True,
 ):
     print(message.choices[0].delta.content, end="")
-
 ```
 
 To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
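The Python loop prints each streamed `delta.content` as it arrives. To keep the whole reply instead, the chunks can be joined. The small sketch below assumes a chunk's `delta.content` may occasionally be `None` or empty; the JS snippet guards against exactly that with `|| ""`. The sample values are simulated and are not real API output.

```python
from typing import Iterable, Optional


def join_stream(deltas: Iterable[Optional[str]]) -> str:
    """Concatenate streamed delta contents into the full reply.

    A chunk whose content is None or empty contributes nothing,
    rather than the literal text "None".
    """
    return "".join(d or "" for d in deltas)


# Simulated delta.content values standing in for real stream chunks.
simulated = ["The capital", " of France", " is Paris.", None]
reply = join_stream(simulated)
```

With the client from the snippet, the same helper could be fed `message.choices[0].delta.content for message in client.chat_completion(..., stream=True)`.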
@@ -103,7 +105,93 @@ for await (const chunk of inference.chatCompletionStream({
 })) {
     process.stdout.write(chunk.choices[0]?.delta?.content || "");
 }
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
+</js>
+
+</inferencesnippet>
+
+
+
+#### Code snippet example for conversational VLMs
 
+
+<inferencesnippet>
+
+<curl>
+```bash
+curl 'https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct/v1/chat/completions' \
+    -H "Authorization: Bearer hf_***" \
+    -H 'Content-Type: application/json' \
+    -d '{
+    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
+                {"type": "text", "text": "Describe this image in one sentence."}
+            ]
+        }
+    ],
+    "max_tokens": 500,
+    "stream": false
+}'
+
+```
+</curl>
+
+<python>
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(api_key="hf_***")
+
+image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
+
+for message in client.chat_completion(
+    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {"url": image_url}},
+                {"type": "text", "text": "Describe this image in one sentence."},
+            ],
+        }
+    ],
+    max_tokens=500,
+    stream=True,
+):
+    print(message.choices[0].delta.content, end="")
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
+</python>
+
+<js>
+```js
+import { HfInference } from "@huggingface/inference";
+
+const inference = new HfInference("hf_***");
+const imageUrl = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg";
+
+for await (const chunk of inference.chatCompletionStream({
+    model: "meta-llama/Llama-3.2-11B-Vision-Instruct",
+    messages: [
+        {
+            "role": "user",
+            "content": [
+                {"type": "image_url", "image_url": {"url": imageUrl}},
+                {"type": "text", "text": "Describe this image in one sentence."},
+            ],
+        }
+    ],
+    max_tokens: 500,
+})) {
+    process.stdout.write(chunk.choices[0]?.delta?.content || "");
+}
 ```
 
 To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
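The main change in the VLM examples is the message body. Instead of a plain string, `content` becomes a list of typed parts: an `image_url` part and a `text` part. The minimal sketch below assembles the same request body that the `curl` example sends, so it can be reused with other images or prompts. `build_vlm_messages` is a hypothetical helper and is not part of `huggingface_hub`.

```python
import json


def build_vlm_messages(image_url: str, prompt: str) -> list:
    """Build a single-turn user message pairing an image URL with a text prompt.

    Follows the content-part layout used in the VLM snippets: a list of
    {"type": "image_url", ...} and {"type": "text", ...} entries.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


body = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": build_vlm_messages(
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
        "Describe this image in one sentence.",
    ),
    "max_tokens": 500,
    "stream": False,
}
print(json.dumps(body, indent=2))  # same structure as the curl -d payload
```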

docs/api-inference/tasks/feature-extraction.md

Lines changed: 1 addition & 2 deletions
@@ -31,7 +31,7 @@ For more details about the `feature-extraction` task, check out its [dedicated p
 
 - [thenlper/gte-large](https://huggingface.co/thenlper/gte-large): A powerful feature extraction model for natural language processing tasks.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).
 
 ### Using the API
 
@@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/thenlper/gte-large \
     -d '{"inputs": "Today is a sunny day and I will get some ice cream."}' \
     -H 'Content-Type: application/json' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
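Unlike the audio and image tasks, feature extraction sends a JSON body with a `Content-Type: application/json` header. The minimal standard-library sketch below builds the same request without sending it. `build_json_request` is a hypothetical helper, and the `{"inputs": ...}` body shape is taken from the `curl` example.

```python
import json
import urllib.request


def build_json_request(repo_id: str, inputs: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST shown in the curl example."""
    return urllib.request.Request(
        "https://api-inference.huggingface.co/models/" + repo_id,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_json_request(
    "thenlper/gte-large",
    "Today is a sunny day and I will get some ice cream.",
    "hf_***",
)
```

`urllib.request.Request` normalizes header names with `str.capitalize()`, so the header is stored and read back as `Content-type`.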

docs/api-inference/tasks/fill-mask.md

Lines changed: 1 addition & 2 deletions
@@ -27,7 +27,7 @@ For more details about the `fill-mask` task, check out its [dedicated page](http
 - [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased): The famous BERT model.
 - [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base): A multilingual model trained on 100 languages.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
 
 ### Using the API
 
@@ -41,7 +41,6 @@ curl https://api-inference.huggingface.co/models/google-bert/bert-base-uncased \
     -d '{"inputs": "The answer to the universe is [MASK]."}' \
     -H 'Content-Type: application/json' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
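A `fill-mask` input must contain the model's own mask token. The `google-bert/bert-base-uncased` example uses `[MASK]`, while `FacebookAI/xlm-roberta-base` expects `<mask>`. The small pre-flight check below assumes a hand-maintained mapping that covers only the two recommended models. `MASK_TOKENS` and `check_mask` are hypothetical names, and the authoritative token for any model lives in its tokenizer configuration.

```python
# Hypothetical mapping for the two recommended fill-mask models only.
MASK_TOKENS = {
    "google-bert/bert-base-uncased": "[MASK]",
    "FacebookAI/xlm-roberta-base": "<mask>",
}


def check_mask(repo_id: str, text: str) -> str:
    """Return text unchanged if it contains the mask token expected by repo_id."""
    token = MASK_TOKENS[repo_id]
    if token not in text:
        raise ValueError(f"{repo_id} expects the mask token {token!r} in the input")
    return text
```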

docs/api-inference/tasks/image-classification.md

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,9 @@ For more details about the `image-classification` task, check out its [dedicated
 ### Recommended models
 
 - [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224): A strong image classification model.
+- [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224): A robust image classification model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).
 
 ### Using the API
 
@@ -39,7 +40,6 @@ curl https://api-inference.huggingface.co/models/google/vit-base-patch16-224 \
     -X POST \
     --data-binary '@cats.jpg' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
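Each task page in this diff replaces "This is only a subset of the supported models" with an "Explore all available models" link of the same shape, differing only in `pipeline_tag`. The sketch below rebuilds those links. It assumes the query parameters stay `inference=warm`, `pipeline_tag`, and `sort=trending`, as in the URLs above. `models_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def models_url(pipeline_tag: str) -> str:
    """Rebuild the Hub model-search link used on each task page."""
    query = urlencode({"inference": "warm", "pipeline_tag": pipeline_tag, "sort": "trending"})
    return "https://huggingface.co/models?" + query


url = models_url("image-classification")
```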

docs/api-inference/tasks/image-segmentation.md

Lines changed: 1 addition & 2 deletions
@@ -26,7 +26,7 @@ For more details about the `image-segmentation` task, check out its [dedicated p
 
 - [nvidia/segformer-b0-finetuned-ade-512-512](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512): Semantic segmentation model trained on ADE20k benchmark dataset with 512x512 resolution.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).
 
 ### Using the API
 
@@ -39,7 +39,6 @@ curl https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-a
     -X POST \
     --data-binary '@cats.jpg' \
     -H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
