huggingface
diff --git a/docs/api-inference/_toctree.yml b/docs/api-inference/_toctree.yml
Lines changed: 2 additions & 0 deletions
diff --git a/docs/api-inference/tasks/audio-classification.md b/docs/api-inference/tasks/audio-classification.md
Lines changed: 5 additions & 5 deletions
diff --git a/docs/api-inference/tasks/automatic-speech-recognition.md b/docs/api-inference/tasks/automatic-speech-recognition.md
Lines changed: 2 additions & 3 deletions
diff --git a/docs/api-inference/tasks/chat-completion.md b/docs/api-inference/tasks/chat-completion.md
Lines changed: 97 additions & 9 deletions
diff --git a/docs/api-inference/tasks/feature-extraction.md b/docs/api-inference/tasks/feature-extraction.md
Lines changed: 1 addition & 2 deletions
diff --git a/docs/api-inference/tasks/fill-mask.md b/docs/api-inference/tasks/fill-mask.md
Lines changed: 1 addition & 2 deletions
diff --git a/docs/api-inference/tasks/image-classification.md b/docs/api-inference/tasks/image-classification.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/api-inference/tasks/image-segmentation.md b/docs/api-inference/tasks/image-segmentation.md
Lines changed: 1 addition & 2 deletions
@@ -30,6 +30,8 @@
       title: Image Segmentation
     - local: tasks/image-to-image
       title: Image to Image
+    - local: tasks/image-text-to-text
+      title: Image-Text to Text
     - local: tasks/object-detection
       title: Object Detection
     - local: tasks/question-answering
 
@@ -29,8 +29,9 @@ For more details about the `audio-classification` task, check out its [dedicated
 
 ### Recommended models
 
+- [ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition](https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition): An emotion recognition model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).
 
 ### Using the API
 
@@ -39,19 +40,18 @@ This is only a subset of the supported models. Find the model that suits you bes
 
 <curl>
 ```bash
-curl https://api-inference.huggingface.co/models/<REPO_ID> \
+curl https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition \
 	-X POST \
 	--data-binary '@sample1.flac' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
 <python>
 ```py
 import requests
 
-API_URL = "https://api-inference.huggingface.co/models/<REPO_ID>"
+API_URL = "https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
 headers = {"Authorization": "Bearer hf_***"}
 
 def query(filename):
@@ -71,7 +71,7 @@ To use the Python client, see `huggingface_hub`'s [package reference](https://hu
 async function query(filename) {
 	const data = fs.readFileSync(filename);
 	const response = await fetch(
-		"https://api-inference.huggingface.co/models/<REPO_ID>",
+		"https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
 		{
 			headers: {
 				Authorization: "Bearer hf_***"
 
@@ -32,7 +32,7 @@ For more details about the `automatic-speech-recognition` task, check out its [d
 - [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI.
 - [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
 
 ### Using the API
 
@@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/openai/whisper-large-v3 \
 	-X POST \
 	--data-binary '@sample1.flac' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
@@ -108,7 +107,7 @@ To use the JavaScript client, see `huggingface.js`'s [package reference](https:/
 | **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. |
 | **parameters** | _object_ | Additional inference parameters for Automatic Speech Recognition |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return_timestamps** | _boolean_ | Whether to output corresponding timestamps with the generated text |
-| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generate** | _object_ | Ad-hoc parametrization of the text generation process |
+| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generation_parameters** | _object_ | Ad-hoc parametrization of the text generation process |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;temperature** | _number_ | The value used to modulate the next token probabilities. |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_k** | _integer_ | The number of highest probability vocabulary tokens to keep for top-k-filtering. |
 | **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_p** | _number_ | If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. |
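
For reference, the request parameters in the table above could be assembled into a JSON body roughly as follows. This is a minimal sketch and not part of the PR: `build_asr_payload` is a hypothetical helper name, and the nesting of `temperature`, `top_k`, and `top_p` under `generation_parameters` is inferred from the table's indentation. No request is sent; the code only builds and prints the body.

```python
import base64
import json


def build_asr_payload(audio_bytes, return_timestamps=None, **generation_parameters):
    """Build a JSON-serializable ASR request body (hypothetical helper).

    `inputs` is the audio as a base64-encoded string, as the table describes.
    Keyword arguments such as temperature, top_k, or top_p are placed under
    `parameters.generation_parameters` (assumed nesting).
    """
    payload = {"inputs": base64.b64encode(audio_bytes).decode("ascii")}

    parameters = {}
    if return_timestamps is not None:
        parameters["return_timestamps"] = return_timestamps

    # Keep only the generation options that were actually set.
    generation = {k: v for k, v in generation_parameters.items() if v is not None}
    if generation:
        parameters["generation_parameters"] = generation

    # Omit the `parameters` key entirely when nothing was requested.
    if parameters:
        payload["parameters"] = parameters
    return payload


# Placeholder bytes stand in for the contents of a real audio file.
example = build_asr_payload(
    b"fake-audio-bytes", return_timestamps=True, temperature=0.7, top_k=50
)
print(json.dumps(example, indent=2))
```

When no `parameters` are needed, the table notes that the audio can instead be sent as a raw bytes payload, as the curl example in this file does with `--data-binary`.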
 
@@ -14,20 +14,23 @@ For more details, check out:
 
 ## Chat Completion
 
-Generate a response given a list of messages.
-This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context.
-
-
+Generate a response given a list of messages in a conversational context, supporting both conversational Large Language Models (LLMs) and conversational Vision-Language Models (VLMs).
+This is a subtask of [`text-generation`](https://huggingface.co/docs/api-inference/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/api-inference/tasks/image-text-to-text).
 
 ### Recommended models
 
+#### Conversational Large Language Models (LLMs)
+
 - [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
 - [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
 - [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
 - [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model.
 - [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model.
 
+#### Conversational Vision-Language Models (VLMs)
 
+- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): Powerful vision-language model with strong visual understanding and reasoning capabilities.
+- [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct): Strong image-text-to-text model.
 
 ### Using the API
 
@@ -37,6 +40,8 @@ The API supports:
 * Using grammars, constraints, and tools.
 * Streaming the output
 
+#### Code snippet example for conversational LLMs
+
 
 <inferencesnippet>
 
@@ -59,18 +64,15 @@ curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/c
 ```py
 from huggingface_hub import InferenceClient
 
-client = InferenceClient(
-    "google/gemma-2-2b-it",
-    token="hf_***",
-)
+client = InferenceClient(api_key="hf_***")
 
 for message in client.chat_completion(
+	model="google/gemma-2-2b-it",
 	messages=[{"role": "user", "content": "What is the capital of France?"}],
 	max_tokens=500,
 	stream=True,
 ):
     print(message.choices[0].delta.content, end="")
-
 ```
 
 To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
@@ -89,7 +91,93 @@ for await (const chunk of inference.chatCompletionStream({
 })) {
 	process.stdout.write(chunk.choices[0]?.delta?.content || "");
 }
+```
+
+To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
+</js>
+
+</inferencesnippet>
+
+
+
+#### Code snippet example for conversational VLMs
 
+
+<inferencesnippet>
+
+<curl>
+```bash
+curl 'https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct/v1/chat/completions' \
+-H "Authorization: Bearer hf_***" \
+-H 'Content-Type: application/json' \
+-d '{
+	"model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
+	"messages": [
+		{
+			"role": "user",
+			"content": [
+				{"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
+				{"type": "text", "text": "Describe this image in one sentence."}
+			]
+		}
+	],
+	"max_tokens": 500,
+	"stream": false
+}'
+
+```
+</curl>
+
+<python>
+```py
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(api_key="hf_***")
+
+image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
+
+for message in client.chat_completion(
+	model="meta-llama/Llama-3.2-11B-Vision-Instruct",
+	messages=[
+		{
+			"role": "user",
+			"content": [
+				{"type": "image_url", "image_url": {"url": image_url}},
+				{"type": "text", "text": "Describe this image in one sentence."},
+			],
+		}
+	],
+	max_tokens=500,
+	stream=True,
+):
+	print(message.choices[0].delta.content, end="")
+```
+
+To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
+</python>
+
+<js>
+```js
+import { HfInference } from "@huggingface/inference";
+
+const inference = new HfInference("hf_***");
+const imageUrl = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg";
+
+for await (const chunk of inference.chatCompletionStream({
+	model: "meta-llama/Llama-3.2-11B-Vision-Instruct",
+	messages: [
+		{
+			"role": "user",
+			"content": [
+				{"type": "image_url", "image_url": {"url": imageUrl}},
+				{"type": "text", "text": "Describe this image in one sentence."},
+			],
+		}
+	],
+	max_tokens: 500,
+})) {
+	process.stdout.write(chunk.choices[0]?.delta?.content || "");
+}
 ```
 
 To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
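
The three VLM snippets above share the same message shape: one user turn whose `content` is a list holding an `image_url` part and a `text` part. A minimal sketch of that structure follows, assuming hypothetical helper names (`build_vlm_message`, `build_chat_request`) and a placeholder image URL. It only builds the request body and does not call the API.

```python
import json


def build_vlm_message(image_url, prompt, role="user"):
    """Return one chat message with an image part and a text part (hypothetical helper)."""
    return {
        "role": role,
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }


def build_chat_request(model, messages, max_tokens=500, stream=False):
    """Assemble a request body with the same fields as the curl example (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }


request_body = build_chat_request(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    # Placeholder URL for illustration; substitute any publicly reachable image.
    [build_vlm_message("https://example.com/image.jpg", "Describe this image in one sentence.")],
)
print(json.dumps(request_body, indent=2))
```

The `messages` list built this way has the same shape as the `messages` argument passed to `client.chat_completion(...)` in the Python snippet.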
 
@@ -31,7 +31,7 @@ For more details about the `feature-extraction` task, check out its [dedicated p
 
 - [thenlper/gte-large](https://huggingface.co/thenlper/gte-large): A powerful feature extraction model for natural language processing tasks.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).
 
 ### Using the API
 
@@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/thenlper/gte-large \
 	-d '{"inputs": "Today is a sunny day and I will get some ice cream."}' \
 	-H 'Content-Type: application/json' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
 
@@ -27,7 +27,7 @@ For more details about the `fill-mask` task, check out its [dedicated page](http
 - [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased): The famous BERT model.
 - [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base): A multilingual model trained on 100 languages.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
 
 ### Using the API
 
@@ -41,7 +41,6 @@ curl https://api-inference.huggingface.co/models/google-bert/bert-base-uncased \
 	-d '{"inputs": "The answer to the universe is [MASK]."}' \
 	-H 'Content-Type: application/json' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
 
@@ -25,8 +25,9 @@ For more details about the `image-classification` task, check out its [dedicated
 ### Recommended models
 
 - [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224): A strong image classification model.
+- [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224): A robust image classification model.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).
 
 ### Using the API
 
@@ -39,7 +40,6 @@ curl https://api-inference.huggingface.co/models/google/vit-base-patch16-224 \
 	-X POST \
 	--data-binary '@cats.jpg' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>
 
 
@@ -26,7 +26,7 @@ For more details about the `image-segmentation` task, check out its [dedicated p
 
 - [nvidia/segformer-b0-finetuned-ade-512-512](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512): Semantic segmentation model trained on ADE20k benchmark dataset with 512x512 resolution.
 
-This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).
+Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).
 
 ### Using the API
 
@@ -39,7 +39,6 @@ curl https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-a
 	-X POST \
 	--data-binary '@cats.jpg' \
 	-H "Authorization: Bearer hf_***"
-
 ```
 </curl>