This article explains how to use the chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images or audio input.
## Prerequisites
To use chat completion models in your application, you need:
* A chat completions model deployment. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
* This example uses `phi-4-multimodal-instruct`.
```csharp
ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);
```
If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
```csharp
ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri("https://<resource>.services.ai.azure.com/models"),
    new DefaultAzureCredential()
);
```
```
Usage:
Total tokens: 2506
```
Images are broken into tokens and submitted to the model for processing. When referring to images, each of those tokens is typically referred to as a *patch*. Each model might break a given image into a different number of patches. Read the model card to learn the details.
> [!IMPORTANT]
> Some models support only one image for each turn in the chat conversation, and only the last image is retained in context. If you add multiple images, it results in an error.
`articles/ai-foundry/model-inference/includes/use-chat-multi-modal/java.md`
---
author: mopeakande
reviewer: santiagxf
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 03/20/2025
ms.author: mopeakande
ms.reviewer: fasantia
ms.custom: references_regions, tool_generated
zone_pivot_groups: azure-ai-inference-samples
---
This article explains how to use the chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images or audio input.
## Prerequisites
```java
ChatCompletionsClient client = new ChatCompletionsClientBuilder()
    .credential(new AzureKeyCredential(System.getenv("AZURE_INFERENCE_CREDENTIAL")))
    .endpoint("https://<resource>.services.ai.azure.com/models")
    .buildClient();
```
If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
Images are broken into tokens and submitted to the model for processing. When referring to images, each of those tokens is typically referred to as a *patch*. Each model might break a given image into a different number of patches. Read the model card to learn the details.
> [!IMPORTANT]
> Some models support only one image for each turn in the chat conversation, and only the last image is retained in context. If you add multiple images, it results in an error.
This article explains how to use the chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images or audio input.
## Prerequisites
To use chat completion models in your application, you need:
* A chat completions model deployment with support for **audio and images**. If you don't have one, see [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
* This article uses `Phi-4-multimodal-instruct`.
## Use chat completions
```javascript
const client = new ModelClient(
    "https://<resource>.services.ai.azure.com/models",
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);
```
If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
Images are broken into tokens and submitted to the model for processing. When referring to images, each of those tokens is typically referred to as a *patch*. Each model might break a given image into a different number of patches. Read the model card to learn the details.
> [!IMPORTANT]
> Some models support only one image for each turn in the chat conversation, and only the last image is retained in context. If you add multiple images, it results in an error.
Audio is broken into tokens and submitted to the model for processing. Some models might operate directly over audio tokens while others might use internal modules to perform speech-to-text, resulting in different strategies to compute tokens. Read the model card for details about how each model operates.
This article explains how to use the chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images or audio input.
## Prerequisites
To use chat completion models in your application, you need:
* A chat completions model deployment with support for **audio and images**. If you don't have one, see [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
* This article uses `Phi-4-multimodal-instruct`.
## Use chat completions
```python
client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
```
If you've configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
```python
client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
)
```
```
Usage:
Total tokens: 2506
```
Images are broken into tokens and submitted to the model for processing. When referring to images, each of those tokens is typically referred to as a *patch*. Each model might break a given image into a different number of patches. Read the model card to learn the details.
## Use chat completions with audio
```python
response = client.complete(
    messages=messages,  # messages including the audio input (elided in the source)
)
```
Audio is broken into tokens and submitted to the model for processing. Some models might operate directly over audio tokens while others might use internal modules to perform speech-to-text, resulting in different strategies to compute tokens. Read the model card for details about how each model operates.
This article explains how to use the chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images or audio input.
## Prerequisites
To use chat completion models in your application, you need:
* A chat completions model deployment. If you don't have one, see [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
If you've configured the resource with **Microsoft Entra ID** support, pass your token in the `Authorization` header with the format `Bearer <token>`. Use scope `https://cognitiveservices.azure.com/.default`.
```http
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
Content-Type: application/json
Authorization: Bearer <token>
```
Using Microsoft Entra ID might require extra configuration in your resource to grant access. Learn how to [configure key-less authentication with Microsoft Entra ID](../../how-to/configure-entra-id.md).
## Use chat completions with images
Some models can reason across text and images and generate text completions based on both kinds of input.
To see this capability, download an image and encode the information as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
> [!TIP]
> You'll need to construct the data URL using a scripting or programming language. This article uses [this sample image](../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has the following format: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
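As a sketch of that step (the helper name is mine, not from the article; any scripting language works), you might build the data URL in Python like this:

```python
import base64
from pathlib import Path

def image_to_data_url(path: str, mime: str = "image/jpg") -> str:
    """Read a local image file and encode it as a base64 data URL."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Example (hypothetical local copy of the sample chart image):
# data_url = image_to_data_url("small-language-models-chart-example.jpg")
```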
Visualize the image:
The response includes the model's usage statistics.
Images are broken into tokens and submitted to the model for processing. When referring to images, each of those tokens is typically referred to as a *patch*. Each model might break a given image into a different number of patches. Read the model card to learn the details.
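For reference, a chat completions request that carries an image mixes `text` and `image_url` content parts in the user message. A minimal sketch of the JSON payload built in Python (the prompt text and data URL are placeholder values, not taken from the article):

```python
import json

# Placeholder values; substitute your own prompt and base64 data URL.
payload = {
    "model": "Phi-4-multimodal-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Which conclusion can you draw from the chart?"},
                {"type": "image_url", "image_url": {"url": "data:image/jpg;base64,..."}},
            ],
        }
    ],
}
body = json.dumps(payload)  # send as the HTTP request body
```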
## Use chat completions with audio
The response includes the model's usage statistics.
Audio is broken into tokens and submitted to the model for processing. Some models might operate directly over audio tokens while others might use internal modules to perform speech-to-text, resulting in different strategies to compute tokens. Read the model card for details about how each model operates.