Commit 340ceed

update multimodal

1 parent 3332ac7 commit 340ceed
File tree

4 files changed: +22 additions, −16 deletions

articles/ai-foundry/model-inference/includes/use-chat-multi-modal/csharp.md

Lines changed: 10 additions & 7 deletions

@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-This article explains how to use chat completions API with multimodel models deployed to Azure AI model inference in Azure AI services. These multimodal models can accept combinations of text, images, and audio input.
+This article explains how to use chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images and audio input.

 ## Prerequisites

@@ -62,6 +62,9 @@ client = new ChatCompletionsClient(

 Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Some models for vision in a chat fashion:

+> [!IMPORTANT]
+> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
+
 To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):

@@ -160,9 +163,9 @@ Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}

 ASSISTANT: Hola. ¿Cómo estás?
 Model: speech
 Usage:
-    Prompt tokens: 77
-    Completion tokens: 7
-    Total tokens: 84
+    Prompt tokens: 77
+    Completion tokens: 7
+    Total tokens: 84
 ```

 The model can read the content from an **accessible cloud location** by passing the URL as an input. The Python SDK doesn't provide a direct way to do it, but you can indicate the payload as follows:

@@ -197,9 +200,9 @@ Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}

 ASSISTANT: Hola. ¿Cómo estás?
 Model: speech
 Usage:
-    Prompt tokens: 77
-    Completion tokens: 7
-    Total tokens: 84
+    Prompt tokens: 77
+    Completion tokens: 7
+    Total tokens: 84
 ```

 Audio is broken into tokens and submitted to the model for processing. Some models may operate directly over audio tokens while other may use internal modules to perform speech-to-text, resulting in different strategies to compute tokens. Read the model card for details about how each model operates.
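The `base64`/data URL step that the changed articles describe is the same across all four language variants of the include. A minimal Python sketch of that step (the helper name `image_to_data_url` is made up for illustration; it is not part of any Azure SDK):

```python
import base64


def image_to_data_url(image_path: str, mime_type: str) -> str:
    """Encode a local image file as a base64 data URL.

    The result has the form the articles require:
    data:<mime-type>;base64,<encoded-bytes>
    """
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The resulting string can then be supplied wherever the samples expect an image data URL.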

articles/ai-foundry/model-inference/includes/use-chat-multi-modal/javascript.md

Lines changed: 7 additions & 4 deletions

@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-This article explains how to use chat completions API with multimodel models deployed to Azure AI model inference in Azure AI services. These multimodal models can accept combinations of text, images, and audio input.
+This article explains how to use chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images and audio input.

 ## Prerequisites

@@ -57,6 +57,9 @@ const client = new ModelClient(

 Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of some models for vision in a chat fashion.

+> [!IMPORTANT]
+> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
+
 To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):

 ```javascript
@@ -209,9 +212,9 @@ console.log("\tCompletion tokens:", response.body.usage.completion_tokens);

 ASSISTANT: Hola. ¿Cómo estás?
 Model: speech
 Usage:
-    Prompt tokens: 77
-    Completion tokens: 7
-    Total tokens: 84
+    Prompt tokens: 77
+    Completion tokens: 7
+    Total tokens: 84
 ```

 The model can read the content from an **accessible cloud location** by passing the URL as an input. The Python SDK doesn't provide a direct way to do it, but you can indicate the payload as follows:
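The "pass the URL as an input" payload that the changed articles mention can be sketched as a raw request body. The field names below follow the OpenAI-style image content part that the Azure AI model inference chat completions API accepts; treat the exact shape, the example URL, and the model name as illustrative assumptions, not the articles' own sample:

```json
{
  "model": "Phi-3.5-vision-instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Which conclusion can be extracted from this chart?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/charts/chart.jpg"
          }
        }
      ]
    }
  ]
}
```

The same `image_url` part also accepts a base64 data URL in place of the cloud location.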

articles/ai-foundry/model-inference/includes/use-chat-multi-modal/python.md

Lines changed: 4 additions & 4 deletions

@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-This article explains how to use chat completions API with multimodel models deployed to Azure AI model inference in Azure AI services. These multimodal models can accept combinations of text, images, and audio input.
+This article explains how to use chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images and audio input.

 ## Prerequisites

@@ -179,9 +179,9 @@ print("\tTotal tokens:", response.usage.total_tokens)

 ASSISTANT: Hola. ¿Cómo estás?
 Model: speech
 Usage:
-    Prompt tokens: 77
-    Completion tokens: 7
-    Total tokens: 84
+    Prompt tokens: 77
+    Completion tokens: 7
+    Total tokens: 84
 ```

 The model can read the content from an **accessible cloud location** by passing the URL as an input. The Python SDK doesn't provide a direct way to do it, but you can indicate the payload as follows:

articles/ai-foundry/model-inference/includes/use-chat-multi-modal/rest.md

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-This article explains how to use chat completions API with multimodel models deployed to Azure AI model inference in Azure AI services. These multimodal models can accept combinations of text, images, and audio input.
+This article explains how to use chat completions API with _multimodal_ models deployed to Azure AI model inference in Azure AI services. In addition to text input, multimodal models can accept other input types, such as images and audio input.

 ## Prerequisites
