Commit 8847f6f

feat: audio models

1 parent 181afe3 commit 8847f6f

16 files changed: +855 -29 lines changed
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
---
title: How to use image and audio in chat completions with Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to process audio and images with chat completions models in Azure AI model inference
manager: scottpolly
author: msakande
reviewer: santiagxf
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: mopeakande
ms.reviewer: fasantia
ms.custom: generated
zone_pivot_groups: azure-ai-inference-samples
---

# How to use image and audio in chat completions with Azure AI model inference

::: zone pivot="programming-language-python"

[!INCLUDE [python](../includes/use-chat-multi-modal/python.md)]

::: zone-end

::: zone pivot="programming-language-javascript"

[!INCLUDE [javascript](../includes/use-chat-multi-modal/javascript.md)]

::: zone-end

::: zone pivot="programming-language-java"

[!INCLUDE [java](../includes/use-chat-multi-modal/java.md)]

::: zone-end

::: zone pivot="programming-language-csharp"

[!INCLUDE [csharp](../includes/use-chat-multi-modal/csharp.md)]

::: zone-end

::: zone pivot="programming-language-rest"

[!INCLUDE [rest](../includes/use-chat-multi-modal/rest.md)]

::: zone-end

## Related content

* [Use embeddings models](use-embeddings.md)
* [Use image embeddings models](use-image-embeddings.md)
* [Use reasoning models](use-chat-reasoning.md)
* [Azure AI Model Inference API](.././reference/reference-model-inference-api.md)
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: include
ms.date: 1/21/2025
ms.author: fasantia
author: santiagxf
---

* Install the [Azure AI inference package for .NET](https://aka.ms/azsdk/azure-ai-inference/csharp/reference) with the following command:

    ```bash
    dotnet add package Azure.AI.Inference --prerelease
    ```

* If you're using Microsoft Entra ID, you also need the following package:

    ```bash
    dotnet add package Azure.Identity
    ```
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
---
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: include
ms.date: 1/21/2025
ms.author: fasantia
author: santiagxf
---

* Add the [Azure AI inference package](https://aka.ms/azsdk/azure-ai-inference/java/reference) to your project:

    ```xml
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-inference</artifactId>
        <version>1.0.0-beta.1</version>
    </dependency>
    ```

* If you're using Microsoft Entra ID, you also need the following package:

    ```xml
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-identity</artifactId>
        <version>1.13.3</version>
    </dependency>
    ```

* Import the following namespaces:

    ```java
    package com.azure.ai.inference.usage;

    import com.azure.ai.inference.EmbeddingsClient;
    import com.azure.ai.inference.EmbeddingsClientBuilder;
    import com.azure.ai.inference.models.EmbeddingsResult;
    import com.azure.ai.inference.models.EmbeddingItem;
    import com.azure.core.credential.AzureKeyCredential;
    import com.azure.core.util.Configuration;

    import java.util.ArrayList;
    import java.util.List;
    ```
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: include
ms.date: 1/21/2025
ms.author: fasantia
author: santiagxf
---

* Install the [Azure Inference library for JavaScript](https://aka.ms/azsdk/azure-ai-inference/javascript/reference) with the following command:

    ```bash
    npm install @azure-rest/ai-inference
    ```
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: include
ms.date: 1/21/2025
ms.author: fasantia
author: santiagxf
---

* Install the [Azure AI inference package for Python](https://aka.ms/azsdk/azure-ai-inference/python/reference) with the following command:

    ```bash
    pip install -U azure-ai-inference
    ```

articles/ai-foundry/model-inference/includes/use-chat-completions/python.md

Lines changed: 2 additions & 7 deletions
@@ -24,14 +24,9 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
-
-* Install the [Azure AI inference package for Python](https://aka.ms/azsdk/azure-ai-inference/python/reference) with the following command:
-
-    ```bash
-    pip install -U azure-ai-inference
-    ```
+[!INCLUDE [how-to-prerequisites-python](../how-to-prerequisites-python.md)]
 
 ## Use chat completions
 
 First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
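The client-creation code that this context line refers to falls outside the hunk. As a hedged sketch of what it typically involves, assuming the `azure-ai-inference` package from the prerequisites is installed (the helper function below is illustrative, not part of the SDK):

```python
import os

def read_inference_settings(env):
    """Return the (endpoint, key) pair the client is built from,
    read from the environment mapping."""
    return env["AZURE_INFERENCE_ENDPOINT"], env["AZURE_INFERENCE_CREDENTIAL"]

# The client creation itself is then one call (requires azure-ai-inference):
#
#   from azure.ai.inference import ChatCompletionsClient
#   from azure.core.credentials import AzureKeyCredential
#
#   endpoint, key = read_inference_settings(os.environ)
#   client = ChatCompletionsClient(endpoint=endpoint,
#                                  credential=AzureKeyCredential(key))
```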
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
---
title: How to use image and audio in chat completions with Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to process audio and images with chat completions models in Azure AI model inference
manager: scottpolly
author: mopeakande
reviewer: santiagxf
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: mopeakande
ms.reviewer: fasantia
ms.custom: references_regions, tool_generated
zone_pivot_groups: azure-ai-inference-samples
---

[!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

This article explains how to use image and audio inputs with the chat completions API for models deployed to Azure AI model inference in Azure AI services.

## Prerequisites

To use chat completion models in your application, you need:

[!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]

[!INCLUDE [how-to-prerequisites-csharp](../how-to-prerequisites-csharp.md)]

* A chat completions model deployment. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.

## Use chat completions

First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.

```csharp
ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);
```

If you configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.

```csharp
TokenCredential credential = new DefaultAzureCredential(includeInteractiveCredentials: true);
AzureAIInferenceClientOptions clientOptions = new AzureAIInferenceClientOptions();
BearerTokenAuthenticationPolicy tokenPolicy = new BearerTokenAuthenticationPolicy(credential, new string[] { "https://cognitiveservices.azure.com/.default" });

clientOptions.AddPolicy(tokenPolicy, HttpPipelinePosition.PerRetry);

client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    credential,
    clientOptions
);
```

## Use chat completions with images

Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the vision capabilities of these models in a chat fashion:

> [!IMPORTANT]
> Some models support only one image for each turn in the chat conversation, and only the last image is retained in context. If you add multiple images, it results in an error.

To see this capability, download an image and encode the information as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):

```csharp
string imageUrl = "https://news.microsoft.com/source/wp-content/uploads/2024/04/The-Phi-3-small-language-models-with-big-potential-1-1900x1069.jpg";
string imageFormat = "jpeg";
HttpClient httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");
byte[] imageBytes = httpClient.GetByteArrayAsync(imageUrl).Result;
string imageBase64 = Convert.ToBase64String(imageBytes);
string dataUrl = $"data:image/{imageFormat};base64,{imageBase64}";
```

Visualize the image:

:::image type="content" source="../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg":::

Now, create a chat completion request with the image:

```csharp
ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
        new ChatRequestUserMessage([
            new ChatMessageTextContentItem("Which conclusion can be extracted from the following chart?"),
            new ChatMessageImageContentItem(new Uri(dataUrl))
        ]),
    },
    MaxTokens = 2048,
    Model = "Phi-4-multimodal-instruct",
};

var response = client.Complete(requestOptions);
Console.WriteLine(response.Value.Content);
```

The response is as follows, where you can see the model's usage statistics:

```csharp
Console.WriteLine($"{response.Value.Role}: {response.Value.Content}");
Console.WriteLine($"Model: {response.Value.Model}");
Console.WriteLine("Usage:");
Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
```

```console
ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
Model: Phi-4-multimodal-instruct
Usage:
    Prompt tokens: 2380
    Completion tokens: 126
    Total tokens: 2506
```
## Use chat completions with audio

Some models can reason across text and audio inputs. The following example shows how you can send audio context to a chat completions model that also supports audio. Use `InputAudio` to load the content of the audio file into the payload. The content is encoded as `base64` data and sent in the payload.

```csharp
var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("You are an AI assistant for translating and transcribing audio clips."),
        new ChatRequestUserMessage(
            new ChatMessageTextContentItem("Please translate this audio snippet to spanish."),
            new ChatMessageAudioContentItem("hello_how_are_you.mp3", AudioContentFormat.Mp3)),
    },
};

Response<ChatCompletions> response = client.Complete(requestOptions);
```

The response is as follows, where you can see the model's usage statistics:

```csharp
Console.WriteLine($"{response.Value.Role}: {response.Value.Content}");
Console.WriteLine($"Model: {response.Value.Model}");
Console.WriteLine("Usage:");
Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
```
```console
ASSISTANT: Hola. ¿Cómo estás?
Model: Phi-4-multimodal-instruct
Usage:
    Prompt tokens: 77
    Completion tokens: 7
    Total tokens: 84
```
If you want to avoid sending the information over the request, you can place the content in an accessible cloud location and pass the URL as an input to the model. The SDK doesn't provide a direct way to do it, but you can indicate the payload as follows:

```csharp
var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("You are an AI assistant for translating and transcribing audio clips."),
        new ChatRequestUserMessage(
            new ChatMessageTextContentItem("Please translate this audio snippet to spanish."),
            new ChatMessageAudioContentItem(new Uri("https://.../hello_how_are_you.mp3"))),
    },
};

Response<ChatCompletions> response = client.Complete(requestOptions);
```
The response is as follows, where you can see the model's usage statistics:

```csharp
Console.WriteLine($"{response.Value.Role}: {response.Value.Content}");
Console.WriteLine($"Model: {response.Value.Model}");
Console.WriteLine("Usage:");
Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
```
```console
ASSISTANT: Hola. ¿Cómo estás?
Model: Phi-4-multimodal-instruct
Usage:
    Prompt tokens: 77
    Completion tokens: 7
    Total tokens: 84
```
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
---
title: How to use image and audio in chat completions with Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to process audio and images with chat completions models in Azure AI model inference
manager: scottpolly
author: mopeakande
reviewer: santiagxf
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: mopeakande
ms.reviewer: fasantia
ms.custom: references_regions, tool_generated
zone_pivot_groups: azure-ai-inference-samples
---

> [!NOTE]
> Audio inputs are supported only when using Python, C#, or REST requests.
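Whichever of those routes you take, the audio file ultimately travels as `base64` data inside the request payload, as the C# article above shows with `InputAudio`. A stdlib-only sketch of that encoding step; the `input_audio` dictionary shape follows the OpenAI-compatible chat payload and is an assumption for illustration, not taken from this commit:

```python
import base64

def input_audio_item(audio_bytes: bytes, audio_format: str) -> dict:
    """Encode raw audio bytes as a base64 'input_audio' content item.

    The key names here are assumed from the OpenAI-compatible schema;
    check the REST reference for the exact shape.
    """
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": audio_format,  # for example "mp3" or "wav"
        },
    }

# A user message then carries a list of content items: the text prompt
# plus this audio item.
item = input_audio_item(b"\x01\x02\x03", "mp3")
```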
