* A chat completions model deployment with support for **audio and images**. If you don't have one, read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
* This tutorial uses `Phi-4-multimodal-instruct`.
## Use chat completions
First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
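A minimal sketch using the `@azure-rest/ai-inference` package is shown below; the environment variable names are examples, so replace them with the ones you use:

```javascript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

// Endpoint URL and key are read from environment variables (the names here are examples).
const client = ModelClient(
    process.env.AZURE_INFERENCE_ENDPOINT,
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);
```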
Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the vision capabilities of some models in a chat scenario.
> [!IMPORTANT]
> Some models support only one image per turn in the chat conversation, and only the last image is retained in context. Adding multiple images results in an error.
To see this capability, download an image and encode it as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs).
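The following is a minimal sketch for Node.js; the file name `small-language-models-chart-example.jpg` is only an example and assumes you saved the chart image locally:

```javascript
import fs from "node:fs";

// Read the image from disk and encode it as base64 (file name is an example).
const imageBuffer = fs.readFileSync("small-language-models-chart-example.jpg");
const imageBase64 = imageBuffer.toString("base64");

// Build a data URL that embeds the encoded image.
const data_url = `data:image/jpeg;base64,${imageBase64}`;
```

This tutorial uses the following chart image: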
:::image type="content" source="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg":::
Now, create a chat completion request with the image:
87
+
88
+
89
+
```javascript
90
+
var messages = [
91
+
{ role:"system", content:"You are a helpful assistant that can generate responses based on images." },
92
+
{ role:"user", content:
93
+
[
94
+
{ type:"text", text:"Which conclusion can be extracted from the following chart?" },
95
+
{ type:"image_url", image:
96
+
{
97
+
url: data_url
98
+
}
99
+
}
100
+
]
101
+
}
102
+
];
103
+
104
+
var response =awaitclient.path("/chat/completions").post({
105
+
body: {
106
+
messages: messages,
107
+
model:"Phi-4-multimodal-instruct",
108
+
}
109
+
});
110
+
```
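The response body follows the standard chat completions schema; a minimal sketch to print the assistant's reply and the usage statistics could look like this:

```javascript
console.log("ASSISTANT: " + response.body.choices[0].message.content);
console.log("Model:", response.body.model);
console.log("Usage:");
console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
console.log("\tTotal tokens:", response.body.usage.total_tokens);
```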
The response is as follows, where you can see the model's usage statistics:
```console
ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
Model: Phi-4-multimodal-instruct
Usage:
    Prompt tokens: 2380
    Completion tokens: 126
    Total tokens: 2506
```
## Use chat completions with audio
Some models can reason across text and audio inputs. The following example shows how you can send audio context to chat completions models that also support audio.
In this example, we create a function `getAudioData` that loads the content of the audio file encoded as `base64` data, which is the format the model expects.
```javascript
import fs from "node:fs";

/**
 * Get the Base64 data of an audio file.
 * @param {string} audioFile - The path to the audio file.
 * @returns {string} Base64 data of the audio.
 */
function getAudioData(audioFile) {
    try {
        const audioBuffer = fs.readFileSync(audioFile);
        return audioBuffer.toString("base64");
    } catch (error) {
        console.error(`Could not read '${audioFile}'.`);
        console.error("Set the correct path to the audio file before running this sample.");
        process.exit(1);
    }
}
```
Let's now use this function to load the content of an audio file stored on disk and send it in a user message. Notice that the request also indicates the format of the audio content:
```javascript
const audioFilePath = "hello_how_are_you.mp3";
const audioFormat = "mp3";
const audioData = getAudioData(audioFilePath);

const systemMessage = { role: "system", content: "You are an AI assistant for translating and transcribing audio clips." };
const audioMessage = {
    role: "user",
    content: [
        { type: "text", text: "Translate this audio snippet to Spanish." },
        // Original snippet truncated here; the item below assumes the OpenAI-compatible `input_audio` schema.
        { type: "input_audio", input_audio: { data: audioData, format: audioFormat } },
    ],
};
```
The model can read the content from an **accessible cloud location** by passing the URL as an input. The JavaScript SDK doesn't provide a direct way to do it, but you can indicate the payload as follows:
```javascript
const systemMessage = { role: "system", content: "You are a helpful assistant." };
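// The original sample is truncated here. The lines below are a sketch that completes it,
// assuming the OpenAI-compatible `audio_url` content type for audio hosted at a URL.
const audioUrlMessage = {
    role: "user",
    content: [
        { type: "text", text: "Translate this audio snippet to Spanish." },
        { type: "audio_url", audio_url: { url: "https://example.com/hello_how_are_you.mp3" } }, // example URL
    ],
};

var response = await client.path("/chat/completions").post({
    body: {
        messages: [systemMessage, audioUrlMessage],
        model: "Phi-4-multimodal-instruct",
    }
});
```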