Commit b32e527

Merge pull request #3571 from MicrosoftDocs/main
3/17/2025 PM Publish
2 parents: fcdfea2 + 97f1fa0

24 files changed: +285 −289 lines


.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
````diff
@@ -259,6 +259,11 @@
     "source_path_from_root": "/articles/open-datasets/dataset-genomics-data-lake.md",
     "redirect_url": "/azure/open-datasets/dataset-catalog",
     "redirect_document_id": false
+  },
+  {
+    "source_path_from_root": "/articles/ai-services/openai/concepts/provisioned-reservation-update.md",
+    "redirect_url": "/azure/ai-services/openai/concepts/provisioned-migration",
+    "redirect_document_id": true
   }
 ]
}
````
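The entry added above follows the same shape as the existing redirects. A minimal sketch of checking that shape mechanically (the `validate_redirect` helper is hypothetical, not part of the repo's tooling):

```python
import json

# The redirect entry added in this commit, as a standalone JSON object.
ENTRY = """
{
  "source_path_from_root": "/articles/ai-services/openai/concepts/provisioned-reservation-update.md",
  "redirect_url": "/azure/ai-services/openai/concepts/provisioned-migration",
  "redirect_document_id": true
}
"""

REQUIRED_KEYS = {"source_path_from_root", "redirect_url", "redirect_document_id"}

def validate_redirect(entry: dict) -> bool:
    """Check that a redirect entry has exactly the expected keys and value shapes."""
    if set(entry) != REQUIRED_KEYS:
        return False
    return (
        entry["source_path_from_root"].startswith("/articles/")
        and entry["redirect_url"].startswith("/")
        and isinstance(entry["redirect_document_id"], bool)
    )

print(validate_redirect(json.loads(ENTRY)))  # True
```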

articles/ai-foundry/model-inference/includes/code-create-chat-client-entra.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -98,12 +98,12 @@ Add the package to your project:
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-ai-inference</artifactId>
-    <version>1.0.0-beta.1</version>
+    <version>1.0.0-beta.4</version>
 </dependency>
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-identity</artifactId>
-    <version>1.13.3</version>
+    <version>1.15.3</version>
 </dependency>
 ```
````

articles/ai-foundry/model-inference/includes/code-create-chat-client.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -22,9 +22,9 @@ import os
 from azure.ai.inference import ChatCompletionsClient
 from azure.core.credentials import AzureKeyCredential
 
-model = ChatCompletionsClient(
+client = ChatCompletionsClient(
     endpoint="https://<resource>.services.ai.azure.com/models",
-    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
+    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
 )
 ```
@@ -47,7 +47,7 @@ import { AzureKeyCredential } from "@azure/core-auth";
 
 const client = new ModelClient(
     "https://<resource>.services.ai.azure.com/models",
-    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
+    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
 );
 ```
@@ -97,7 +97,7 @@ Then, you can use the package to consume the model. The following example shows
 ```java
 ChatCompletionsClient client = new ChatCompletionsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
-    .endpoint("{endpoint}")
+    .endpoint("https://<resource>.services.ai.azure.com/models")
     .buildClient();
 ```
````
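Both the Python and JavaScript hunks in this file rename the key environment variable from `AZUREAI_ENDPOINT_KEY` to `AZURE_INFERENCE_CREDENTIAL`. If you maintain scripts that still export the old name, a small shim can bridge the rename during migration; this is a sketch under my own naming, not an SDK API:

```python
import os

def resolve_inference_key() -> str:
    """Return the endpoint key, preferring the new variable name
    (AZURE_INFERENCE_CREDENTIAL) and falling back to the legacy
    AZUREAI_ENDPOINT_KEY so older environments keep working."""
    for name in ("AZURE_INFERENCE_CREDENTIAL", "AZUREAI_ENDPOINT_KEY"):
        value = os.environ.get(name)
        if value:
            return value
    raise KeyError("Set AZURE_INFERENCE_CREDENTIAL to your endpoint key.")

# Only the legacy variable is set: the fallback is used.
os.environ.pop("AZURE_INFERENCE_CREDENTIAL", None)
os.environ["AZUREAI_ENDPOINT_KEY"] = "legacy-key"
print(resolve_inference_key())  # legacy-key

# When the new name is present, it wins.
os.environ["AZURE_INFERENCE_CREDENTIAL"] = "new-key"
print(resolve_inference_key())  # new-key
```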

articles/ai-foundry/model-inference/includes/how-to-prerequisites-java.md

Lines changed: 6 additions & 3 deletions
````diff
@@ -1,4 +1,4 @@
-2---
+---
 manager: nitinme
 ms.service: azure-ai-model-inference
 ms.topic: include
@@ -13,7 +13,7 @@ author: santiagxf
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-ai-inference</artifactId>
-    <version>1.0.0-beta.1</version>
+    <version>1.0.0-beta.4</version>
 </dependency>
 ```
@@ -23,7 +23,7 @@ author: santiagxf
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-identity</artifactId>
-    <version>1.13.3</version>
+    <version>1.15.3</version>
 </dependency>
 ```
@@ -34,8 +34,11 @@ author: santiagxf
 
 import com.azure.ai.inference.EmbeddingsClient;
 import com.azure.ai.inference.EmbeddingsClientBuilder;
+import com.azure.ai.inference.ChatCompletionsClient;
+import com.azure.ai.inference.ChatCompletionsClientBuilder;
 import com.azure.ai.inference.models.EmbeddingsResult;
 import com.azure.ai.inference.models.EmbeddingItem;
+import com.azure.ai.inference.models.ChatCompletions;
 import com.azure.core.credential.AzureKeyCredential;
 import com.azure.core.util.Configuration;
````

articles/ai-foundry/model-inference/includes/use-chat-completions/csharp.md

Lines changed: 3 additions & 11 deletions
````diff
@@ -24,19 +24,11 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
-
-* Install the [Azure AI inference package](https://aka.ms/azsdk/azure-ai-inference/python/reference) with the following command:
+[!INCLUDE [how-to-prerequisites-csharp](../how-to-prerequisites-csharp.md)]
 
-    ```bash
-    dotnet add package Azure.AI.Inference --prerelease
-    ```
-
-* If you are using Entra ID, you also need the following package:
+* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
-    ```bash
-    dotnet add package Azure.Identity
-    ```
+* This example uses `mistral-large-2407`.
 
 ## Use chat completions
````

articles/ai-foundry/model-inference/includes/use-chat-completions/java.md

Lines changed: 58 additions & 62 deletions
````diff
@@ -24,58 +24,61 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
+[!INCLUDE [how-to-prerequisites-java](../how-to-prerequisites-java.md)]
+
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
-* Add the [Azure AI inference package](https://aka.ms/azsdk/azure-ai-inference/java/reference) to your project:
-
-    ```xml
-    <dependency>
-        <groupId>com.azure</groupId>
-        <artifactId>azure-ai-inference</artifactId>
-        <version>1.0.0-beta.1</version>
-    </dependency>
-    ```
-
-* If you are using Entra ID, you also need the following package:
-
-    ```xml
-    <dependency>
-        <groupId>com.azure</groupId>
-        <artifactId>azure-identity</artifactId>
-        <version>1.13.3</version>
-    </dependency>
-    ```
-
-* Import the following namespace:
-
-    ```java
-    package com.azure.ai.inference.usage;
-
-    import com.azure.ai.inference.EmbeddingsClient;
-    import com.azure.ai.inference.EmbeddingsClientBuilder;
-    import com.azure.ai.inference.models.EmbeddingsResult;
-    import com.azure.ai.inference.models.EmbeddingItem;
-    import com.azure.core.credential.AzureKeyCredential;
-    import com.azure.core.util.Configuration;
-
-    import java.util.ArrayList;
-    import java.util.List;
-    ```
+* This example uses `mistral-large-2407`.
 
 ## Use chat completions
 
 First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
 
+```java
+ChatCompletionsClient client = new ChatCompletionsClientBuilder()
+    .credential(new AzureKeyCredential("{key}"))
+    .endpoint("https://<resource>.services.ai.azure.com/models")
+    .buildClient();
+```
+
 If you have configured the resource to with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
+```java
+TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
+ChatCompletionsClient client = new ChatCompletionsClientBuilder()
+    .credential(defaultCredential)
+    .endpoint("https://<resource>.services.ai.azure.com/models")
+    .buildClient();
+```
+
+
 ### Create a chat completion request
 
 The following example shows how you can create a basic chat completions request to the model.
+
+```java
+List<ChatRequestMessage> chatMessages = new ArrayList<>();
+chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant."));
+chatMessages.add(new ChatRequestUserMessage("How many languages are in the world?"));
+
+ChatCompletions response = client.complete(new ChatCompletionsOptions(chatMessages));
+```
+
 > [!NOTE]
 > Some models don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
+```java
+System.out.printf("Model ID=%s is created at %s.%n", chatCompletions.getId(), chatCompletions.getCreated());
+for (ChatChoice choice : chatCompletions.getChoices()) {
+    ChatResponseMessage message = choice.getMessage();
+    System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
+    System.out.println("Message:");
+    System.out.println(message.getContent());
+}
+```
+
 ```console
 Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
 Model: mistral-large-2407
@@ -93,7 +96,26 @@ By default, the completions API returns the entire generated content in a single
 
 You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
 
-You can visualize how streaming generates content:
+```java
+List<ChatRequestMessage> chatMessages = new ArrayList<>();
+chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant."));
+chatMessages.add(new ChatRequestUserMessage("How many languages are in the world?"));
+
+client.completeStream(new ChatCompletionsOptions(chatMessages))
+    .forEach(chatCompletions -> {
+        if (CoreUtils.isNullOrEmpty(chatCompletions.getChoices())) {
+            return;
+        }
+        StreamingChatResponseMessageUpdate delta = chatCompletions.getChoice().getDelta();
+        if (delta.getRole() != null) {
+            System.out.println("Role = " + delta.getRole());
+        }
+        if (delta.getContent() != null) {
+            String content = delta.getContent();
+            System.out.print(content);
+        }
+    });
+```
 
 #### Explore more parameters supported by the inference client
 
@@ -141,29 +163,3 @@ The following example shows how to handle events when the model detects harmful
 
 > [!TIP]
 > To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-## Use chat completions with images
-
-Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Some models for vision in a chat fashion:
-
-> [!IMPORTANT]
-> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
-
-To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
-
-Visualize the image:
-
-:::image type="content" source="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg":::
-
-Now, create a chat completion request with the image:
-
-The response is as follows, where you can see the model's usage statistics:
-
-```console
-ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
-Model: mistral-large-2407
-Usage:
-Prompt tokens: 2380
-Completion tokens: 126
-Total tokens: 2506
-```
````
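The streaming code added in this file reads content from the `delta` field of each data-only server-sent event and skips chunks with no choices. The same parsing step, sketched language-agnostically in Python (the sample event lines are illustrative, not captured service output):

```python
import json

# Illustrative data-only SSE lines, shaped like chat-completions stream chunks.
SSE_LINES = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "About 7,000 "}}]}',
    'data: {"choices": [{"delta": {"content": "languages."}}]}',
    "data: [DONE]",
]

def extract_stream_text(lines) -> str:
    """Concatenate content from the delta field of each chunk,
    skipping chunks with no choices and stopping at [DONE]."""
    parts = []
    for line in lines:
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

print(extract_stream_text(SSE_LINES))  # About 7,000 languages.
```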

articles/ai-foundry/model-inference/includes/use-chat-completions/python.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -25,13 +25,15 @@ To use chat completion models in your application, you need:
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
 [!INCLUDE [how-to-prerequisites-python](../how-to-prerequisites-python.md)]
-
+
+* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
+
+* This example uses `mistral-large-2407`.
 
 ## Use chat completions
 
 First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
 
-
 ```python
 import os
 from azure.ai.inference import ChatCompletionsClient
````

articles/ai-foundry/model-inference/includes/use-chat-completions/rest.md

Lines changed: 6 additions & 2 deletions
````diff
@@ -26,24 +26,28 @@ To use chat completion models in your application, you need:
 
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
+* This example uses `mistral-large-2407`.
+
 ## Use chat completions
 
-To use chat completions API, use the route `/chat/completions` appended to the base URL along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use chat completions API, use the route `/chat/completions` appended to the base URL along with your credential indicated in `api-key`.
 
 ```http
 POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
 Content-Type: application/json
 api-key: <key>
 ```
 
-If you have configured the resource with **Microsoft Entra ID** support, pass you token in the `Authorization` header:
+If you have configured the resource with **Microsoft Entra ID** support, pass you token in the `Authorization` header with the format `Bearer <token>`. Use scope `https://cognitiveservices.azure.com/.default`.
 
 ```http
 POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
 Content-Type: application/json
 Authorization: Bearer <token>
 ```
 
+Using Microsoft Entra ID may require additional configuration in your resource to grant access. Learn how to [configure key-less authentication with Microsoft Entra ID](../../how-to/configure-entra-id.md).
+
 ### Create a chat completion request
 
 The following example shows how you can create a basic chat completions request to the model.
````

articles/ai-foundry/model-inference/includes/use-chat-multi-modal/csharp.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples
 
 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]
 
-This article explains how to use chat completions API with models deployed to Azure AI model inference in Azure AI services.
+This article explains how to use chat completions API with models supporting images or audio deployed to Azure AI model inference in Azure AI services.
 
 ## Prerequisites
 
@@ -28,6 +28,7 @@ To use chat completion models in your application, you need:
 
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
+* This example uses `phi-4-multimodal-instruct`.
 
 ## Use chat completions
````
