MicrosoftDocs
diff --git a/‎articles/ai-foundry/model-inference/includes/code-create-chat-client-entra.md‎
Lines changed: 2 additions & 2 deletions b/‎articles/ai-foundry/model-inference/includes/code-create-chat-client-entra.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/code-create-chat-client.md‎
Lines changed: 4 additions & 4 deletions b/‎articles/ai-foundry/model-inference/includes/code-create-chat-client.md‎
Lines changed: 4 additions & 4 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/how-to-prerequisites-java.md‎
Lines changed: 6 additions & 3 deletions b/‎articles/ai-foundry/model-inference/includes/how-to-prerequisites-java.md‎
Lines changed: 6 additions & 3 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/use-chat-completions/csharp.md‎
Lines changed: 3 additions & 11 deletions b/‎articles/ai-foundry/model-inference/includes/use-chat-completions/csharp.md‎
Lines changed: 3 additions & 11 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/use-chat-completions/java.md‎
Lines changed: 58 additions & 62 deletions b/‎articles/ai-foundry/model-inference/includes/use-chat-completions/java.md‎
Lines changed: 58 additions & 62 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/use-chat-completions/python.md‎
Lines changed: 4 additions & 2 deletions b/‎articles/ai-foundry/model-inference/includes/use-chat-completions/python.md‎
Lines changed: 4 additions & 2 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/use-chat-completions/rest.md‎
Lines changed: 6 additions & 2 deletions b/‎articles/ai-foundry/model-inference/includes/use-chat-completions/rest.md‎
Lines changed: 6 additions & 2 deletions
diff --git a/‎articles/ai-foundry/model-inference/includes/use-chat-multi-modal/csharp.md‎
Lines changed: 2 additions & 1 deletion b/‎articles/ai-foundry/model-inference/includes/use-chat-multi-modal/csharp.md‎
Lines changed: 2 additions & 1 deletion
@@ -98,12 +98,12 @@ Add the package to your project:
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-ai-inference</artifactId>
-    <version>1.0.0-beta.1</version>
+    <version>1.0.0-beta.4</version>
 </dependency>
 <dependency>
     <groupId>com.azure</groupId>
     <artifactId>azure-identity</artifactId>
-    <version>1.13.3</version>
+    <version>1.15.3</version>
 </dependency>
 ```
 
 
@@ -22,9 +22,9 @@ import os
 from azure.ai.inference import ChatCompletionsClient
 from azure.core.credentials import AzureKeyCredential
 
-model = ChatCompletionsClient(
+client = ChatCompletionsClient(
     endpoint="https://<resource>.services.ai.azure.com/models",
-    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
+    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
 )
 ```
 
@@ -47,7 +47,7 @@ import { AzureKeyCredential } from "@azure/core-auth";
 
 const client = new ModelClient(
     "https://<resource>.services.ai.azure.com/models", 
-    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
+    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
 );
 ```
 
@@ -97,7 +97,7 @@ Then, you can use the package to consume the model. The following example shows
 ```java
 ChatCompletionsClient client = new ChatCompletionsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
-    .endpoint("{endpoint}")
+    .endpoint("https://<resource>.services.ai.azure.com/models")
     .buildClient();
 ```
 
 
@@ -1,4 +1,4 @@
-2---
+---
 manager: nitinme
 ms.service: azure-ai-model-inference
 ms.topic: include
@@ -13,7 +13,7 @@ author: santiagxf
   <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-ai-inference</artifactId>
-      <version>1.0.0-beta.1</version>
+      <version>1.0.0-beta.4</version>
   </dependency>
   ```
 
@@ -23,7 +23,7 @@ author: santiagxf
   <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-identity</artifactId>
-      <version>1.13.3</version>
+      <version>1.15.3</version>
   </dependency>
   ```
 
@@ -34,8 +34,11 @@ author: santiagxf
 
   import com.azure.ai.inference.EmbeddingsClient;
   import com.azure.ai.inference.EmbeddingsClientBuilder;
+  import com.azure.ai.inference.ChatCompletionsClient;
+  import com.azure.ai.inference.ChatCompletionsClientBuilder;
   import com.azure.ai.inference.models.EmbeddingsResult;
   import com.azure.ai.inference.models.EmbeddingItem;
+  import com.azure.ai.inference.models.ChatCompletions;
   import com.azure.core.credential.AzureKeyCredential;
   import com.azure.core.util.Configuration;
 
 
@@ -24,19 +24,11 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
-* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
-
-* Install the [Azure AI inference package](https://aka.ms/azsdk/azure-ai-inference/python/reference) with the following command:
+[!INCLUDE [how-to-prerequisites-csharp](../how-to-prerequisites-csharp.md)]
 
-    ```bash
-    dotnet add package Azure.AI.Inference --prerelease
-    ```
-    
-* If you are using Entra ID, you also need the following package:
+* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
-    ```bash
-    dotnet add package Azure.Identity
-    ```
+    * This example uses `mistral-large-2407`.
 
 ## Use chat completions
 
 
@@ -24,58 +24,61 @@ To use chat completion models in your application, you need:
 
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
+[!INCLUDE [how-to-prerequisites-java](../how-to-prerequisites-java.md)]
+
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
-* Add the [Azure AI inference package](https://aka.ms/azsdk/azure-ai-inference/java/reference) to your project:
-
-  ```xml
-  <dependency>
-      <groupId>com.azure</groupId>
-      <artifactId>azure-ai-inference</artifactId>
-      <version>1.0.0-beta.1</version>
-  </dependency>
-  ```
-  
-* If you are using Entra ID, you also need the following package:
-
-  ```xml
-  <dependency>
-      <groupId>com.azure</groupId>
-      <artifactId>azure-identity</artifactId>
-      <version>1.13.3</version>
-  </dependency>
-  ```
-
-* Import the following namespace:
-  
-  ```java
-  package com.azure.ai.inference.usage;
-  
-  import com.azure.ai.inference.EmbeddingsClient;
-  import com.azure.ai.inference.EmbeddingsClientBuilder;
-  import com.azure.ai.inference.models.EmbeddingsResult;
-  import com.azure.ai.inference.models.EmbeddingItem;
-  import com.azure.core.credential.AzureKeyCredential;
-  import com.azure.core.util.Configuration;
-  
-  import java.util.ArrayList;
-  import java.util.List;
-  ```
+    * This example uses `mistral-large-2407`.
 
 ## Use chat completions
 
 First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
 
+```java
+ChatCompletionsClient client = new ChatCompletionsClientBuilder()
+    .credential(new AzureKeyCredential("{key}"))
+    .endpoint("https://<resource>.services.ai.azure.com/models")
+    .buildClient();
+```
+
 If you have configured the resource to with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
+```java
+TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
+ChatCompletionsClient client = new ChatCompletionsClientBuilder()
+    .credential(defaultCredential)
+    .endpoint("https://<resource>.services.ai.azure.com/models")
+    .buildClient();
+```
+
+
 ### Create a chat completion request
 
 The following example shows how you can create a basic chat completions request to the model.
+
+```java
+List<ChatRequestMessage> chatMessages = new ArrayList<>();
+chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant."));
+chatMessages.add(new ChatRequestUserMessage("How many languages are in the world?"));
+
+ChatCompletions response = client.complete(new ChatCompletionsOptions(chatMessages));
+```
+
 > [!NOTE]
 > Some models don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
 
 The response is as follows, where you can see the model's usage statistics:
 
+```java
+System.out.printf("Model ID=%s is created at %s.%n", chatCompletions.getId(), chatCompletions.getCreated());
+for (ChatChoice choice : chatCompletions.getChoices()) {
+    ChatResponseMessage message = choice.getMessage();
+    System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
+    System.out.println("Message:");
+    System.out.println(message.getContent());
+}
+```
+
 ```console
 Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
 Model: mistral-large-2407
@@ -93,7 +96,26 @@ By default, the completions API returns the entire generated content in a single
 
 You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
 
-You can visualize how streaming generates content:
+```java
+List<ChatRequestMessage> chatMessages = new ArrayList<>();
+chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant."));
+chatMessages.add(new ChatRequestUserMessage("How many languages are in the world?"));
+
+client.completeStream(new ChatCompletionsOptions(chatMessages))
+    .forEach(chatCompletions -> {
+        if (CoreUtils.isNullOrEmpty(chatCompletions.getChoices())) {
+            return;
+        }
+        StreamingChatResponseMessageUpdate delta = chatCompletions.getChoice().getDelta();
+        if (delta.getRole() != null) {
+            System.out.println("Role = " + delta.getRole());
+        }
+        if (delta.getContent() != null) {
+            String content = delta.getContent();
+            System.out.print(content);
+        }
+    });
+```
 
 #### Explore more parameters supported by the inference client
 
@@ -141,29 +163,3 @@ The following example shows how to handle events when the model detects harmful
 
 > [!TIP]
 > To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-## Use chat completions with images
-
-Some models can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Some models for vision in a chat fashion:
-
-> [!IMPORTANT]
-> Some models support only one image for each turn in the chat conversation and only the last image is retained in context. If you add multiple images, it results in an error.
-
-To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
-
-Visualize the image:
-
-:::image type="content" source="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg" alt-text="A chart displaying the relative capabilities between large language models and small language models." lightbox="../../../../ai-foundry/media/how-to/sdks/small-language-models-chart-example.jpg":::
-
-Now, create a chat completion request with the image:
-
-The response is as follows, where you can see the model's usage statistics:
-
-```console
-ASSISTANT: The chart illustrates that larger models tend to perform better in quality, as indicated by their size in billions of parameters. However, there are exceptions to this trend, such as Phi-3-medium and Phi-3-small, which outperform smaller models in quality. This suggests that while larger models generally have an advantage, there might be other factors at play that influence a model's performance.
-Model: mistral-large-2407
-Usage: 
-  Prompt tokens: 2380
-  Completion tokens: 126
-  Total tokens: 2506
-```
@@ -25,13 +25,15 @@ To use chat completion models in your application, you need:
 [!INCLUDE [how-to-prerequisites](../how-to-prerequisites.md)]
 
 [!INCLUDE [how-to-prerequisites-python](../how-to-prerequisites-python.md)]
-  
+
+* A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
+
+    * This example uses `mistral-large-2407`.
 
 ## Use chat completions
 
 First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
 
-
 ```python
 import os
 from azure.ai.inference import ChatCompletionsClient
 
@@ -26,24 +26,28 @@ To use chat completion models in your application, you need:
 
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
+    * This example uses `mistral-large-2407`.
+
 ## Use chat completions
 
-To use chat completions API, use the route `/chat/completions` appended to the base URL along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use chat completions API, use the route `/chat/completions` appended to the base URL along with your credential indicated in `api-key`. 
 
 ```http
 POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
 Content-Type: application/json
 api-key: <key>
 ```
 
-If you have configured the resource with **Microsoft Entra ID** support, pass you token in the `Authorization` header:
+If you have configured the resource with **Microsoft Entra ID** support, pass you token in the `Authorization` header with the format `Bearer <token>`. Use scope `https://cognitiveservices.azure.com/.default`. 
 
 ```http
 POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
 Content-Type: application/json
 Authorization: Bearer <token>
 ```
 
+Using Microsoft Entra ID may require additional configuration in your resource to grant access. Learn how to [configure key-less authentication with Microsoft Entra ID](../../how-to/configure-entra-id.md).
+
 ### Create a chat completion request
 
 The following example shows how you can create a basic chat completions request to the model.
 
@@ -16,7 +16,7 @@ zone_pivot_groups: azure-ai-inference-samples
 
 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]
 
-This article explains how to use chat completions API with models deployed to Azure AI model inference in Azure AI services.
+This article explains how to use chat completions API with models supporting images or audio deployed to Azure AI model inference in Azure AI services.
 
 ## Prerequisites
 
@@ -28,6 +28,7 @@ To use chat completion models in your application, you need:
 
 * A chat completions model deployment. If you don't have one read [Add and configure models to Azure AI services](../../how-to/create-model-deployments.md) to add a chat completions model to your resource.
 
+    * This example uses `phi-4-multimodal-instruct`.
 
 ## Use chat completions