Commit fbfead4

committed
fixes
1 parent dd3ed5e commit fbfead4

File tree

7 files changed: +83 −7 lines changed


articles/ai-foundry/model-inference/includes/code-create-chat-client-entra.md

Lines changed: 36 additions & 1 deletion

````diff
@@ -22,6 +22,19 @@ import os
 from azure.ai.inference import ChatCompletionsClient
 from azure.identity import DefaultAzureCredential
 
+client = ChatCompletionsClient(
+    endpoint="https://<resource>.services.ai.azure.com/models",
+    credential=DefaultAzureCredential(),
+)
+```
+
+If you need to configure a custom audience, do as follows:
+
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.identity import DefaultAzureCredential
+
 client = ChatCompletionsClient(
     endpoint="https://<resource>.services.ai.azure.com/models",
     credential=DefaultAzureCredential(),
@@ -44,6 +57,19 @@ import ModelClient from "@azure-rest/ai-inference";
 import { isUnexpected } from "@azure-rest/ai-inference";
 import { DefaultAzureCredential } from "@azure/identity";
 
+const client = new ModelClient(
+    "https://<resource>.services.ai.azure.com/models",
+    new DefaultAzureCredential()
+);
+```
+
+If you need to configure a custom audience, do as follows:
+
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { DefaultAzureCredential } from "@azure/identity";
+
 const clientOptions = { credentials: { scopes: ["https://cognitiveservices.azure.com/.default"] } };
 
 const client = new ModelClient(
@@ -77,6 +103,15 @@ using Azure.AI.Inference;
 
 Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions with Entra ID:
 
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+    new Uri("https://<resource>.services.ai.azure.com/models"),
+    new DefaultAzureCredential()
+);
+```
+
+If you need to configure a custom audience, do as follows:
+
 ```csharp
 TokenCredential credential = new DefaultAzureCredential();
 AzureAIInferenceClientOptions clientOptions = new AzureAIInferenceClientOptions();
@@ -86,7 +121,7 @@ clientOptions.AddPolicy(tokenPolicy, HttpPipelinePosition.PerRetry);
 ChatCompletionsClient client = new ChatCompletionsClient(
     new Uri("https://<resource>.services.ai.azure.com/models"),
     credential,
-    clientOptions.
+    clientOptions
 );
 ```
````
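The diff above has the clients request tokens for a custom audience. As a minimal sketch (the helper name is hypothetical, not part of any Azure SDK): Entra ID token scopes for client-credential flows are conventionally the resource audience URI followed by `/.default`, which is how the audience shown in the examples maps to the scope the SDKs request:

```python
def scope_for_audience(audience: str) -> str:
    """Build the Entra ID token scope for a resource audience.

    Scopes for client-credential flows are conventionally the audience
    URI followed by "/.default".
    """
    return audience.rstrip("/") + "/.default"


# Azure AI Model Inference uses the Cognitive Services audience.
print(scope_for_audience("https://cognitiveservices.azure.com"))
# https://cognitiveservices.azure.com/.default
```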

articles/ai-foundry/model-inference/includes/configure-entra-id/intro.md

Lines changed: 0 additions & 2 deletions

````diff
@@ -7,8 +7,6 @@ ms.date: 01/23/2025
 ms.topic: include
 ---
 
-[!INCLUDE [Feature preview](../../../includes/feature-preview.md)]
-
 Models deployed to Azure AI model inference in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. This makes it a strong choice for organizations adopting secure and scalable identity management solutions.
 
 This article explains how to configure Microsoft Entra ID for inference in Azure AI model inference.
````

articles/ai-foundry/model-inference/includes/configure-entra-id/troubleshooting.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -29,6 +29,7 @@ The following table contains multiple scenarios that can help troubleshooting Mi
 | You're using an SDK. | Known issues. | Before troubleshooting further, install the latest version of the software you're using to connect to the service. Authentication bugs may have been fixed in a newer version. |
 | `401 Principal does not have access to API/Operation` | The request authenticates correctly, but the user principal doesn't have the required permissions to use the inference endpoint. | Ensure you have: <br /> 1. Assigned the role **Cognitive Services User** to your principal on the Azure AI Services resource. Notice that **Cognitive Services OpenAI User** grants access only to OpenAI models, and **Owner** or **Contributor** don't provide access either.<br /> 2. Waited at least 5 minutes before making the first call. |
 | `401 HTTP/1.1 401 PermissionDenied` | The request authenticates correctly, but the user principal doesn't have the required permissions to use the inference endpoint. | Assign the role **Cognitive Services User** to your principal on the Azure AI Services resource. Roles like **Administrator** or **Contributor** don't grant inference access. Wait at least 5 minutes before making the first call. |
+| You're using Microsoft Entra ID with the Azure AI Inference package and you get `401 Unauthorized. Access token is missing, invalid, audience is incorrect, or have expired.` | The request fails to authenticate with Entra ID. | Ensure you have the latest version of the Azure AI Inference package installed. The default scope for Microsoft Entra ID may have changed from the version you were using before. Azure AI Model Inference uses the scope `https://cognitiveservices.azure.com/.default`. |
 | You're using REST API calls and you get `401 Unauthorized. Access token is missing, invalid, audience is incorrect, or have expired.` | The request fails to authenticate with Entra ID. | Ensure the `Authorization` header contains a valid token with the scope `https://cognitiveservices.azure.com/.default`. |
 | You're using the `AzureOpenAI` class and you get `401 Unauthorized. Access token is missing, invalid, audience is incorrect, or have expired.` | The request fails to authenticate with Entra ID. | Ensure that you're using an **OpenAI model** connected to the endpoint `https://<resource>.openai.azure.com`. You can't use the `OpenAI` class or a Models-as-a-Service model. If your model isn't from OpenAI, use the Azure AI Inference SDK. |
 | You're using the Azure AI Inference SDK and you get `401 Unauthorized. Access token is missing, invalid, audience is incorrect, or have expired.` | The request fails to authenticate with Entra ID. | Ensure you're connected to the endpoint `https://<resource>.services.ai.azure.com/models` and that you indicated the right scope for Entra ID (`https://cognitiveservices.azure.com/.default`). |
````
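Several rows in this table point at a token with an incorrect audience. One way to check is to decode the token's payload (the middle segment of a JWT is base64url-encoded JSON) and inspect its `aud` claim. A debugging sketch, not part of any SDK; the demo token is fabricated:

```python
import base64
import json


def token_audience(jwt: str) -> str:
    """Return the `aud` claim of a JWT without verifying its signature.

    For debugging only: decodes the payload (second segment) of the token.
    """
    payload_b64 = jwt.split(".")[1]
    # Restore the base64 padding that JWT encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["aud"]


# Build a fake header.payload.signature token to demonstrate.
fake_payload = base64.urlsafe_b64encode(
    json.dumps({"aud": "https://cognitiveservices.azure.com"}).encode()
).rstrip(b"=").decode()
fake_token = f"eyJhbGciOiJub25lIn0.{fake_payload}.sig"
print(token_audience(fake_token))  # https://cognitiveservices.azure.com
```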

articles/ai-foundry/model-inference/includes/create-model-deployments/intro.md

Lines changed: 0 additions & 2 deletions

````diff
@@ -7,8 +7,6 @@ ms.date: 1/21/2025
 ms.topic: include
 ---
 
-[!INCLUDE [Feature preview](../../../includes/feature-preview.md)]
-
 You can decide and configure which models are available for inference in the inference endpoint. When a given model is configured, you can then generate predictions from it by indicating its model name or deployment name on your requests. No further changes are required in your code to use it.
 
 In this article, you'll learn how to add a new model to Azure AI model inference in Azure AI Foundry.
````
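The include above says requests are routed by model or deployment name alone. A minimal sketch of a chat-completions request body illustrating that routing; the deployment name `my-deployment` and the helper are hypothetical:

```python
import json


def chat_request(deployment_name: str, user_message: str) -> str:
    """Build a minimal chat-completions request body.

    The `model` field selects the model or deployment name that the
    endpoint routes the request to; no other code changes are needed
    to switch deployments.
    """
    body = {
        "model": deployment_name,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(body)


print(chat_request("my-deployment", "Hello"))
```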

articles/ai-foundry/model-inference/includes/create-resources/intro.md

Lines changed: 0 additions & 2 deletions

````diff
@@ -7,8 +7,6 @@ ms.date: 1/21/2025
 ms.topic: include
 ---
 
-[!INCLUDE [Feature preview](../../../includes/feature-preview.md)]
-
 In this article, you learn how to create the resources required to use Azure AI model inference and consume flagship models from Azure AI model catalog.
 
 ## Understand the resources
````
articles/ai-foundry/model-inference/reference/api-version-updates.md

Lines changed: 44 additions & 0 deletions

````diff
@@ -0,0 +1,44 @@
+---
+title: Azure AI Model Inference API version lifecycle
+titleSuffix: Azure AI Foundry
+description: Learn more about API version retirement in Azure AI model inference.
+manager: scottpolly
+ms.service: azure-ai-model-inference
+ms.topic: conceptual
+ms.date: 03/01/2025
+ms.reviewer: fasantia
+ms.author: mopeakande
+author: msakande
+---
+
+# Azure AI Model Inference API lifecycle
+
+This article explains Azure AI Model Inference API versions and how to think about them. Whenever possible, we recommend using the latest GA or preview API release.
+
+## Latest API releases
+
+The following list contains the latest releases of APIs for Azure AI Model Inference.
+
+### 2025-04-01
+
+This version expands the previous API version and introduces the following features:
+
+* General availability.
+* Reasoning models return reasoning content in the field `reasoning_content` on messages with the role `assistant`. When streaming content, both `content` and `reasoning_content` are included in deltas.
+* The route `/info` adds an optional parameter `model` to indicate which model deployment to get information from when the endpoint is running multiple model deployments.
+
+### 2024-05-01-preview
+
+This version introduces the following features:
+
+* Embeddings models.
+* Image embeddings models.
+* Chat completions models with image and audio inputs.
+
+## Deprecation
+
+The following API version has been deprecated and marked for retirement:
+
+| API Version | Status | Deprecation date | Retirement date | Replacement |
+|--------------------|------------|------------------|-----------------|-------------|
+| 2024-05-01-preview | Deprecated | 04/10/2024 | 04/10/2025 | 2025-04-01 |
````
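The 2025-04-01 notes say streamed deltas carry both `content` and `reasoning_content`. A sketch of accumulating the two fields from a stream, assuming each delta is a dict that may carry either field (the exact delta shape is an assumption, not taken from the API reference):

```python
def accumulate_deltas(deltas):
    """Concatenate `content` and `reasoning_content` across streamed deltas.

    Each delta may carry a partial `content` and/or `reasoning_content`
    string; missing or empty fields are skipped.
    """
    content, reasoning = [], []
    for delta in deltas:
        if delta.get("content"):
            content.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
    return "".join(content), "".join(reasoning)


deltas = [
    {"reasoning_content": "Let me think. "},
    {"reasoning_content": "Two plus two is four."},
    {"content": "The answer "},
    {"content": "is 4."},
]
print(accumulate_deltas(deltas))
# ('The answer is 4.', 'Let me think. Two plus two is four.')
```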

articles/ai-foundry/model-inference/toc.yml

Lines changed: 2 additions & 0 deletions

````diff
@@ -104,6 +104,8 @@ items:
 items:
 - name: What's the Azure AI Model Inference API?
   href: /rest/api/aifoundry/modelinference
+- name: API versioning
+  href: ./reference/api-version-updates.md
 - name: Reference
   items:
   - name: Get Model Info
````
