articles/ai-foundry/model-inference/how-to/quickstart-ai-project.md
# Configure your AI project to use Azure AI model inference
If you already have an AI project in Azure AI Foundry, the model catalog deploys models from third-party model providers as stand-alone endpoints in your project by default. Each model deployment has its own endpoint URI and set of credentials to access it. Azure OpenAI models, on the other hand, are deployed to an Azure AI Services resource or to the Azure OpenAI Service resource.
You can change this behavior and deploy both types of models to Azure AI Services resources by using Azure AI model inference. Once configured, **deployments of Models as a Service models that support pay-as-you-go billing happen to the connected Azure AI Services resource** instead of to the project itself, giving you a single endpoint and credential to access all the models deployed in Azure AI Foundry. You can manage Azure OpenAI models and models from third-party providers in the same way.
Additionally, deploying models to Azure AI model inference brings the extra benefits of:
* [Key-less authentication](configure-entra-id.md) with role-based access control.
In this article, you learn how to configure your project to use models deployed in Azure AI model inference in Azure AI services.
For each model you want to deploy under Azure AI model inference, follow these steps:
6. You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the `model` parameter for requests to route to this particular model deployment. This setup lets you configure specific names for your models when you attach specific configurations. For instance, use `o1-preview-safe` for a model with a strict content safety filter.
7. We automatically select an Azure AI Services connection depending on your project because you turned on the feature **Deploy models to Azure AI model inference service**. Use the **Customize** option to change the connection based on your needs. If you're deploying under the **Standard** deployment type, the models need to be available in the region of the Azure AI Services resource.
:::image type="content" source="../media/add-model-deployments/models-deploy-customize.png" alt-text="Screenshot showing how to customize the deployment if needed." lightbox="../media/add-model-deployments/models-deploy-customize.png":::
### Upgrade your code with the new endpoint
Once the models are deployed under Azure AI Services, you can upgrade your code to use the Azure AI model inference endpoint. The main difference between how Serverless API endpoints and Azure AI model inference work resides in the endpoint URL and the model parameter. While Serverless API endpoints have a URI and key per model deployment, Azure AI model inference has a single one for all of them.
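As an illustration, here's a minimal sketch using the Python `azure-ai-inference` package, with placeholder values for the endpoint, key, and deployment name, of how a single endpoint serves every deployment through the `model` parameter:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# One endpoint and one key for the whole resource (placeholder values).
client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<api-key>"),
)

# The `model` parameter carries the deployment name and routes the request
# to that specific model deployment.
response = client.complete(
    model="<deployment-name>",
    messages=[UserMessage(content="Hello!")],
)

print(response.choices[0].message.content)
```

Switching to a different model only requires changing the `model` value; the endpoint and credential stay the same.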
The following table summarizes the changes you have to introduce:
## Limitations
Consider the following limitations when configuring your project to use Azure AI model inference:
* Only models supporting pay-as-you-go billing (Models as a Service) are available for deployment to Azure AI model inference. Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible through their own endpoint URI and credentials.
* Models available as both pay-as-you-go billing and managed compute offerings are, by default, deployed to Azure AI model inference in Azure AI Services resources. The Azure AI Foundry portal doesn't offer a way to deploy them to Managed Online Endpoints. You have to turn off the feature mentioned in [Configure the project to use Azure AI model inference](#configure-the-project-to-use-azure-ai-model-inference), or use the Azure CLI/Azure ML SDK/ARM templates to perform the deployment.
## Next steps
* [Add more models](create-model-deployments.md) to your endpoint.
Explore our [samples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-inference/src/samples) and read the [API reference documentation](https://aka.ms/azsdk/azure-ai-inference/java/reference) to get started.
# [REST](#tab/rest)
Use the reference section to explore the API design, see which parameters are available, and learn how to indicate the authentication token in the `Authorization` header. For example, the reference section for [Chat completions](../../../ai-studio/reference/reference-model-inference-chat-completions.md) details how to use the route `/chat/completions` to generate predictions based on chat-formatted instructions. Notice that the path `/models` is included in the root of the URL:
__Request__
```http
POST models/chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
```
For testing purposes, the easiest way to get a valid token for your user account is to use the Azure CLI. In a console, run the following Azure CLI command:
```azurecli
az account get-access-token --resource https://cognitiveservices.azure.com --query "accessToken" --output tsv
```
### Options for credential when using Microsoft Entra ID
`DefaultAzureCredential` is an opinionated, ordered sequence of mechanisms for authenticating to Microsoft Entra ID. Each authentication mechanism is a class derived from the `TokenCredential` class and is known as a credential. At runtime, `DefaultAzureCredential` attempts to authenticate using the first credential. If that credential fails to acquire an access token, the next credential in the sequence is attempted, and so on, until an access token is successfully obtained. In this way, your app can use different credentials in different environments without writing environment-specific code.
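As a minimal sketch, assuming the Python `azure-identity` and `azure-ai-inference` packages and a placeholder endpoint, creating a client that authenticates with `DefaultAzureCredential` could look like this:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

# DefaultAzureCredential walks its chain of credentials (environment, managed
# identity, developer tools) until one of them returns an access token.
credential = DefaultAzureCredential()

client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=credential,
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)
```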
When the preceding code runs on your local development workstation, it looks in the environment variables for an application service principal or at locally installed developer tools, such as Visual Studio, for a set of developer credentials. Either approach can be used to authenticate the app to Azure resources during local development.
When deployed to Azure, this same code can also authenticate your app to other Azure resources. `DefaultAzureCredential` can retrieve environment settings and managed identity configurations to authenticate to other services automatically.
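For instance, when the app runs on an Azure host with a user-assigned managed identity, you can point `DefaultAzureCredential` at that identity. This is a sketch assuming the Python `azure-identity` package; `AZURE_CLIENT_ID` is a placeholder environment variable:

```python
import os
from azure.identity import DefaultAzureCredential

# Use the client ID of the user-assigned managed identity attached to the host.
credential = DefaultAzureCredential(
    managed_identity_client_id=os.environ.get("AZURE_CLIENT_ID")
)
```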
### Best practices
* Use deterministic credentials in production environments: strongly consider moving from `DefaultAzureCredential` to one of the following deterministic solutions in production:
  * A specific `TokenCredential` implementation, such as `ManagedIdentityCredential`. See the [Derived list for options](/dotnet/api/azure.core.tokencredential#definition).
23
+
  * A pared-down `ChainedTokenCredential` implementation optimized for the Azure environment in which your app runs. `ChainedTokenCredential` essentially creates a specific allowlist of acceptable credential options, such as `ManagedIdentity` for production and `VisualStudioCredential` for development (see the sketch after this list).
* If possible, configure system-assigned or user-assigned managed identities on the Azure resources where your code is running, and configure Microsoft Entra ID access to those specific identities.
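As referenced above, a minimal sketch of such an allowlist, assuming the Python `azure-identity` package and using `AzureCliCredential` as the local-development fallback, could be:

```python
from azure.identity import (
    AzureCliCredential,
    ChainedTokenCredential,
    ManagedIdentityCredential,
)

# An explicit allowlist: try the managed identity on the Azure host first,
# then fall back to the developer's Azure CLI login.
credential = ChainedTokenCredential(
    ManagedIdentityCredential(),
    AzureCliCredential(),
)
```

Unlike `DefaultAzureCredential`, this chain only tries the credentials you explicitly list.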
The example in this article is based on code samples contained in the [Azure-Samples/azureai-model-inference-bicep](https://github.com/Azure-Samples/azureai-model-inference-bicep) repository. To run the commands locally without having to copy or paste file content, use the following commands to clone the repository and go to the folder for your coding language:
> Notice that this template can take the parameter `allowKeys`, which, when set to `false`, disables the use of keys in the resource. This configuration is optional.
2. Use the template `modules/role-assignment-template.bicep` to describe a role assignment in Azure:
7. The template outputs the Azure AI model inference endpoint that you can use to consume any of the model deployments you have created.
## Use Microsoft Entra ID in your code
Once you've configured Microsoft Entra ID in your resource, update your code to use it when consuming the inference endpoint. The following example shows how to use a chat completions model:
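A minimal sketch, assuming the Python `azure-ai-inference` and `azure-identity` packages and placeholder values for the endpoint and deployment name:

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.identity import DefaultAzureCredential

# AZURE_AI_ENDPOINT is a placeholder environment variable, for example:
# https://<resource-name>.services.ai.azure.com/models
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)

response = client.complete(
    model="<deployment-name>",
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain the benefits of keyless authentication in one paragraph."),
    ],
)

print(response.choices[0].message.content)
```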
## Disable key-based authentication in the resource
Disabling key-based authentication is advisable once you've implemented Microsoft Entra ID and fully addressed compatibility or fallback concerns in all the applications that consume the service.
* The resource group where the Azure AI Services resource is deployed.
## Configure Microsoft Entra ID for inference
Follow these steps to configure Microsoft Entra ID for inference in your Azure AI Services resource:
1. Log in to your Azure subscription:
```azurecli
az login
```
2. If you have more than one subscription, select the subscription where your resource is located:
```azurecli
az account set --subscription "<subscription-id>"
```
3. Set the following environment variables with the name of the Azure AI Services resource you plan to use and its resource group:
```azurecli
ACCOUNT_NAME="<ai-services-resource-name>"
RESOURCE_GROUP="<resource-group>"
```
4. Get the resource ID of your resource:
```azurecli
RESOURCE_ID=$(az resource show -g $RESOURCE_GROUP -n $ACCOUNT_NAME --resource-type "Microsoft.CognitiveServices/accounts" --query id --output tsv)
```
5. Get the object ID of the security principal you want to assign permissions to. The following example shows how to get the object ID associated with:
__Your own logged in account:__
```azurecli
OBJECT_ID=$(az ad signed-in-user show --query id --output tsv)
```
__A security group:__
```azurecli
OBJECT_ID=$(az ad group show --group "<group-name>" --query id --output tsv)
```
__A service principal:__
```azurecli
OBJECT_ID=$(az ad sp show --id "<service-principal-guid>" --query id --output tsv)
```
6. Assign the **Cognitive Services User** role to the service principal (scoped to the resource). By assigning a role, you're granting the service principal access to this resource.
```azurecli
az role assignment create --assignee-object-id $OBJECT_ID --role "Cognitive Services User" --scope $RESOURCE_ID
```
7. The selected user can now use Microsoft Entra ID for inference.
> [!TIP]
> Keep in mind that Azure role assignments may take up to five minutes to propagate. Adding or removing users from a security group propagates immediately.
## Use Microsoft Entra ID in your code
Once Microsoft Entra ID is configured in your resource, update your code to use it when consuming the inference endpoint. The following example shows how to use a chat completions model:
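A minimal sketch with streaming, assuming the Python `azure-ai-inference` and `azure-identity` packages and placeholder endpoint and deployment values:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)

# Stream the response as it's generated.
stream = client.complete(
    model="<deployment-name>",
    messages=[UserMessage(content="Write a short poem about the sea.")],
    stream=True,
)

for update in stream:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")
```

Dropping `stream=True` returns the full response in a single object instead.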