Commit 7f6fc76

Merge pull request #6650 from msakande/update-managed-compute-article
Add foundry portal steps to managed compute deployment
2 parents 8c6dbe1 + d06fcd1 commit 7f6fc76

File tree

5 files changed: +84 −27 lines

articles/ai-foundry/how-to/deploy-models-managed.md

Lines changed: 75 additions & 26 deletions
@@ -1,51 +1,91 @@
 ---
-title: How to deploy and inference a managed compute deployment with code
+title: How to deploy and inference a managed compute deployment
 titleSuffix: Azure AI Foundry
-description: Learn how to deploy and inference a managed compute deployment with code.
+description: Learn how to deploy large language models on managed compute in Azure AI Foundry and perform real-time inference for generative AI applications.
 ms.service: azure-ai-foundry
 ms.custom:
 - build-2024
 ms.topic: how-to
-ms.date: 05/19/2025
+ms.date: 08/19/2025
 ms.reviewer: fasantia
 reviewer: santiagxf
 ms.author: mopeakande
 manager: nitinme
 author: msakande
+zone_pivot_groups: azure-ai-managed-compute-deployment
+ai-usage: ai-assisted
+
+#CustomerIntent: As an Azure AI developer, I want to deploy and perform inference on large language models using managed compute in Azure AI Foundry so that I can make models available for real-time generative AI applications in production environments.
 ---
 
-# How to deploy and inference a managed compute deployment with code
+# How to deploy and infer with a managed compute deployment
 
-The Azure AI Foundry portal [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models, and a common way to deploy these models is to use the managed compute deployment option, which is also sometimes referred to as a managed online deployment.
+The Azure AI Foundry portal [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models. A common way to deploy these models is to use the managed compute deployment option. This option is also sometimes referred to as a managed online deployment.
 
-Deployment of a large language model (LLM) makes it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
+When you deploy a large language model (LLM), you make it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
 
-In this article, you learn how to deploy models using the Azure Machine Learning SDK. The article also covers how to perform inference on the deployed model.
+In this article, you learn to deploy models with the managed compute deployment option and to perform inference on the deployed model.
 
 ## Prerequisites
 
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
+- An Azure subscription with a valid payment method. Free or trial Azure subscriptions don't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
 
 - If you don't have one, [create a [!INCLUDE [hub-project-name](../includes/hub-project-name.md)]](create-projects.md?pivots=hub-project).
 
-- Marketplace purchases enabled for your Azure subscription. Learn more [here](/azure/cost-management-billing/manage/enable-marketplace-purchases).
+- Foundry [Models from Partners and Community](../model-inference/concepts/models.md#models-from-partners-and-community) require access to Azure Marketplace, while Foundry [Models Sold Directly by Azure](../model-inference/concepts/models.md#models-sold-directly-by-azure) don't have this requirement. Ensure your Azure subscription has the permissions required to subscribe to model offerings in Azure Marketplace. See [Enable Azure Marketplace purchases](/azure/cost-management-billing/manage/enable-marketplace-purchases) to learn more.
+
+- Azure role-based access controls (Azure RBAC) grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-azure-ai-foundry.md).
+
+
+## Find your model in the model catalog
+
+[!INCLUDE [open-catalog](../includes/open-catalog.md)]
+
+4. In the **Deployment options** filter, select **Managed compute**.
+
+[!INCLUDE [tip-left-pane](../includes/tip-left-pane.md)]
+
+:::image type="content" source="../media/deploy-models-managed/catalog-filter-managed-compute.png" alt-text="A screenshot of the model catalog showing how to filter for models that can be deployed via managed compute." lightbox="../media/deploy-models-managed/catalog-filter-managed-compute.png":::
+
+1. Select a model to open its model card. In this article, use the model `deepset-roberta-base-squad2`.
+
+
+::: zone pivot="ai-foundry-portal"
 
-## Get the model ID
+## Deploy the model
+
+1. From the model's page, select **Use this model** to open the deployment window.
+1. The deployment window is pre-filled with some selections and parameter values. You can either keep them or change them as desired. You can also select an existing endpoint for the deployment or create a new one. For this example, specify an instance count of `1` and create a new endpoint for the deployment.
+
+:::image type="content" source="../media/deploy-models-managed/deployment-configuration.png" alt-text="Screenshot of the deployment configuration screen for managed compute deployment in Azure AI Foundry." lightbox="../media/deploy-models-managed/deployment-configuration.png":::
+
+1. Select **Deploy** to create your deployment. The creation process might take a few minutes to complete. When it's complete, the portal opens the model deployment page.
+
+> [!TIP]
+> To see endpoints deployed to your project, go to the **My assets** section of the left pane and select **Models + endpoints**.
+
+1. The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, follow these steps:
 
-You can deploy managed compute models using the Azure Machine Learning SDK, but first, let's browse the model catalog and get the model ID you need for deployment.
+    1. Select the deployment, and note the endpoint's Target URI and Key.
+    1. Use these credentials to call the deployment and generate predictions.
+
 
-[!INCLUDE [tip-left-pane](../includes/tip-left-pane.md)]
+## Consume deployments
 
-1. Sign in to [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs) and go to the **Home** page.
-1. Select **Model catalog** from the left sidebar.
-1. In the **Deployment options** filter, select **Managed compute**.
+After you create your deployment, follow these steps to consume it:
 
-:::image type="content" source="../media/deploy-monitor/catalog-filter-managed-compute.png" alt-text="A screenshot showing how to filter by managed compute models in the catalog." lightbox="../media/deploy-monitor/catalog-filter-managed-compute.png":::
+1. Select **Models + endpoints** under the **My assets** section in your Azure AI Foundry project.
+1. Select your deployment from the **Model deployments** tab.
+1. Go to the **Test** tab for sample inference to the endpoint.
+1. Return to the **Details** tab to copy the deployment's "Target URI", which you can use to run inference with code.
+1. Go to the **Consume** tab of the deployment to find code samples for consumption.
 
-1. Select a model.
-1. Copy the model ID from the details page of the model you selected. It looks something like this: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16`
+::: zone-end
 
 
+::: zone pivot="python-sdk"
+6. Copy the model ID from the details page of the model you selected. It looks like this for the selected model: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17`.
+
 
 ## Deploy the model
 
@@ -64,7 +104,7 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 
 workspace_ml_client = MLClient(
     credential=InteractiveBrowserCredential,
-    subscription_id="your subscription name goes here",
+    subscription_id="your subscription ID goes here",
     resource_group_name="your resource group name goes here",
     workspace_name="your project name goes here",
 )
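For context, the snippet in this hunk assumes the Azure ML SDK v2 imports shown below. A self-contained sketch follows; note that the article's snippet passes the `InteractiveBrowserCredential` class itself, while `MLClient` expects a credential instance, so the instantiated form here is the safer one.

```python
# Self-contained sketch of the client setup in the hunk above (azure-ai-ml,
# SDK v2). Assumes `pip install azure-ai-ml azure-identity`.
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential

workspace_ml_client = MLClient(
    credential=InteractiveBrowserCredential(),  # instantiated, unlike the snippet above
    subscription_id="your subscription ID goes here",
    resource_group_name="your resource group name goes here",
    workspace_name="your project name goes here",
)
```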
@@ -92,10 +132,10 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).wait()
 ```
 
-1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Get the model ID](#get-the-model-id) section.
+1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Find your model in the model catalog](#find-your-model-in-the-model-catalog) section.
 
 ```python
-model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16"
+model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17"
 
 demo_deployment = ManagedOnlineDeployment(
     name="demo",
@@ -159,17 +199,26 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 print(json.dumps(response_json, indent=2))
 ```
 
-## Configure Autoscaling
 
-To configure autoscaling for deployments, you can go to Azure portal, locate the Azure resource typed `Machine learning online deployment` in the resource group of the AI project, and use Scaling menu under Setting. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.
+::: zone-end
+
+## Configure autoscaling
+
+To configure autoscaling for deployments, follow these steps:
+
+1. Sign in to the [Azure portal](https://portal.azure.com).
+1. Locate the Azure resource type `Machine learning online deployment` for the model you just deployed in the resource group of the AI project.
+1. Select **Settings** > **Scaling** from the left pane.
+1. Select **Custom autoscale** and configure autoscale settings. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.
+
 
-## Delete the deployment endpoint
+## Delete the deployment
 
-To delete deployments in Azure AI Foundry portal, select the **Delete** button on the top panel of the deployment details page.
+To delete deployments in the Azure AI Foundry portal, select **Delete deployment** on the top panel of the deployment details page.
 
 ## Quota considerations
 
-To deploy and perform inferencing with real-time endpoints, you consume Virtual Machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request for a quota increase.
+To deploy and perform inferencing with real-time endpoints, you consume Virtual Machine (VM) core quota that Azure assigns to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.
 
 ## Related content
 
Two image files added (168 KB and 138 KB) and one image file deleted (−166 KB); binary files not shown.

zone-pivots/zone-pivot-groups.yml

Lines changed: 9 additions & 1 deletion
@@ -800,7 +800,15 @@ groups:
     title: Azure CLI
   - id: programming-language-bicep
     title: Bicep
-
+  - id: azure-ai-managed-compute-deployment
+    # Owner: mopeakande
+    title: Programming languages
+    prompt: Choose a tool or API
+    pivots:
+    - id: ai-foundry-portal
+      title: Azure AI Foundry portal
+    - id: python-sdk
+      title: Python SDK
   - id: azure-ai-serverless-deployment
     # Owner: mopeakande
     title: Programming languages
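For context on how this new pivot group takes effect: an article opts in through its front matter and fences pivot-specific content with zone markers, which is exactly what the first file in this commit does. A minimal sketch of the pattern:

```markdown
---
title: Example article
zone_pivot_groups: azure-ai-managed-compute-deployment
---

::: zone pivot="ai-foundry-portal"
Content shown only when the reader picks the Azure AI Foundry portal pivot.
::: zone-end

::: zone pivot="python-sdk"
Content shown only for the Python SDK pivot.
::: zone-end
```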
