Commit 7f6fc76

Merge pull request #6650 from msakande/update-managed-compute-article
Add foundry portal steps to managed compute deployment
2 parents 8c6dbe1 + d06fcd1 commit 7f6fc76

File tree

5 files changed: +84 −27 lines

articles/ai-foundry/how-to/deploy-models-managed.md

Lines changed: 75 additions & 26 deletions
@@ -1,51 +1,91 @@
 ---
-title: How to deploy and inference a managed compute deployment with code
+title: How to deploy and inference a managed compute deployment
 titleSuffix: Azure AI Foundry
-description: Learn how to deploy and inference a managed compute deployment with code.
+description: Learn how to deploy large language models on managed compute in Azure AI Foundry and perform real-time inference for generative AI applications.
 ms.service: azure-ai-foundry
 ms.custom:
 - build-2024
 ms.topic: how-to
-ms.date: 05/19/2025
+ms.date: 08/19/2025
 ms.reviewer: fasantia
 reviewer: santiagxf
 ms.author: mopeakande
 manager: nitinme
 author: msakande
+zone_pivot_groups: azure-ai-managed-compute-deployment
+ai-usage: ai-assisted
+
+#CustomerIntent: As an Azure AI developer, I want to deploy and perform inference on large language models using managed compute in Azure AI Foundry so that I can make models available for real-time generative AI applications in production environments.
 ---
 
-# How to deploy and inference a managed compute deployment with code
+# How to deploy and infer with a managed compute deployment
 
-The Azure AI Foundry portal [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models, and a common way to deploy these models is to use the managed compute deployment option, which is also sometimes referred to as a managed online deployment.
+The Azure AI Foundry portal [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models. A common way to deploy these models is to use the managed compute deployment option. This option is also sometimes referred to as a managed online deployment.
 
-Deployment of a large language model (LLM) makes it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
+When you deploy a large language model (LLM), you make it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
 
-In this article, you learn how to deploy models using the Azure Machine Learning SDK. The article also covers how to perform inference on the deployed model.
+In this article, you learn to deploy models with the managed compute deployment option and to perform inference on the deployed model.
 
 ## Prerequisites
 
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
+- An Azure subscription with a valid payment method. Free or trial Azure subscriptions don't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
 
 - If you don't have one, [create a [!INCLUDE [hub-project-name](../includes/hub-project-name.md)]](create-projects.md?pivots=hub-project).
 
-- Marketplace purchases enabled for your Azure subscription. Learn more [here](/azure/cost-management-billing/manage/enable-marketplace-purchases).
+- Foundry [Models from Partners and Community](../model-inference/concepts/models.md#models-from-partners-and-community) require access to Azure Marketplace, while Foundry [Models Sold Directly by Azure](../model-inference/concepts/models.md#models-sold-directly-by-azure) don't have this requirement. Ensure your Azure subscription has the permissions required to subscribe to model offerings in Azure Marketplace. See [Enable Azure Marketplace purchases](/azure/cost-management-billing/manage/enable-marketplace-purchases) to learn more.
+
+- Azure role-based access controls (Azure RBAC) grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-azure-ai-foundry.md).
+
+
+## Find your model in the model catalog
+
+[!INCLUDE [open-catalog](../includes/open-catalog.md)]
+
+4. In the **Deployment options** filter, select **Managed compute**.
+
+[!INCLUDE [tip-left-pane](../includes/tip-left-pane.md)]
+
+:::image type="content" source="../media/deploy-models-managed/catalog-filter-managed-compute.png" alt-text="A screenshot of the model catalog showing how to filter for models that can be deployed via managed compute." lightbox="../media/deploy-models-managed/catalog-filter-managed-compute.png":::
+
+1. Select a model to open its model card. In this article, use the model `deepset-roberta-base-squad2`.
+
+
+::: zone pivot="ai-foundry-portal"
 
-## Get the model ID
+## Deploy the model
+
+1. From the model's page, select **Use this model** to open the deployment window.
+1. The deployment window is pre-filled with some selections and parameter values. You can either keep them or change them as desired. You can also select an existing endpoint for the deployment or create a new one. For this example, specify an instance count of `1` and create a new endpoint for the deployment.
+
+:::image type="content" source="../media/deploy-models-managed/deployment-configuration.png" alt-text="Screenshot of the deployment configuration screen for managed compute deployment in Azure AI Foundry." lightbox="../media/deploy-models-managed/deployment-configuration.png":::
+
+1. Select **Deploy** to create your deployment. The creation process might take a few minutes to complete. When it's complete, the portal opens the model deployment page.
+
+> [!TIP]
+> To see endpoints deployed to your project, go to the **My assets** section of the left pane and select **Models + endpoints**.
+
+1. The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, follow these steps:
 
-You can deploy managed compute models using the Azure Machine Learning SDK, but first, let's browse the model catalog and get the model ID you need for deployment.
+    1. Select the deployment, and note the endpoint's Target URI and Key.
+    1. Use these credentials to call the deployment and generate predictions.
+
 
-[!INCLUDE [tip-left-pane](../includes/tip-left-pane.md)]
+## Consume deployments
 
-1. Sign in to [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs) and go to the **Home** page.
-1. Select **Model catalog** from the left sidebar.
-1. In the **Deployment options** filter, select **Managed compute**.
+After you create your deployment, follow these steps to consume it:
 
-:::image type="content" source="../media/deploy-monitor/catalog-filter-managed-compute.png" alt-text="A screenshot showing how to filter by managed compute models in the catalog." lightbox="../media/deploy-monitor/catalog-filter-managed-compute.png":::
+1. Select **Models + endpoints** under the **My assets** section in your Azure AI Foundry project.
+1. Select your deployment from the **Model deployments** tab.
+1. Go to the **Test** tab for sample inference to the endpoint.
+1. Return to the **Details** tab to copy the deployment's "Target URI", which you can use to run inference with code.
+1. Go to the **Consume** tab of the deployment to find code samples for consumption.
 
-1. Select a model.
-1. Copy the model ID from the details page of the model you selected. It looks something like this: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16`
+::: zone-end
 
 
+::: zone pivot="python-sdk"
+6. Copy the model ID from the details page of the model you selected. It looks like this for the selected model: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17`.
+
 
 ## Deploy the model
 
@@ -64,7 +104,7 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 
 workspace_ml_client = MLClient(
     credential=InteractiveBrowserCredential,
-    subscription_id="your subscription name goes here",
+    subscription_id="your subscription ID goes here",
     resource_group_name="your resource group name goes here",
     workspace_name="your project name goes here",
 )
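For context, the snippet in this hunk assumes the Azure ML SDK v2 imports shown below. A self-contained sketch follows; note that the article's snippet passes the `InteractiveBrowserCredential` class itself, while `MLClient` expects a credential instance, so the instantiated form here is the safer one.

```python
# Self-contained sketch of the client setup in the hunk above (azure-ai-ml,
# SDK v2). Assumes `pip install azure-ai-ml azure-identity`.
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential

workspace_ml_client = MLClient(
    credential=InteractiveBrowserCredential(),  # instantiated, unlike the snippet above
    subscription_id="your subscription ID goes here",
    resource_group_name="your resource group name goes here",
    workspace_name="your project name goes here",
)
```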
@@ -92,10 +132,10 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).wait()
 ```
 
-1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Get the model ID](#get-the-model-id) section.
+1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Find your model in the model catalog](#find-your-model-in-the-model-catalog) section.
 
 ```python
-model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16"
+model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17"
 
 demo_deployment = ManagedOnlineDeployment(
     name="demo",
@@ -159,17 +199,26 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 print(json.dumps(response_json, indent=2))
 ```
 
-## Configure Autoscaling
 
-To configure autoscaling for deployments, you can go to Azure portal, locate the Azure resource typed `Machine learning online deployment` in the resource group of the AI project, and use Scaling menu under Setting. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.
+::: zone-end
+
+## Configure autoscaling
+
+To configure autoscaling for deployments, follow these steps:
+
+1. Sign in to the [Azure portal](https://portal.azure.com).
+1. Locate the Azure resource type `Machine learning online deployment` for the model you just deployed in the resource group of the AI project.
+1. Select **Settings** > **Scaling** from the left pane.
+1. Select **Custom autoscale** and configure autoscale settings. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.
+
 
-## Delete the deployment endpoint
+## Delete the deployment
 
-To delete deployments in Azure AI Foundry portal, select the **Delete** button on the top panel of the deployment details page.
+To delete deployments in the Azure AI Foundry portal, select **Delete deployment** on the top panel of the deployment details page.
 
 ## Quota considerations
 
-To deploy and perform inferencing with real-time endpoints, you consume Virtual Machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request for a quota increase.
+To deploy and perform inferencing with real-time endpoints, you consume Virtual Machine (VM) core quota that Azure assigns to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.
 
 ## Related content
 
Two image files added (168 KB and 138 KB) and one image file deleted (−166 KB); binary files not shown.

zone-pivots/zone-pivot-groups.yml

Lines changed: 9 additions & 1 deletion
@@ -800,7 +800,15 @@ groups:
     title: Azure CLI
   - id: programming-language-bicep
     title: Bicep
-
+  - id: azure-ai-managed-compute-deployment
+    # Owner: mopeakande
+    title: Programming languages
+    prompt: Choose a tool or API
+    pivots:
+    - id: ai-foundry-portal
+      title: Azure AI Foundry portal
+    - id: python-sdk
+      title: Python SDK
   - id: azure-ai-serverless-deployment
     # Owner: mopeakande
     title: Programming languages
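For context on how this new pivot group takes effect: an article opts in through its front matter and fences pivot-specific content with zone markers, which is exactly what the first file in this commit does. A minimal sketch of the pattern:

```markdown
---
title: Example article
zone_pivot_groups: azure-ai-managed-compute-deployment
---

::: zone pivot="ai-foundry-portal"
Content shown only when the reader picks the Azure AI Foundry portal pivot.
::: zone-end

::: zone pivot="python-sdk"
Content shown only for the Python SDK pivot.
::: zone-end
```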
