---
title: How to deploy and infer with a managed compute deployment
titleSuffix: Azure AI Foundry
description: Learn how to deploy large language models on managed compute in Azure AI Foundry and perform real-time inference for generative AI applications.
#CustomerIntent: As an Azure AI developer, I want to deploy and perform inference on large language models using managed compute in Azure AI Foundry so that I can make models available for real-time generative AI applications in production environments.
---
# How to deploy and infer with a managed compute deployment
The Azure AI Foundry portal [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models. A common way to deploy these models is to use the managed compute deployment option. This option is also sometimes referred to as a managed online deployment.
When you deploy a large language model (LLM), you make it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
In this article, you learn to deploy models with the managed compute deployment option and to perform inference on the deployed model.
## Prerequisites
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions don't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
- If you don't have one, [create a [!INCLUDE [hub-project-name](../includes/hub-project-name.md)]](create-projects.md?pivots=hub-project).
- Foundry [Models from Partners and Community](../model-inference/concepts/models.md#models-from-partners-and-community) require access to Azure Marketplace, while Foundry [Models Sold Directly by Azure](../model-inference/concepts/models.md#models-sold-directly-by-azure) don't have this requirement. Ensure your Azure subscription has the permissions required to subscribe to model offerings in Azure Marketplace. See [Enable Azure Marketplace purchases](/azure/cost-management-billing/manage/enable-marketplace-purchases) to learn more.
- Azure role-based access control (Azure RBAC) grants access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-azure-ai-foundry.md).

## Find your model in the model catalog

1. Sign in to [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs) and go to the **Home** page.
1. Select **Model catalog** from the left sidebar.
1. In the **Deployment options** filter, select **Managed compute**.
    :::image type="content" source="../media/deploy-models-managed/catalog-filter-managed-compute.png" alt-text="A screenshot of the model catalog showing how to filter for models that can be deployed via managed compute." lightbox="../media/deploy-models-managed/catalog-filter-managed-compute.png":::
1. Select a model to open its model card. In this article, use the model `deepset-roberta-base-squad2`.
::: zone pivot="ai-foundry-portal"
## Deploy the model
1. From the model's page, select **Use this model** to open the deployment window.
1. The deployment window is pre-filled with some selections and parameter values. You can either keep them or change them as desired. You can also select an existing endpoint for the deployment or create a new one. For this example, specify an instance count of `1` and create a new endpoint for the deployment.
    :::image type="content" source="../media/deploy-models-managed/deployment-configuration.png" alt-text="Screenshot of the deployment configuration screen for managed compute deployment in Azure AI Foundry." lightbox="../media/deploy-models-managed/deployment-configuration.png":::
1. Select **Deploy** to create your deployment. The creation process might take a few minutes to complete. When it's complete, the portal opens the model deployment page.
    > [!TIP]
    > To see endpoints deployed to your project, go to the **My assets** section of the left pane and select **Models + endpoints**.
1. The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, follow these steps:
    1. Select the deployment, and note the endpoint's Target URI and Key.
    1. Use these credentials to call the deployment and generate predictions. (A sample call follows the consumption steps below.)
After you create your deployment, follow these steps to consume it:
1. Select **Models + endpoints** under the **My assets** section in your Azure AI Foundry project.
1. Select your deployment from the **Model deployments** tab.
1. Go to the **Test** tab to run sample inference against the endpoint.
1. Return to the **Details** tab to copy the deployment's "Target URI", which you can use to run inference with code, as shown in the sketch after these steps.
1. Go to the **Consume** tab of the deployment to find code samples for consumption.
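With the Target URI and key from the previous steps, you can call the endpoint from any HTTP client. Here's a minimal sketch in Python: the URI, key, and payload values are placeholders, and the request schema shown for `deepset-roberta-base-squad2` is an assumption to verify against the sample in the **Consume** tab.

```python
import json

import requests

# Placeholders: copy the real values from the deployment's Details tab.
target_uri = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
key = "<your-endpoint-key>"

# Assumed payload shape for an extractive question-answering model;
# verify it against the Consume tab's sample code.
payload = {
    "inputs": {
        "question": "Where do managed compute deployments run?",
        "context": "Managed compute deployments host models on Azure-managed virtual machines.",
    }
}

response = requests.post(
    target_uri,
    headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```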
::: zone-end
::: zone pivot="python-sdk"
6. Copy the model ID from the details page of the model you selected. It looks like this for the selected model: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17`.
## Deploy the model
Initialize the workspace client to connect to your project:

```python
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential

# Authenticate interactively and connect to the project's workspace.
# Pass an InteractiveBrowserCredential instance, not the class itself.
workspace_ml_client = MLClient(
    credential=InteractiveBrowserCredential(),
    subscription_id="your subscription ID goes here",
    resource_group_name="your resource group name goes here",
    workspace_name="your project name goes here",
)
```
1. Create a deployment. Replace the model ID in the following code with the model ID that you copied from the details page of the model you selected in the [Find your model in the model catalog](#find-your-model-in-the-model-catalog) section.
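A minimal sketch of what this step can look like with the Azure Machine Learning SDK v2 follows. The endpoint name, instance type, and instance count are illustrative assumptions; substitute values that suit your project and the VM sizes the model supports.

```python
import uuid

from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

# Endpoint names must be unique per region; a random suffix avoids collisions.
endpoint_name = "roberta-qa-" + uuid.uuid4().hex[:8]

# Create the managed online endpoint that hosts the deployment.
endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")
workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment, pointing at the model ID copied from the catalog.
deployment = ManagedOnlineDeployment(
    name="deepset-roberta-base-squad2",
    endpoint_name=endpoint_name,
    model="azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/17",
    instance_type="Standard_DS3_v2",  # assumed; pick a size the model supports
    instance_count=1,
)
workspace_ml_client.online_deployments.begin_create_or_update(deployment).result()

# Send all endpoint traffic to the new deployment.
endpoint.traffic = {"deepset-roberta-base-squad2": 100}
workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```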
Then format and print the JSON response:

```python
print(json.dumps(response_json, indent=2))
```
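The `response_json` value printed above comes from invoking the endpoint. Here's a minimal sketch with the SDK, assuming a hypothetical `./sample_score.json` file that holds the model-specific request payload (the endpoint's **Consume** tab shows the exact schema):

```python
import json

# Invoke the deployment; the request file holds the JSON payload the model
# expects (see the endpoint's Consume tab for the exact schema).
raw_response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="deepset-roberta-base-squad2",
    request_file="./sample_score.json",  # hypothetical path
)
response_json = json.loads(raw_response)  # then pretty-printed as shown above
```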
::: zone-end
## Configure autoscaling
To configure autoscaling for deployments, follow these steps:
1. Sign in to the [Azure portal](https://portal.azure.com).
1. Locate the Azure resource of type `Machine learning online deployment` for the model you just deployed, in the resource group of the AI project.
1. Select **Settings** > **Scaling** from the left pane.
1. Select **Custom autoscale** and configure autoscale settings. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.
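If you prefer to configure the same setting programmatically, the following is a minimal sketch using the Azure Monitor management SDK (`azure-mgmt-monitor`). The setting name, region, and fixed-capacity profile are illustrative assumptions; production configurations typically add metric-based scale rules.

```python
from azure.identity import InteractiveBrowserCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    AutoscaleProfile,
    AutoscaleSettingResource,
    ScaleCapacity,
)

credential = InteractiveBrowserCredential()
mon_client = MonitorManagementClient(credential, "your subscription ID goes here")

# The deployment's ARM resource ID is the autoscale target.
deployment = workspace_ml_client.online_deployments.get(
    name="deepset-roberta-base-squad2", endpoint_name=endpoint_name
)

mon_client.autoscale_settings.create_or_update(
    "your resource group name goes here",
    "roberta-qa-autoscale",  # hypothetical setting name
    parameters=AutoscaleSettingResource(
        location="eastus",  # assumed; match the deployment's region
        target_resource_uri=deployment.id,
        enabled=True,
        profiles=[
            AutoscaleProfile(
                name="default",
                # Fixed capacity for brevity; add ScaleRule entries with
                # MetricTrigger/ScaleAction for metric-based scaling.
                capacity=ScaleCapacity(minimum="1", maximum="2", default="1"),
                rules=[],
            )
        ],
    ),
)
```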
213
+
165
214
166
-
## Delete the deployment endpoint
215
+
## Delete the deployment
To delete deployments in the Azure AI Foundry portal, select **Delete deployment** on the top panel of the deployment details page.
## Quota considerations
To deploy and perform inferencing with real-time endpoints, you consume Virtual Machine (VM) core quota that Azure assigns to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.