articles/ai-studio/how-to/deploy-models-serverless.md
40 additions & 20 deletions
@@ -5,7 +5,7 @@ description: Learn to deploy models as serverless APIs, using Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.topic: how-to
- ms.date: 5/21/2024
+ ms.date: 07/18/2024
ms.author: mopeakande
author: msakande
ms.reviewer: fasantia
@@ -86,16 +86,11 @@ In this article, you learn how to deploy a model from the model catalog as a ser
You can use any compatible web browser to [deploy ARM templates](../../azure-resource-manager/templates/deploy-portal.md) in the Microsoft Azure portal or use any of the deployment tools. This tutorial uses the [Azure CLI](/cli/azure/).
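For example, a minimal CLI setup sketch; the extension and placeholder values here are illustrative assumptions rather than steps from this change:

```azurecli
# Sign in and install the ml extension (assumed here for the az ml sketches later in this article).
az login
az extension add --name ml --upgrade

# Point the CLI at your subscription, resource group, and project.
az account set --subscription "<subscription-id>"
az configure --defaults group="<resource-group>" workspace="<project-name>"
```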
- ## Subscribe your project to the model offering
- For models offered through the Azure Marketplace, you can deploy them to serverless API endpoints to consume their predictions. If it's your first time deploying the model in the project, you have to subscribe your project for the particular model offering from the Azure Marketplace. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending.
- > [!NOTE]
- > Models offered through the Azure Marketplace are available for deployment to serverless API endpoints in specific regions. Check [Model and region availability for Serverless API deployments](deploy-models-serverless-availability.md) to verify which models and regions are available. If the one you need is not listed, you can deploy to a workspace in a supported region and then [consume serverless API endpoints from a different workspace](deploy-models-serverless-connect.md).
+ ## Find your model and model ID in the model catalog
1. Sign in to [Azure AI Studio](https://ai.azure.com).
- 1. Ensure your account has the **Azure AI Developer** role permissions on the resource group, or that you meet the [permissions required to subscribe to model offerings](#permissions-required-to-subscribe-to-model-offerings).
+ 1. For models offered through the Azure Marketplace, ensure that your account has the **Azure AI Developer** role permissions on the resource group, or that you meet the [permissions required to subscribe to model offerings](#permissions-required-to-subscribe-to-model-offerings).
1. Select **Model catalog** from the left sidebar and find the model card of the model you want to deploy. In this article, you select a **Meta-Llama-3-8B-Instruct** model.
@@ -106,13 +101,25 @@ For models offered through the Azure Marketplace, you can deploy them to serverl
Models that are offered by non-Microsoft providers (for example, Llama and Mistral models) are billed through the Azure Marketplace. For such models, you need to subscribe your project to the particular model offering. For models from Microsoft (for example, Phi-3 models), you don't need to subscribe your project to the model offering, as billing is done differently. For details about billing for serverless deployment of models in the model catalog, see [Billing for serverless APIs](model-catalog-overview.md#billing).
+ The next section covers the steps for subscribing your project to a model offering. If you're using a Microsoft model, you can skip that section and go to [Deploy the model to a serverless API endpoint](#deploy-the-model-to-a-serverless-api-endpoint).
+ ## Subscribe your project to the model offering
+ For non-Microsoft models offered through the Azure Marketplace, you can deploy them to serverless API endpoints to consume their predictions. If it's your first time deploying the model in the project, you have to subscribe your project for the particular model offering from the Azure Marketplace. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending.
+ > [!NOTE]
+ > Models offered through the Azure Marketplace are available for deployment to serverless API endpoints in specific regions. Check [Model and region availability for Serverless API deployments](deploy-models-serverless-availability.md) to verify which models and regions are available. If the one you need is not listed, you can deploy to a workspace in a supported region and then [consume serverless API endpoints from a different workspace](deploy-models-serverless-connect.md).
1. Create the model's marketplace subscription. When you create a subscription, you accept the terms and conditions associated with the model offer.
# [AI Studio](#tab/azure-ai-studio)
- 1. On the model's **Details** page, select **Deploy** and then select **Serverless API** to open the deployment wizard.
+ 1. On the model's **Details** page, select **Deploy** and then select **Serverless API with Azure AI Content Safety** to open the deployment wizard.
- 1. Select the project in which you want to deploy your models. Notice that not all the regions are supported.
+ 1. Select the project in which you want to deploy your models. To use the serverless API model deployment offering, your project must belong to one of the [regions that are supported for serverless deployment](deploy-models-serverless-availability.md) for the particular model.
:::image type="content" source="../media/deploy-monitor/serverless/deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the serverless API option." lightbox="../media/deploy-monitor/serverless/deploy-pay-as-you-go.png":::
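As a rough CLI counterpart to this wizard step, here's a hedged sketch that assumes the `az ml marketplace-subscription` command group and an illustrative `subscription.yml`:

```azurecli
# subscription.yml (illustrative contents):
#   name: meta-llama3-8b-qwerty
#   model_id: azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct
az ml marketplace-subscription create \
  --file subscription.yml \
  --resource-group "<resource-group>" \
  --workspace-name "<project-name>"
```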
@@ -191,7 +198,7 @@ For models offered through the Azure Marketplace, you can deploy them to serverl
}
```
- 1. Once you sign up the project for the particular Azure Marketplace offering, subsequent deployments of the same offering in the same project don't require subscribing again.
+ 1. Once you subscribe the project to the particular Azure Marketplace offering, subsequent deployments of the same offering in the same project don't require subscribing again.
1. At any point, you can see the model offers to which your project is currently subscribed:
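For example, a hedged CLI sketch of checking those subscriptions, assuming the `az ml marketplace-subscription list` command and placeholder names:

```azurecli
# List the Azure Marketplace model subscriptions that exist in the project.
az ml marketplace-subscription list \
  --resource-group "<resource-group>" \
  --workspace-name "<project-name>" \
  --output table
```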
@@ -233,15 +240,21 @@ For models offered through the Azure Marketplace, you can deploy them to serverl
## Deploy the model to a serverless API endpoint
- Once you've created a model's subscription, you can deploy the associated model to a serverless API endpoint. The serverless API endpoint provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.
+ Once you've created a subscription for a non-Microsoft model, you can deploy the associated model to a serverless API endpoint. For Microsoft models (such as Phi-3 models), you don't need to create a subscription.
+ The serverless API endpoint provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
- In this article, you create an endpoint with name **meta-llama3-8b-qwerty**.
+ In this article, you create an endpoint with the name **meta-llama3-8b-qwerty**.
1. Create the serverless endpoint
# [AI Studio](#tab/azure-ai-studio)
- 1. From the previous wizard, select **Deploy** (if you've just subscribed the project to the model offer in the previous section), or select **Continue to deploy** (if your deployment wizard had the note *You already have an Azure Marketplace subscription for this project*).
+ 1. To deploy a Microsoft model that doesn't require subscribing to a model offering:
+     1. Select **Deploy** and then select **Serverless API with Azure AI Content Safety** to open the deployment wizard.
+     1. Select the project in which you want to deploy your model. Notice that not all the regions are supported.
+ 1. For a non-Microsoft model that requires a model subscription, if you've just subscribed your project to the model offer in the previous section, select **Deploy**. Alternatively, select **Continue to deploy** (if your deployment wizard shows the note *You already have an Azure Marketplace subscription for this project*).
:::image type="content" source="../media/deploy-monitor/serverless/deploy-pay-as-you-go-subscribed-project.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/serverless/deploy-pay-as-you-go-subscribed-project.png":::
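For comparison, a hedged CLI sketch of the same endpoint creation step, assuming the `az ml serverless-endpoint` command group and an illustrative `endpoint.yml`:

```azurecli
# endpoint.yml (illustrative contents):
#   name: meta-llama3-8b-qwerty
#   model_id: azureml://registries/azureml-meta/models/Meta-Llama-3-8B-Instruct
az ml serverless-endpoint create \
  --file endpoint.yml \
  --resource-group "<resource-group>" \
  --workspace-name "<project-name>"

# Retrieve the endpoint URI and keys once the deployment succeeds.
az ml serverless-endpoint show --name "meta-llama3-8b-qwerty" \
  --resource-group "<resource-group>" --workspace-name "<project-name>"
az ml serverless-endpoint get-credentials --name "meta-llama3-8b-qwerty" \
  --resource-group "<resource-group>" --workspace-name "<project-name>"
```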
@@ -418,11 +431,11 @@ In this article, you create an endpoint with name **meta-llama3-8b-qwerty**.
> [!TIP]
> If you're using prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.
- ## Using the serverless API endpoint
+ ## Use the serverless API endpoint
Models deployed in Azure Machine Learning and Azure AI Studio to serverless API endpoints support the [Azure AI Model Inference API](../reference/reference-model-inference-api.md), which exposes a common set of capabilities for foundational models that developers can use to consume predictions from a diverse set of models in a uniform and consistent way.
- Read more about the [capabilities of this API](../reference/reference-model-inference-api.md#capabilities) and how [you can leverage it when building applications](../reference/reference-model-inference-api.md#getting-started).
+ Read more about the [capabilities of this API](../reference/reference-model-inference-api.md#capabilities) and how [you can use it when building applications](../reference/reference-model-inference-api.md#getting-started).
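As an illustration, a minimal request sketch against such an endpoint; the URI, key, and auth header are placeholders and assumptions, and the `/chat/completions` route follows the Model Inference API conventions linked above, so check the endpoint's details page for the exact values:

```bash
# Placeholder endpoint URI and key, copied from the deployment's details page.
ENDPOINT_URI="https://meta-llama3-8b-qwerty.<region>.models.ai.azure.com"
ENDPOINT_KEY="<endpoint-key>"

# Chat completions request; key-based auth shown as a bearer token (assumption).
curl --request POST "$ENDPOINT_URI/chat/completions" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $ENDPOINT_KEY" \
  --data '{
    "messages": [
      {"role": "user", "content": "How many languages are in the world?"}
    ],
    "max_tokens": 256
  }'
```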
## Delete endpoints and subscriptions
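A hedged cleanup sketch, assuming the `az ml serverless-endpoint delete` command; the Azure Marketplace subscription resource can then be removed separately (for example, with `az resource delete`, as referenced in this section of the article):

```azurecli
# Delete the serverless API endpoint created earlier.
az ml serverless-endpoint delete --name "meta-llama3-8b-qwerty" \
  --resource-group "<resource-group>" --workspace-name "<project-name>"
```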
@@ -495,7 +508,15 @@ az resource delete --name <resource-name>
## Cost and quota considerations for models deployed as serverless API endpoints
- Models deployed as serverless API endpoints are offered through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying or fine-tuning the models.
+ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+ #### Cost for Microsoft models
+ You can find the pricing information on the __Pricing and terms__ tab of the deployment wizard when deploying Microsoft models (such as Phi-3 models) as serverless API endpoints.
+ #### Cost for non-Microsoft models
+ Non-Microsoft models deployed as serverless API endpoints are offered through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying or fine-tuning these models.
Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.
@@ -504,8 +525,6 @@ For more information on how to track costs, see [Monitor costs for models offere
:::image type="content" source="../media/deploy-monitor/serverless/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offers and their associated meters." lightbox="../media/deploy-monitor/serverless/costs-model-as-service-cost-details.png":::
- Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
## Permissions required to subscribe to model offerings
510
529
Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Owner__, __Contributor__, or __Azure AI Developer__ role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:
@@ -527,6 +546,7 @@ Azure role-based access controls (Azure RBAC) are used to grant access to operat
For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
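For example, a hedged sketch of granting the Azure AI Developer role at resource-group scope; the assignee and scope values are placeholders, and you need sufficient rights to create role assignments:

```azurecli
# Grant the Azure AI Developer role on the resource group that hosts the project.
az role assignment create \
  --assignee "<user-or-service-principal>" \
  --role "Azure AI Developer" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
```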
- ## Next step
+ ## Related content
+ * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
* [Fine-tune a Meta Llama 2 model in Azure AI Studio](fine-tune-model-llama.md)