Commit 6102b9b

shared quota for online endpoint
1 parent 178a35c commit 6102b9b

1 file changed: +4 -3 lines changed

articles/machine-learning/how-to-deploy-online-endpoints.md

Lines changed: 4 additions & 3 deletions
@@ -8,7 +8,7 @@ ms.subservice: inferencing
 author: dem108
 ms.author: sehan
 ms.reviewer: mopeakande
-ms.date: 07/17/2023
+ms.date: 09/18/2023
 reviewer: msakande
 ms.topic: how-to
 ms.custom: how-to, devplatv2, ignite-fall-2021, cliv2, event-tier1-build-2022, sdkv2
@@ -71,10 +71,11 @@ Before following the steps in this article, make sure you have the following pre
 
 ### Virtual machine quota allocation for deployment
 
-For managed online endpoints, Azure Machine Learning reserves 20% of your compute resources for performing upgrades. Therefore, if you request a given number of instances in a deployment, you must have a quota for `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU` available to avoid getting an error. For example, if you request 10 instances of a [Standard_DS3_v2](/azure/virtual-machines/dv2-dsv2-series) VM (that comes with 4 cores) in a deployment, you should have a quota for 48 cores (`12 instances * 4 cores`) available. To view your usage and request quota increases, see [View your usage and quotas in the Azure portal](how-to-manage-quotas.md#view-your-usage-and-quotas-in-the-azure-portal).
+For managed online endpoints, Azure Machine Learning reserves 20% of your compute resources for performing upgrades on selected VM SKUs. Therefore, if you request a given number of instances in a deployment, you must have a quota for `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU` available to avoid getting an error. For example, if you request 10 instances of a [Standard_DS3_v2](/azure/virtual-machines/dv2-dsv2-series) VM (that comes with 4 cores) in a deployment, you should have a quota for 48 cores (`12 instances * 4 cores`) available. To view your usage and request quota increases, see [View your usage and quotas in the Azure portal](how-to-manage-quotas.md#view-your-usage-and-quotas-in-the-azure-portal).
 
 <!-- In this tutorial, you'll request one instance of a Standard_DS2_v2 VM SKU (that comes with 2 cores) in your deployment; therefore, you should have a minimum quota for 4 cores (`2 instances*2 cores`) available. -->
----
+
+Azure Machine Learning is introducing a concept of shared quota, that can temporarily be used for a short period of time for testing Llama models on managed online endpoint. Currently this is supported only on the Studio when you deploy Llama models from the model catalog. See [shared quota](how-to-manage-quotas.md#azure-machine-learning-shared-quota) for the concept, and [How to deploy foundation models using Studio](how-to-use-foundation-models.md#deploying-using-the-studio) for how to use it.
 
 ## Prepare your system

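The quota rule in the changed paragraph, `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU`, can be checked with a few lines of Python. This is only a sketch of the arithmetic as the article states it, for VM SKUs where the 20% reservation applies. The function name `required_core_quota` and its parameter names are hypothetical and not part of any Azure SDK or CLI.

```python
def required_core_quota(instances: int, cores_per_instance: int) -> int:
    """Core quota needed: ceil(1.2 * instances) * cores_per_instance.

    Hypothetical helper that mirrors the formula quoted in the article.
    """
    if instances < 1 or cores_per_instance < 1:
        raise ValueError("instances and cores_per_instance must be positive")
    # ceil(1.2 * n) computed with integers as ceil(12 * n / 10),
    # which avoids floating-point rounding surprises.
    reserved_instances = -(-12 * instances // 10)
    return reserved_instances * cores_per_instance


if __name__ == "__main__":
    # Example from the article: 10 Standard_DS3_v2 instances, 4 cores each.
    print(required_core_quota(10, 4))  # 48
```

The same function gives 4 cores for one 2-core instance, which matches the commented-out tutorial note in the diff (`2 instances*2 cores`).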