Commit 971f605

Merge pull request #260649 from msakande/refactor-online-deployment-article
refactoring online deployment how-to article
2 parents 11e0516 + ce7f7ae commit 971f605

18 files changed, +646 -483 lines changed

articles/machine-learning/concept-endpoints-online.md

Lines changed: 168 additions & 13 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-deploy-mlflow-models-online-endpoints.md

Lines changed: 1 addition & 1 deletion
@@ -579,7 +579,7 @@ The response will be similar to the following text:
 ```

 > [!IMPORTANT]
-> For MLflow no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For MLflow no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.


 ## Customize MLflow model deployments

articles/machine-learning/how-to-deploy-online-endpoints.md

Lines changed: 445 additions & 459 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-deploy-with-triton.md

Lines changed: 2 additions & 2 deletions
@@ -117,7 +117,7 @@ cd azureml-examples/sdk/python/endpoints/online/triton/single-model/
 This section shows how you can deploy to a managed online endpoint using the Azure CLI with the Machine Learning extension (v2).

 > [!IMPORTANT]
-> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.

 1. To avoid typing in a path for multiple commands, use the following command to set a `BASE_PATH` environment variable. This variable points to the directory where the model and associated YAML configuration files are located:

@@ -151,7 +151,7 @@ This section shows how you can deploy to a managed online endpoint using the Azu
 This section shows how you can define a Triton deployment to deploy to a managed online endpoint using the Azure Machine Learning Python SDK (v2).

 > [!IMPORTANT]
-> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.


 1. To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name.

articles/machine-learning/how-to-manage-quotas.md

Lines changed: 13 additions & 5 deletions
@@ -108,9 +108,13 @@ The following table shows more limits in the platform. Reach out to the Azure Ma
 <sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.

 ### Azure Machine Learning shared quota
-Azure Machine Learning provides a pool of shared quota that is available for different users across various regions to use concurrently. Depending upon availability, users can temporarily access quota from the shared pool, and use the quota to perform testing for a limited amount of time. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload.

-Use of the shared quota pool is available for running Spark jobs and for testing inferencing for Llama-2, Phi, Nemotron, Mistral, Dolly and Deci-DeciLM models from the Model Catalog. You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should request dedicated quota by [filing a support ticket](https://ml.azure.com/quota). Billing for shared quota is usage-based, just like billing for dedicated virtual machine families. To opt out of shared quota for Spark jobs, please fill out [this](https://forms.office.com/r/n2DFPMeZYW) form.
+Azure Machine Learning provides a shared quota pool from which users across various regions can access quota to perform testing for a limited amount of time, depending upon availability. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload.
+
+Use of the shared quota pool is available for running Spark jobs and for testing inferencing for Llama-2, Phi, Nemotron, Mistral, Dolly, and Deci-DeciLM models from the Model Catalog for a short time. Before you can deploy these models via the shared quota, you must have an [Enterprise Agreement subscription](../cost-management-billing/manage/create-enterprise-subscription.md). For more information on how to use the shared quota for online endpoint deployment, see [How to deploy foundation models using the studio](how-to-use-foundation-models.md#deploying-using-the-studio).
+
+You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should request dedicated quota by [filing a support ticket](https://ml.azure.com/quota). Billing for shared quota is usage-based, just like billing for dedicated virtual machine families. To opt out of shared quota for Spark jobs, fill out the [Azure Machine Learning shared capacity allocation opt out form](https://forms.office.com/r/n2DFPMeZYW).
+

 ### Azure Machine Learning online endpoints and batch endpoints

@@ -140,16 +144,20 @@ To request an exception from the Azure Machine Learning product team, use the st
 | Total connections active at endpoint level for all deployments | 500 <sup>5</sup> | Yes | Managed online endpoint |
 | Total bandwidth at endpoint level for all deployments | 5 MBPS <sup>5</sup> | Yes | Managed online endpoint |

-
 <sup>1</sup> This is a regional limit. For example, if current limit on number of endpoint is 100, you can create 100 endpoints in the East US region, 100 endpoints in the West US region, and 100 endpoints in each of the other supported regions in a single subscription. Same principle applies to all the other limits.

 <sup>2</sup> Single dashes like, `my-endpoint-name`, are accepted in endpoint and deployment names.

 <sup>3</sup> Endpoints and deployments can be of different types, but limits apply to the sum of all types. For example, the sum of managed online endpoints, Kubernetes online endpoint and batch endpoint under each subscription can't exceed 100 per region by default. Similarly, the sum of managed online deployments, Kubernetes online deployments and batch deployments under each subscription can't exceed 500 per region by default.

-<sup>4</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you receive an error. There are some VM SKUs that are exempt from extra quota. See [virtual machine quota allocation for deployment](how-to-deploy-online-endpoints.md#virtual-machine-quota-allocation-for-deployment) for more.
+<sup>4</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you receive an error. There are some VM SKUs that are exempt from extra quota. For more information on quota allocation, see [virtual machine quota allocation for deployment](#virtual-machine-quota-allocation-for-deployment).
+
+<sup>5</sup> Requests per second, connections, bandwidth, etc. are related. If you request to increase any of these limits, ensure that you estimate/calculate other related limits together.
+
+#### Virtual machine quota allocation for deployment
+
+[!INCLUDE [quota-allocation-online-deployment](includes/quota-allocation-online-deployment.md)]

-<sup>5</sup> Requests per second, connections, bandwidth etc. are related. If you request an increase for any of these limits, ensure estimating/calculating other related limits together.

 ### Azure Machine Learning pipelines
 [Azure Machine Learning pipelines](concept-ml-pipelines.md) have the following limits.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+---
+services: machine-learning
+author: msakande
+ms.service: machine-learning
+ms.author: mopeakande
+ms.topic: "include"
+ms.date: 12/07/2023
+---
+
+For managed online endpoints, Azure Machine Learning reserves 20% of your compute resources for performing upgrades on some VM SKUs. If you request a given number of instances for those VM SKUs in a deployment, you must have a quota for `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU` available to avoid getting an error. For example, if you request 10 instances of a [Standard_DS3_v2](../../virtual-machines/dv2-dsv2-series.md) VM (that comes with four cores) in a deployment, you should have a quota for 48 cores (`12 instances * 4 cores`) available. This extra quota is reserved for system-initiated operations such as OS upgrades and VM recovery, and it won't incur cost unless such operations run.
+
+There are certain VM SKUs that are exempted from extra quota reservation. To view the full list, see [Managed online endpoints SKU list](../reference-managed-online-endpoints-vm-sku-list.md).
+
+To view your usage and request quota increases, see [View your usage and quotas in the Azure portal](../how-to-manage-quotas.md#view-your-usage-and-quotas-in-the-azure-portal). To view your cost of running a managed online endpoint, see [View costs for a managed online endpoint](../how-to-view-online-endpoints-costs.md).
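
The quota rule stated in the new include file, `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU`, can be checked with a few lines of arithmetic. The following is a minimal sketch and is not part of the commit: the function name `required_core_quota` and its `exempt_sku` flag are hypothetical, and treating an exempt SKU as needing exactly `instances * cores` is an assumption drawn from the statement that such SKUs are exempted from the extra reservation. Integer arithmetic is used so the ceiling is not affected by floating-point rounding.

```python
def required_core_quota(instances: int, cores_per_vm: int, exempt_sku: bool = False) -> int:
    """Return the core quota needed for a managed online deployment.

    Hypothetical helper (not an Azure SDK function). It applies
    ceil(1.2 * instances) * cores_per_vm, or, as an assumption,
    instances * cores_per_vm for SKUs exempt from the extra reservation.
    """
    if instances < 1 or cores_per_vm < 1:
        raise ValueError("instances and cores_per_vm must be positive")
    if exempt_sku:
        # Assumption: exempt SKUs need quota only for the requested instances.
        return instances * cores_per_vm
    # ceil(instances * 120 / 100), computed with integers only.
    reserved_instances = (instances * 120 + 99) // 100
    return reserved_instances * cores_per_vm


if __name__ == "__main__":
    # 10 instances of a four-core VM: 12 instances * 4 cores = 48 cores.
    print(required_core_quota(10, 4))
```

Run as a script, this prints `48`, matching the Standard_DS3_v2 example in the include text. Whether a given SKU is exempt has to be looked up in the managed online endpoints SKU list; the sketch does not encode that list.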