MicrosoftDocs
diff --git a/articles/machine-learning/concept-endpoints-online.md b/articles/machine-learning/concept-endpoints-online.md
Lines changed: 168 additions & 13 deletions
diff --git a/articles/machine-learning/how-to-deploy-mlflow-models-online-endpoints.md b/articles/machine-learning/how-to-deploy-mlflow-models-online-endpoints.md
Lines changed: 1 addition & 1 deletion
diff --git a/articles/machine-learning/how-to-deploy-online-endpoints.md b/articles/machine-learning/how-to-deploy-online-endpoints.md
Lines changed: 445 additions & 459 deletions
diff --git a/articles/machine-learning/how-to-deploy-with-triton.md b/articles/machine-learning/how-to-deploy-with-triton.md
Lines changed: 2 additions & 2 deletions
diff --git a/articles/machine-learning/how-to-manage-quotas.md b/articles/machine-learning/how-to-manage-quotas.md
Lines changed: 13 additions & 5 deletions
diff --git a/articles/machine-learning/includes/quota-allocation-online-deployment.md b/articles/machine-learning/includes/quota-allocation-online-deployment.md
Lines changed: 14 additions & 0 deletions
diff --git a/articles/machine-learning/media/how-to-deploy-online-endpoints/create-environment.png b/articles/machine-learning/media/how-to-deploy-online-endpoints/create-environment.png
50.8 KB
diff --git a/articles/machine-learning/media/how-to-deploy-online-endpoints/customize-environment-with-conda-file.png b/articles/machine-learning/media/how-to-deploy-online-endpoints/customize-environment-with-conda-file.png
53.8 KB
diff --git a/articles/machine-learning/media/how-to-deploy-online-endpoints/deploy-from-models-page.png b/articles/machine-learning/media/how-to-deploy-online-endpoints/deploy-from-models-page.png
18.4 KB
diff --git a/articles/machine-learning/media/how-to-deploy-online-endpoints/deploy-with-custom-environment.png b/articles/machine-learning/media/how-to-deploy-online-endpoints/deploy-with-custom-environment.png
43.7 KB
@@ -579,7 +579,7 @@ The response will be similar to the following text:
 ```
 
 > [!IMPORTANT]
-> For MLflow no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For MLflow no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.
 
 
 ## Customize MLflow model deployments
 
@@ -117,7 +117,7 @@ cd azureml-examples/sdk/python/endpoints/online/triton/single-model/
 This section shows how you can deploy to a managed online endpoint using the Azure CLI with the Machine Learning extension (v2).
 
 > [!IMPORTANT]
-> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.
 
 1. To avoid typing in a path for multiple commands, use the following command to set a `BASE_PATH` environment variable. This variable points to the directory where the model and associated YAML configuration files are located:
 
@@ -151,7 +151,7 @@ This section shows how you can deploy to a managed online endpoint using the Azu
 This section shows how you can define a Triton deployment to deploy to a managed online endpoint using the Azure Machine Learning Python SDK (v2).
 
 > [!IMPORTANT]
-> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-local-endpoints)** is currently not supported.
+> For Triton no-code-deployment, **[testing via local endpoints](how-to-deploy-online-endpoints.md#deploy-and-debug-locally-by-using-a-local-endpoint)** is currently not supported.
 
 
 1. To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. 
 
@@ -108,9 +108,13 @@ The following table shows more limits in the platform. Reach out to the Azure Ma
 <sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
 
 ### Azure Machine Learning shared quota
-Azure Machine Learning provides a pool of shared quota that is available for different users across various regions to use concurrently. Depending upon availability, users can temporarily access quota from the shared pool, and use the quota to perform testing for a limited amount of time. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload. 
 
-Use of the shared quota pool is available for running Spark jobs and for testing inferencing for Llama-2, Phi, Nemotron, Mistral, Dolly and Deci-DeciLM models from the Model Catalog. You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should request dedicated quota by [filing a support ticket](https://ml.azure.com/quota). Billing for shared quota is usage-based, just like billing for dedicated virtual machine families. To opt out of shared quota for Spark jobs, please fill out [this](https://forms.office.com/r/n2DFPMeZYW) form.
+Azure Machine Learning provides a shared quota pool from which users across various regions can access quota to perform testing for a limited amount of time, depending upon availability. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload.
+
+Use of the shared quota pool is available for running Spark jobs and for testing inferencing for Llama-2, Phi, Nemotron, Mistral, Dolly, and Deci-DeciLM models from the Model Catalog for a short time. Before you can deploy these models via the shared quota, you must have an [Enterprise Agreement subscription](../cost-management-billing/manage/create-enterprise-subscription.md). For more information on how to use the shared quota for online endpoint deployment, see [How to deploy foundation models using the studio](how-to-use-foundation-models.md#deploying-using-the-studio).
+
+You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should request dedicated quota by [filing a support ticket](https://ml.azure.com/quota). Billing for shared quota is usage-based, just like billing for dedicated virtual machine families. To opt out of shared quota for Spark jobs, fill out the [Azure Machine Learning shared capacity allocation opt out form](https://forms.office.com/r/n2DFPMeZYW).
+
 
 ### Azure Machine Learning online endpoints and batch endpoints
 
@@ -140,16 +144,20 @@ To request an exception from the Azure Machine Learning product team, use the st
 | Total connections active at endpoint level for all deployments  | 500 <sup>5</sup> | Yes | Managed online endpoint |
 | Total bandwidth at endpoint level for all deployments  | 5 MBPS <sup>5</sup> | Yes | Managed online endpoint |
 
-
 <sup>1</sup> This is a regional limit. For example, if current limit on number of endpoint is 100, you can create 100 endpoints in the East US region, 100 endpoints in the West US region, and 100 endpoints in each of the other supported regions in a single subscription. Same principle applies to all the other limits. 
 
 <sup>2</sup> Single dashes like, `my-endpoint-name`, are accepted in endpoint and deployment names.
 
 <sup>3</sup> Endpoints and deployments can be of different types, but limits apply to the sum of all types. For example, the sum of managed online endpoints, Kubernetes online endpoint and batch endpoint under each subscription can't exceed 100 per region by default. Similarly, the sum of managed online deployments, Kubernetes online deployments and batch deployments under each subscription can't exceed 500 per region by default.
 
-<sup>4</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you receive an error. There are some VM SKUs that are exempt from extra quota. See [virtual machine quota allocation for deployment](how-to-deploy-online-endpoints.md#virtual-machine-quota-allocation-for-deployment) for more.
+<sup>4</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you receive an error. There are some VM SKUs that are exempt from extra quota. For more information on quota allocation, see [virtual machine quota allocation for deployment](#virtual-machine-quota-allocation-for-deployment).
+
+<sup>5</sup> Requests per second, connections, bandwidth, and similar limits are related. If you request an increase to any of these limits, make sure you also estimate the other related limits and request them together.
+
+#### Virtual machine quota allocation for deployment
+
+[!INCLUDE [quota-allocation-online-deployment](includes/quota-allocation-online-deployment.md)]
 
-<sup>5</sup> Requests per second, connections, bandwidth etc. are related. If you request an increase for any of these limits, ensure estimating/calculating other related limits together.
 
 ### Azure Machine Learning pipelines
 [Azure Machine Learning pipelines](concept-ml-pipelines.md) have the following limits.
 
@@ -0,0 +1,14 @@
+---
+services: machine-learning
+author: msakande
+ms.service: machine-learning
+ms.author: mopeakande
+ms.topic: "include"
+ms.date: 12/07/2023
+---
+
+For managed online endpoints, Azure Machine Learning reserves an extra 20% of compute resources for performing upgrades on some VM SKUs. If you request a given number of instances for those VM SKUs in a deployment, you must have a quota for `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU` available to avoid getting an error. For example, if you request 10 instances of a [Standard_DS3_v2](../../virtual-machines/dv2-dsv2-series.md) VM (which has four cores) in a deployment, you must have a quota for 48 cores (`12 instances * 4 cores`) available. This extra quota is reserved for system-initiated operations such as OS upgrades and VM recovery, and it won't incur cost unless such operations run.
+
+Certain VM SKUs are exempt from the extra quota reservation. To view the full list, see [Managed online endpoints SKU list](../reference-managed-online-endpoints-vm-sku-list.md).
+
+To view your usage and request quota increases, see [View your usage and quotas in the Azure portal](../how-to-manage-quotas.md#view-your-usage-and-quotas-in-the-azure-portal). To view your cost of running a managed online endpoint, see [View costs for a managed online endpoint](../how-to-view-online-endpoints-costs.md).
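The quota formula in the new include file can be checked with a short Python sketch before you request instances. The function name `required_core_quota` and its `reserve_extra` parameter are hypothetical illustrations, not part of the Azure Machine Learning SDK. Treating exempt VM SKUs as needing exactly `instances * cores` is an assumption based on the exemption described in the include, not a documented formula.

```python
def required_core_quota(instances: int, cores_per_vm: int, reserve_extra: bool = True) -> int:
    """Hypothetical helper: cores of quota needed for a managed online deployment.

    Follows the documented formula ceil(1.2 * instances) * cores_per_vm.
    ceil(6n / 5) is computed as (6n + 4) // 5 so that no floating-point
    rounding can affect the result.
    """
    if instances < 1 or cores_per_vm < 1:
        raise ValueError("instances and cores_per_vm must be positive integers")
    if reserve_extra:
        # Round 1.2 * instances up to a whole number of instances.
        reserved_instances = (6 * instances + 4) // 5
    else:
        # Assumption: SKUs exempt from the extra reservation need no buffer.
        reserved_instances = instances
    return reserved_instances * cores_per_vm


# The example from the include: 10 instances of a four-core VM such as Standard_DS3_v2.
print(required_core_quota(10, 4))  # 48
# Rounding up means even a single instance reserves a second instance's cores.
print(required_core_quota(1, 4))   # 8
```

Using integer arithmetic rather than `math.ceil(1.2 * n)` keeps the result exact for every instance count, since `1.2` has no exact binary floating-point representation.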