The front-end component (azureml-fe) that routes incoming inference requests to deployed services automatically scales as needed. Scaling of azureml-fe is based on the AKS cluster purpose and size (number of nodes). The cluster purpose and nodes are configured when you [create or attach an AKS cluster](how-to-create-attach-kubernetes.md). There is one azureml-fe service per cluster, which may be running on multiple pods.
> [!IMPORTANT]
> When using a cluster configured as __dev-test__, the self-scaler is **disabled**. Even for FastProd/DenseProd clusters, the self-scaler is only enabled when telemetry shows that it's needed.
Azureml-fe scales both up (vertically) to use more cores, and out (horizontally) to use more pods. The decision to scale up is based on the time it takes to route incoming inference requests. If this time exceeds the threshold, a scale-up occurs. If the time to route incoming requests continues to exceed the threshold, a scale-out occurs.
Scaling down and in is based on CPU usage. If the CPU usage threshold is met, the front end is first scaled down. If CPU usage drops to the scale-in threshold, a scale-in operation happens. Scaling up and out occurs only if enough cluster resources are available.
When scaling up or down, azureml-fe pods are restarted to apply the CPU/memory changes. Inferencing requests aren't affected by the restarts.
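The scaling rules above can be sketched as a simple decision function. This is a toy illustration only: the function, parameter names, and threshold values are all hypothetical, and the real thresholds used by azureml-fe are internal.

```python
def scale_decision(route_latency_ms, cpu_usage, *,
                   latency_threshold_ms=100.0,   # hypothetical routing-latency threshold
                   scale_down_cpu=0.5,           # hypothetical scale-down CPU threshold
                   scale_in_cpu=0.3,             # hypothetical scale-in CPU threshold
                   latency_still_high=False):
    """Toy model of the azureml-fe scaling rules described above.

    Scale up (more cores) when routing latency exceeds the threshold;
    scale out (more pods) if latency stays high. Scale down (fewer cores)
    when CPU usage is low; scale in (fewer pods) when it drops even lower.
    """
    if route_latency_ms > latency_threshold_ms:
        return "scale-out" if latency_still_high else "scale-up"
    if cpu_usage <= scale_in_cpu:
        return "scale-in"
    if cpu_usage <= scale_down_cpu:
        return "scale-down"
    return "no-op"
```

Note that the scale-in threshold sits below the scale-down threshold, matching the order described above: the front end is first scaled down, and only scaled in if CPU usage keeps dropping.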
<a id="connectivity"></a>
## Understand connectivity requirements for AKS inferencing cluster
For general AKS connectivity requirements, see [Control egress traffic for cluster nodes in Azure Kubernetes Service](../aks/limit-egress-traffic.md).
For information on accessing Azure Machine Learning services from behind a firewall, see [How to access Azure Machine Learning behind a firewall](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/machine-learning/how-to-access-azureml-behind-firewall.md).
### Overall DNS resolution requirements
DNS resolution within an existing VNet is under your control; for example, through a firewall or custom DNS server. The following hosts must be reachable:
| Host | Purpose |
| ----- | ----- |
|`api.azureml.ms`| Azure Active Directory (AAD) authentication |
|`ingest-vienna<region>.kusto.windows.net`| Kusto endpoint for uploading telemetry |
|`<leaf-domain-label + auto-generated suffix>.<region>.cloudapp.azure.com`| Endpoint domain name, if autogenerated by Azure Machine Learning. If you use a custom domain name, you don't need this entry. |
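As a quick sanity check, you can verify that the hosts in the table above resolve from inside your VNet. This is an illustrative sketch: the `required_hosts` helper and its parameters are hypothetical, and the resolution check needs network access from the environment you're validating.

```python
import socket

def required_hosts(region, leaf_domain_label=None):
    """Build the list of hosts from the table above for a given Azure region.

    `leaf_domain_label` (including the auto-generated suffix) is only needed
    when the endpoint domain name was autogenerated by Azure Machine Learning.
    """
    hosts = [
        "api.azureml.ms",                            # AAD authentication
        f"ingest-vienna{region}.kusto.windows.net",  # Kusto telemetry endpoint
    ]
    if leaf_domain_label:
        hosts.append(f"{leaf_domain_label}.{region}.cloudapp.azure.com")
    return hosts

def unresolvable(hosts):
    """Return the hosts that fail DNS resolution (requires network access)."""
    failed = []
    for host in hosts:
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Run `unresolvable(required_hosts(...))` from a machine inside the VNet; any host it returns points at a firewall or custom DNS rule that needs attention.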
### Connectivity requirements in chronological order: from cluster creation to model deployment