Commit 0be1dd6

Author: Larry Franks
Commit message: incorporating feedback
Parent: d02a5b4

1 file changed: 7 additions, 0 deletions

articles/machine-learning/how-to-deploy-azure-kubernetes-service.md

Lines changed: 7 additions & 0 deletions
@@ -142,6 +142,13 @@ For information on using VS Code, see [deploy to AKS via the VS Code extension](

### Autoscaling

The component that handles autoscaling for Azure ML model deployments is azureml-fe, which is a smart request router. Since all inference requests go through it, it has the necessary data to automatically scale the deployed model(s).
> [!IMPORTANT]
> * **Do not enable Kubernetes Horizontal Pod Autoscaler (HPA) for model deployments**. Doing so would cause the two autoscaling components to compete with each other. Azureml-fe is designed to autoscale models deployed by Azure ML, whereas HPA would have to guess or approximate model utilization from a generic metric such as CPU usage or a custom metric configuration.
>
> * **Azureml-fe does not scale the number of nodes in an AKS cluster**, because this could lead to unexpected cost increases. Instead, **it scales the number of replicas for the model** within the physical cluster boundaries. If you need to scale the number of nodes within the cluster, you can manually scale the cluster or [configure the AKS cluster autoscaler](/azure/aks/cluster-autoscaler).
Autoscaling can be controlled by setting `autoscale_target_utilization`, `autoscale_min_replicas`, and `autoscale_max_replicas` for the AKS web service. The following example demonstrates how to enable autoscaling:

```python
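# A minimal sketch, assuming the azureml-core SDK's AksWebservice class;
# the utilization and replica values below are illustrative, not recommendations.
from azureml.core.webservice import AksWebservice

aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_target_utilization=30,  # target replica utilization, in percent
    autoscale_min_replicas=1,         # never scale below one replica
    autoscale_max_replicas=4          # never scale above four replicas
)
```

In the azureml-core SDK, a configuration like this is then passed as the deployment configuration when the model is deployed to the AKS target (for example, with `Model.deploy`).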
