Commit 0be1dd6

Author: Larry Franks
Commit message: incorporating feedback
Parent: d02a5b4

1 file changed: 7 additions, 0 deletions

articles/machine-learning/how-to-deploy-azure-kubernetes-service.md

Lines changed: 7 additions & 0 deletions
@@ -142,6 +142,13 @@ For information on using VS Code, see [deploy to AKS via the VS Code extension](

### Autoscaling

The component that handles autoscaling for Azure ML model deployments is azureml-fe, which is a smart request router. Since all inference requests go through it, it has the necessary data to automatically scale the deployed model(s).
> [!IMPORTANT]
> * **Do not enable Kubernetes Horizontal Pod Autoscaler (HPA) for model deployments**. Doing so would cause the two autoscaling components to compete with each other. Azureml-fe is designed to autoscale models deployed by Azure ML, whereas HPA would have to guess or approximate model utilization from a generic metric such as CPU usage or a custom metric configuration.
>
> * **Azureml-fe does not scale the number of nodes in an AKS cluster**, because this could lead to unexpected cost increases. Instead, **it scales the number of replicas for the model** within the physical cluster boundaries. If you need to scale the number of nodes within the cluster, you can manually scale the cluster or [configure the AKS cluster autoscaler](/azure/aks/cluster-autoscaler).
Autoscaling can be controlled by setting `autoscale_target_utilization`, `autoscale_min_replicas`, and `autoscale_max_replicas` for the AKS web service. The following example demonstrates how to enable autoscaling:

```python
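# A minimal sketch, assuming the azureml-core SDK's AksWebservice class;
# the utilization and replica values below are illustrative, not recommendations.
from azureml.core.webservice import AksWebservice

aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_target_utilization=30,  # target replica utilization, in percent
    autoscale_min_replicas=1,         # never scale below one replica
    autoscale_max_replicas=4          # never scale above four replicas
)
```

In the azureml-core SDK, a configuration like this is then passed as the deployment configuration when the model is deployed to the AKS target (for example, with `Model.deploy`).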
