
Commit f46c652

Merge pull request #268808 from kevinkrp93/Cluster_Autoscaler_profiles
Cluster autoscaler profiles
2 parents 47fb0dc + 185f56b commit f46c652

2 files changed: +40 -10 lines changed

articles/aks/cluster-autoscaler-overview.md

Lines changed: 9 additions & 6 deletions
@@ -31,14 +31,16 @@ It's a common practice to enable cluster autoscaler for nodes and either the Ver
* To **effectively run workloads concurrently on both Spot and Fixed node pools**, consider using [*priority expanders*](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders). This approach allows you to schedule pods based on the priority of the node pool.

* Exercise caution when **assigning CPU/Memory requests on pods**. The cluster autoscaler scales up based on pending pods rather than CPU/Memory pressure on nodes.

* For **clusters concurrently hosting both long-running workloads, like web apps, and short/bursty job workloads**, we recommend separating them into distinct node pools with [Affinity Rules](./operator-best-practices-advanced-scheduler.md#node-affinity)/[expanders](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders) or using [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) to help prevent unnecessary node drain or scale down operations.

- * We **don't recommend making direct changes to nodes in autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, and system pods running on them.
+ * In an autoscaler-enabled node pool, **scale down nodes by removing workloads instead of manually reducing the node count**. Manually reducing the count can be problematic if the node pool is already at maximum capacity or if there are active workloads running on the nodes, potentially causing unexpected behavior by the cluster autoscaler.
* Nodes don't scale up if pods have a PriorityClass value below -10. Priority -10 is reserved for [overprovisioning pods](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler). For more information, see [Using the cluster autoscaler with Pod Priority and Preemption](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption).

* **Don't combine other node autoscaling mechanisms**, such as Virtual Machine Scale Set autoscalers, with the cluster autoscaler.

* The cluster autoscaler **might be unable to scale down if pods can't move, such as in the following situations**:
  * A directly created pod not backed by a controller object, such as a Deployment or ReplicaSet.
  * A pod disruption budget (PDB) that's too restrictive and doesn't allow the number of pods to fall below a certain threshold (see the sketch after this list).
  * A pod uses node selectors or anti-affinity that can't be honored if scheduled on a different node.

  For more information, see [What types of pods can prevent the cluster autoscaler from removing a node?](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node).
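To make the PDB situation above concrete, here's a minimal, hypothetical sketch (names and counts are illustrative): with a three-replica Deployment labeled `app: web`, this PDB permits zero voluntary disruptions, so the autoscaler can never drain the pods' nodes.

```bash
# Hypothetical example: a PDB so strict it blocks node drain entirely.
# With exactly 3 replicas and minAvailable: 3, no pod may be evicted.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: web
EOF
```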
+ > [!IMPORTANT]
+ > **Do not make changes to individual nodes within the autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, taints, and system pods running on them.
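Returning to the *priority expanders* recommendation in the list above, here's a hedged sketch of what that could look like on AKS, assuming the upstream priority-expander ConfigMap convention applies to the managed autoscaler (priority values and pool-name patterns are illustrative):

```azurecli-interactive
# Select the priority expander in the cluster autoscaler profile.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile expander=priority
```

```bash
# Hypothetical priorities: prefer Spot pools (20) and fall back to
# everything else (10). Entries are regexes over node group names.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    20:
      - .*spot.*
    10:
      - .*
EOF
```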
## Cluster autoscaler profile

@@ -52,21 +54,22 @@ It's important to note that the cluster autoscaler profile settings are cluster-
#### Example 1: Optimizing for performance

- For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. It also helps mitigate the risk of swift scale down operations on underutilized nodes, enhancing the pod scheduling efficiency.
+ For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. They also help mitigate the risk of swift scale down operations on underutilized nodes, enhancing pod scheduling efficiency. Also increase `ok-total-unready-count` and `max-total-unready-percentage`.

- For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations.
+ For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations. See the [profile for bursty workloads](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-bursty-workloads).
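As a sketch, these performance-oriented settings could be combined into a single profile update like the following (values are illustrative starting points, not prescriptive):

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=30s,scale-down-utilization-threshold=0.3,ignore-daemonset-utilization=true,ok-total-unready-count=1000,max-total-unready-percentage=100
```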
#### Example 2: Optimizing for cost

- If you want a cost-optimized profile, we recommend setting the following parameter configurations:
+ If you want a [cost-optimized profile](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-aggressive-scale-down), we recommend setting the following parameter configurations:

* Reduce `scale-down-unneeded-time`, which is the amount of time a node should be unneeded before it's eligible for scale down.
* Reduce `scale-down-delay-after-add`, which is the amount of time to wait after a node is added before considering it for scale down.
* Increase `scale-down-utilization-threshold`, which is the utilization threshold for removing nodes.
* Increase `max-empty-bulk-delete`, which is the maximum number of nodes that can be deleted in a single call.
+ * Set `skip-nodes-with-local-storage` to `false`.
+ * Increase `ok-total-unready-count` and `max-total-unready-percentage`.
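Taken together, a hedged sketch of the bullets above might look like this (values are illustrative; the aggressive scale-down profile added to `cluster-autoscaler.md` later in this diff is the fuller, documented version):

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scale-down-unneeded-time=3m,scale-down-delay-after-add=0s,scale-down-utilization-threshold=0.7,max-empty-bulk-delete=1000,skip-nodes-with-local-storage=false,ok-total-unready-count=1000,max-total-unready-percentage=100
```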
## Common issues and mitigation recommendations

+ View scaling failures and scale-up not triggered events via [CLI or Portal](./cluster-autoscaler.md#retrieve-cluster-autoscaler-logs-and-status).

### Not triggering scale up operations

| Common causes | Mitigation recommendations |

articles/aks/cluster-autoscaler.md

Lines changed: 31 additions & 4 deletions
@@ -195,6 +195,24 @@ The following table lists the available settings for the cluster autoscaler prof
--cluster-autoscaler-profile scan-interval=30s
```

+ ### Configure cluster autoscaler profile for aggressive scale down
+
+ > [!NOTE]
+ > Scaling down aggressively isn't recommended for clusters experiencing frequent scale-outs and scale-ins within short intervals, as it could result in extended node provisioning times. In these circumstances, increasing `scale-down-delay-after-add` can help by keeping nodes around longer to handle incoming workloads.

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=30s,scale-down-delay-after-add=0s,scale-down-delay-after-failure=30s,scale-down-unneeded-time=3m,scale-down-unready-time=3m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=1000,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
```

+ ### Configure cluster autoscaler profile for bursty workloads

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=20s,scale-down-delay-after-add=10m,scale-down-delay-after-failure=1m,scale-down-unneeded-time=5m,scale-down-unready-time=5m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=100,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
```
### Reset cluster autoscaler profile to default values
199217

200218
* Reset the cluster autoscaler profile using the [`az aks update`][az-aks-update-preview] command.
@@ -206,12 +224,11 @@ The following table lists the available settings for the cluster autoscaler prof
206224
--cluster-autoscaler-profile ""
207225
```
208226
209-
## Retrieve cluster autoscaler logs and status updates
227+
## Retrieve cluster autoscaler logs and status
210228
211229
You can retrieve logs and status updates from the cluster autoscaler to help diagnose and debug autoscaler events. AKS manages the cluster autoscaler on your behalf and runs it in the managed control plane. You can enable control plane node to see the logs and operations from the cluster autoscaler.
212230
213231
### [Azure CLI](#tab/azure-cli)
214-
215232
1. Set up a rule for resource logs to push cluster autoscaler logs to Log Analytics using the [instructions here][aks-view-master-logs]. Make sure you check the box for `cluster-autoscaler` when selecting options for **Logs**.
216233
2. Select the **Log** section on your cluster.
217234
3. Enter the following example query into Log Analytics:
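The example query itself falls outside this hunk's context. As a hedged sketch, these resource logs land in the `AzureDiagnostics` table under the `cluster-autoscaler` category, and can also be queried from the CLI (the workspace GUID is a placeholder):

```azurecli-interactive
# Hypothetical sketch: query cluster-autoscaler resource logs from the CLI.
# Replace the GUID with your Log Analytics workspace ID.
az monitor log-analytics query \
    --workspace "00000000-0000-0000-0000-000000000000" \
    --analytics-query "AzureDiagnostics | where Category == 'cluster-autoscaler' | project TimeGenerated, log_s"
```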
@@ -224,8 +241,16 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
As long as there are logs to retrieve, you should see logs similar to the following:

:::image type="content" source="media/cluster-autoscaler/autoscaler-logs.png" alt-text="Screenshot of Log Analytics logs.":::

- The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:

+ 4. View cluster autoscaler scale-up not triggered events on the CLI:

```bash
kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
```

+ 5. View cluster autoscaler warning events on the CLI:

```bash
kubectl get events --field-selector source=cluster-autoscaler,type=Warning
```

+ 6. The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:

```bash
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
```
@@ -244,6 +269,8 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
---

For more information, see the [Kubernetes/autoscaler GitHub project FAQ][kubernetes-faq].

+ ## Cluster autoscaler metrics
+
+ You can enable [control plane metrics (Preview)](./monitor-control-plane-metrics.md) to see the metrics and operations from the [cluster autoscaler](./control-plane-metrics-default-list.md#minimal-ingestion-for-default-off-targets) with the [Azure Monitor managed service for Prometheus add-on](../azure-monitor/essentials/prometheus-metrics-overview.md).
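A hedged sketch of turning this on (this assumes `--enable-azure-monitor-metrics` is the entry point for the managed Prometheus add-on; per the default-off list linked above, the autoscaler metrics additionally need to be switched on in the ingestion settings):

```azurecli-interactive
# Enable the Azure Monitor managed Prometheus add-on on an existing cluster.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-azure-monitor-metrics
```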
## Next steps
