
Commit f46c652

Merge pull request #268808 from kevinkrp93/Cluster_Autoscaler_profiles
Cluster autoscaler profiles
2 parents 47fb0dc + 185f56b commit f46c652

2 files changed: +40 -10 lines changed

articles/aks/cluster-autoscaler-overview.md

Lines changed: 9 additions & 6 deletions
@@ -31,14 +31,16 @@ It's a common practice to enable cluster autoscaler for nodes and either the Ver
* To **effectively run workloads concurrently on both Spot and Fixed node pools**, consider using [*priority expanders*](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders). This approach allows you to schedule pods based on the priority of the node pool.

* Exercise caution when **assigning CPU/Memory requests on pods**. The cluster autoscaler scales up based on pending pods rather than CPU/Memory pressure on nodes.

* For **clusters concurrently hosting both long-running workloads, like web apps, and short/bursty job workloads**, we recommend separating them into distinct node pools with [Affinity Rules](./operator-best-practices-advanced-scheduler.md#node-affinity)/[expanders](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders) or using [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) to help prevent unnecessary node drain or scale down operations.

- * We **don't recommend making direct changes to nodes in autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, and system pods running on them.
+ * In an autoscaler-enabled node pool, **scale down nodes by removing workloads instead of manually reducing the node count**. Manually reducing the count can be problematic if the node pool is already at maximum capacity or if there are active workloads running on the nodes, potentially causing unexpected behavior by the cluster autoscaler.
* Nodes don't scale up if pods have a PriorityClass value below -10. Priority -10 is reserved for [overprovisioning pods](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler). For more information, see [Using the cluster autoscaler with Pod Priority and Preemption](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption).

* **Don't combine other node autoscaling mechanisms**, such as Virtual Machine Scale Set autoscalers, with the cluster autoscaler.

* The cluster autoscaler **might be unable to scale down if pods can't move, such as in the following situations**:
  * A directly created pod not backed by a controller object, such as a Deployment or ReplicaSet.
  * A pod disruption budget (PDB) that's too restrictive and doesn't allow the number of pods to fall below a certain threshold (see the sketch after this list).
  * A pod uses node selectors or anti-affinity that can't be honored if scheduled on a different node.

  For more information, see [What types of pods can prevent the cluster autoscaler from removing a node?](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node).
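To make the PDB situation above concrete, here's a minimal, hypothetical sketch (names and counts are illustrative): with a three-replica Deployment labeled `app: web`, this PDB permits zero voluntary disruptions, so the autoscaler can never drain the pods' nodes.

```bash
# Hypothetical example: a PDB so strict it blocks node drain entirely.
# With exactly 3 replicas and minAvailable: 3, no pod may be evicted.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: web
EOF
```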
+ > [!IMPORTANT]
+ > **Do not make changes to individual nodes within the autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, taints, and system pods running on them.
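Returning to the *priority expanders* recommendation in the list above, here's a hedged sketch of what that could look like on AKS, assuming the upstream priority-expander ConfigMap convention applies to the managed autoscaler (priority values and pool-name patterns are illustrative):

```azurecli-interactive
# Select the priority expander in the cluster autoscaler profile.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile expander=priority
```

```bash
# Hypothetical priorities: prefer Spot pools (20) and fall back to
# everything else (10). Entries are regexes over node group names.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    20:
      - .*spot.*
    10:
      - .*
EOF
```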
## Cluster autoscaler profile

@@ -52,21 +54,22 @@ It's important to note that the cluster autoscaler profile settings are cluster-
#### Example 1: Optimizing for performance

- For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. It also helps mitigate the risk of swift scale down operations on underutilized nodes, enhancing the pod scheduling efficiency.
+ For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. They also help mitigate the risk of swift scale down operations on underutilized nodes, enhancing pod scheduling efficiency. Also increase `ok-total-unready-count` and `max-total-unready-percentage`.

- For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations.
+ For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations. See the [profile for bursty workloads](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-bursty-workloads).
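As a sketch, these performance-oriented settings could be combined into a single profile update like the following (values are illustrative starting points, not prescriptive):

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=30s,scale-down-utilization-threshold=0.3,ignore-daemonset-utilization=true,ok-total-unready-count=1000,max-total-unready-percentage=100
```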
#### Example 2: Optimizing for cost

- If you want a cost-optimized profile, we recommend setting the following parameter configurations:
+ If you want a [cost-optimized profile](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-aggressive-scale-down), we recommend setting the following parameter configurations:

* Reduce `scale-down-unneeded-time`, which is the amount of time a node should be unneeded before it's eligible for scale down.
* Reduce `scale-down-delay-after-add`, which is the amount of time to wait after a node is added before considering it for scale down.
* Increase `scale-down-utilization-threshold`, which is the utilization threshold for removing nodes.
* Increase `max-empty-bulk-delete`, which is the maximum number of nodes that can be deleted in a single call.
+ * Set `skip-nodes-with-local-storage` to `false`.
+ * Increase `ok-total-unready-count` and `max-total-unready-percentage`.
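Taken together, a hedged sketch of the bullets above might look like this (values are illustrative; the aggressive scale-down profile added to `cluster-autoscaler.md` later in this diff is the fuller, documented version):

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scale-down-unneeded-time=3m,scale-down-delay-after-add=0s,scale-down-utilization-threshold=0.7,max-empty-bulk-delete=1000,skip-nodes-with-local-storage=false,ok-total-unready-count=1000,max-total-unready-percentage=100
```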
## Common issues and mitigation recommendations

+ View scaling failures and scale-up not triggered events via [CLI or Portal](./cluster-autoscaler.md#retrieve-cluster-autoscaler-logs-and-status).

### Not triggering scale up operations

| Common causes | Mitigation recommendations |

articles/aks/cluster-autoscaler.md

Lines changed: 31 additions & 4 deletions
@@ -195,6 +195,24 @@ The following table lists the available settings for the cluster autoscaler prof
--cluster-autoscaler-profile scan-interval=30s
```

+ ### Configure cluster autoscaler profile for aggressive scale down
+
+ > [!NOTE]
+ > Scaling down aggressively isn't recommended for clusters experiencing frequent scale-outs and scale-ins within short intervals, as it could result in extended node provisioning times. In these circumstances, increasing `scale-down-delay-after-add` can help by keeping nodes around longer to handle incoming workloads.

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=30s,scale-down-delay-after-add=0s,scale-down-delay-after-failure=30s,scale-down-unneeded-time=3m,scale-down-unready-time=3m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=1000,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
```

+ ### Configure cluster autoscaler profile for bursty workloads

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=20s,scale-down-delay-after-add=10m,scale-down-delay-after-failure=1m,scale-down-unneeded-time=5m,scale-down-unready-time=5m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=100,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
```
### Reset cluster autoscaler profile to default values
199217

200218
* Reset the cluster autoscaler profile using the [`az aks update`][az-aks-update-preview] command.
@@ -206,12 +224,11 @@ The following table lists the available settings for the cluster autoscaler prof
206224
--cluster-autoscaler-profile ""
207225
```
208226
209-
## Retrieve cluster autoscaler logs and status updates
227+
## Retrieve cluster autoscaler logs and status
210228
211229
You can retrieve logs and status updates from the cluster autoscaler to help diagnose and debug autoscaler events. AKS manages the cluster autoscaler on your behalf and runs it in the managed control plane. You can enable control plane node to see the logs and operations from the cluster autoscaler.
212230
213231
### [Azure CLI](#tab/azure-cli)
214-
215232
1. Set up a rule for resource logs to push cluster autoscaler logs to Log Analytics using the [instructions here][aks-view-master-logs]. Make sure you check the box for `cluster-autoscaler` when selecting options for **Logs**.
216233
2. Select the **Log** section on your cluster.
217234
3. Enter the following example query into Log Analytics:
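The example query itself falls outside this hunk's context. As a hedged sketch, these resource logs land in the `AzureDiagnostics` table under the `cluster-autoscaler` category, and can also be queried from the CLI (the workspace GUID is a placeholder):

```azurecli-interactive
# Hypothetical sketch: query cluster-autoscaler resource logs from the CLI.
# Replace the GUID with your Log Analytics workspace ID.
az monitor log-analytics query \
    --workspace "00000000-0000-0000-0000-000000000000" \
    --analytics-query "AzureDiagnostics | where Category == 'cluster-autoscaler' | project TimeGenerated, log_s"
```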
@@ -224,8 +241,16 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
As long as there are logs to retrieve, you should see logs similar to the following:

:::image type="content" source="media/cluster-autoscaler/autoscaler-logs.png" alt-text="Screenshot of Log Analytics logs.":::

- The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:

+ 4. View cluster autoscaler scale-up not triggered events on the CLI:

```bash
kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
```

+ 5. View cluster autoscaler warning events on the CLI:

```bash
kubectl get events --field-selector source=cluster-autoscaler,type=Warning
```

+ 6. The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:

```bash
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
```
@@ -244,6 +269,8 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
---

For more information, see the [Kubernetes/autoscaler GitHub project FAQ][kubernetes-faq].

+ ## Cluster autoscaler metrics
+
+ You can enable [control plane metrics (Preview)](./monitor-control-plane-metrics.md) to see the metrics and operations from the [cluster autoscaler](./control-plane-metrics-default-list.md#minimal-ingestion-for-default-off-targets) with the [Azure Monitor managed service for Prometheus add-on](../azure-monitor/essentials/prometheus-metrics-overview.md).
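A hedged sketch of turning this on (this assumes `--enable-azure-monitor-metrics` is the entry point for the managed Prometheus add-on; per the default-off list linked above, the autoscaler metrics additionally need to be switched on in the ingestion settings):

```azurecli-interactive
# Enable the Azure Monitor managed Prometheus add-on on an existing cluster.
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-azure-monitor-metrics
```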
## Next steps
