Commit 11d5a22

Update operator-best-practices-run-at-scale.md
Adding a couple of callouts to the page
1. ephemeral OS disks for system pools
2. Stop and Start feature not available at >1k scale
1 parent eab97f8 commit 11d5a22

1 file changed: +5 -1 lines changed

articles/aks/operator-best-practices-run-at-scale.md

Lines changed: 5 additions & 1 deletion
@@ -39,11 +39,14 @@ To increase the node limit beyond 1000, you must have the following pre-requisit
 
 ## Node pool scaling considerations and best practices
 
-* For system node pools, use the *Standard_D16ds_v5* SKU or equivalent core/memory VM SKUs to provide sufficient compute resources for *kube-system* pods.
+* For system node pools, use the *Standard_D16ds_v5* SKU or equivalent core/memory VM SKUs with ephemeral OS disks to provide sufficient compute resources for *kube-system* pods.
 * Create at-least five user node pools to scale up to 5,000 nodes since there's a 1000 nodes per node pool limit.
 * Use cluster autoscaler wherever possible when running at-scale AKS clusters to ensure dynamic scaling of node pools based on the demand for compute resources.
 * When scaling beyond 1000 nodes without cluster autoscaler, it's recommended to scale in batches of a maximum 500 to 700 nodes at a time. These scaling operations should also have 2 mins to 5-mins sleep time between consecutive scale-ups to prevent Azure API throttling.
 
+> [!NOTE]
+> You can't use [Stop and Start feature][Stop and Start feature] on clusters enabled with the greater than 1000 node limit
+
 ## Cluster upgrade best practices
 
 * AKS clusters have a hard limit of 5000 nodes. This limit prevents clusters from upgrading that are running at this limit since there's no more capacity do a rolling update with the max surge property. We recommend scaling the cluster down below 3000 nodes before doing cluster upgrades to provide extra capacity for node churn and minimize control plane load.
@@ -61,3 +64,4 @@ To increase the node limit beyond 1000, you must have the following pre-requisit
 <!-- LINKS - Internal -->
 [quotas-skus-regions]: quotas-skus-regions.md
 [cluster upgrades]: upgrade-cluster.md
+[Stop and Start feature]: start-stop-cluster.md
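
The node pool guidance in the first hunk (at most 1000 nodes per pool, manual scale-ups in batches of 500 to 700 nodes, a 2 to 5 minute pause between them) can be sketched as a small planning helper. This is a minimal illustration, not part of the commit or of any AKS tooling: `min_node_pools`, `scale_steps`, `run_plan`, and the `apply_scale` callback are hypothetical names, and the actual scale call (for example a wrapper around `az aks nodepool scale`) is left to the caller.

```python
import math
import time
from typing import Callable, List

MAX_NODES_PER_POOL = 1000  # per-node-pool limit quoted in the changed doc


def min_node_pools(target_nodes: int, per_pool_limit: int = MAX_NODES_PER_POOL) -> int:
    """Smallest number of node pools that can hold target_nodes."""
    return math.ceil(target_nodes / per_pool_limit)


def scale_steps(current: int, target: int, batch: int = 500) -> List[int]:
    """Successive node counts to request when scaling up in fixed-size batches."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    if target < current:
        raise ValueError("this sketch only plans scale-ups")
    steps = []
    count = current
    while count < target:
        count = min(count + batch, target)  # never overshoot the target
        steps.append(count)
    return steps


def run_plan(
    steps: List[int],
    apply_scale: Callable[[int], None],  # hypothetical: caller-supplied scale operation
    pause_seconds: float = 120.0,        # the doc suggests 2 to 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Apply each step, pausing only between consecutive scale-ups."""
    for index, count in enumerate(steps):
        if index > 0:
            sleep(pause_seconds)
        apply_scale(count)


if __name__ == "__main__":
    print(min_node_pools(5000))     # 5
    print(scale_steps(1000, 2500))  # [1500, 2000, 2500]
```

Passing `sleep` and `apply_scale` as parameters keeps the plan testable without touching a real cluster. The sketch treats the cluster's node count as a single number, so distributing each batch across individual node pools is left out.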
