Commit 230396f

Merge pull request #214986 from pavneeta/5k-Doc-changes
Update operator-best-practices-run-at-scale.md
2 parents e55f239 + 70d3782 commit 230396f

1 file changed: +5 -2 lines changed

articles/aks/operator-best-practices-run-at-scale.md

Lines changed: 5 additions & 2 deletions
@@ -36,14 +36,16 @@ To increase the node limit beyond 1000, you must have the following pre-requisit
 > [!NOTE]
 > You can't use NPM with clusters greater than 500 Nodes
 
-
 ## Node pool scaling considerations and best practices
 
-* For system node pools, use the *Standard_D16ds_v5* SKU or equivalent core/memory VM SKUs to provide sufficient compute resources for *kube-system* pods.
+* For system node pools, use the *Standard_D16ds_v5* SKU or equivalent core/memory VM SKUs with ephemeral OS disks to provide sufficient compute resources for *kube-system* pods.
 * Create at-least five user node pools to scale up to 5,000 nodes since there's a 1000 nodes per node pool limit.
 * Use cluster autoscaler wherever possible when running at-scale AKS clusters to ensure dynamic scaling of node pools based on the demand for compute resources.
 * When scaling beyond 1000 nodes without cluster autoscaler, it's recommended to scale in batches of a maximum 500 to 700 nodes at a time. These scaling operations should also have 2 mins to 5-mins sleep time between consecutive scale-ups to prevent Azure API throttling.
 
+> [!NOTE]
+> You can't use [Stop and Start feature][Stop and Start feature] on clusters enabled with the greater than 1000 node limit
+
 ## Cluster upgrade best practices
 
 * AKS clusters have a hard limit of 5000 nodes. This limit prevents clusters from upgrading that are running at this limit since there's no more capacity do a rolling update with the max surge property. We recommend scaling the cluster down below 3000 nodes before doing cluster upgrades to provide extra capacity for node churn and minimize control plane load.
@@ -61,3 +63,4 @@ To increase the node limit beyond 1000, you must have the following pre-requisit
 <!-- LINKS - Internal -->
 [quotas-skus-regions]: quotas-skus-regions.md
 [cluster upgrades]: upgrade-cluster.md
+[Stop and Start feature]: start-stop-cluster.md
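
The manual scaling guidance in the changed file (batches of at most 500 to 700 nodes, with a 2 to 5 minute pause between consecutive scale-ups) can be sketched as a small planning helper. This is a minimal illustration, not part of the commit or of any Azure tooling: `plan_scale_steps`, `scale_in_batches`, and the `scale_to` callback are hypothetical names, and the caller is assumed to supply whatever actually performs the scale operation. The sketch works on a single total node count and does not model the 1000-nodes-per-pool limit.

```python
import time
from typing import Callable, List


def plan_scale_steps(current: int, target: int, max_batch: int = 500) -> List[int]:
    """Return successive node counts to request, adding at most max_batch per step."""
    if max_batch <= 0:
        raise ValueError("max_batch must be positive")
    if target < current:
        raise ValueError("this sketch only plans scale-ups")
    steps: List[int] = []
    count = current
    while count < target:
        # Never add more than max_batch nodes in one operation.
        count = min(count + max_batch, target)
        steps.append(count)
    return steps


def scale_in_batches(
    current: int,
    target: int,
    scale_to: Callable[[int], None],  # hypothetical: performs one scale operation
    max_batch: int = 500,             # the doc suggests a maximum of 500 to 700
    pause_seconds: int = 120,         # the doc suggests 2 to 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Apply each planned step, pausing between consecutive scale-ups."""
    steps = plan_scale_steps(current, target, max_batch)
    for index, count in enumerate(steps):
        scale_to(count)
        # Pause only between operations, not after the final one.
        if index < len(steps) - 1:
            sleep(pause_seconds)
    return steps
```

For example, `plan_scale_steps(1000, 2200, 500)` yields the intermediate targets 1500, 2000, and 2200. Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.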

0 commit comments
