Commit 7b46ab4

Merge pull request #35038 from chaitanyaenr/master_failures
Recommendation around control plane sizing to handle chaotic conditions
2 parents 8a444cf + b915742 commit 7b46ab4

1 file changed

+1
-1
lines changed

modules/master-node-sizing.adoc

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ The control plane node resource requirements depend on the number of nodes in th
 |===

-On a cluster with three masters or control plane nodes, the CPU and memory usage will spike up when one of the nodes is stopped, rebooted or fails because the remaining two nodes must handle the load in order to be highly available. This is also expected during upgrades because the masters are cordoned, drained, and rebooted serially to apply the operating system updates, as well as the control plane Operators update. To avoid cascading failures on large and dense clusters, keep the overall resource usage on the control plane nodes (also known as the master nodes) to at most half of all available capacity to handle the resource usage spikes. Increase the CPU and memory on the control plane nodes accordingly.
+On a large and dense cluster with three masters or control plane nodes, the CPU and memory usage will spike up when one of the nodes is stopped, rebooted or fails. The failures can be due to unexpected issues with power, network or underlying infrastructure in addition to intentional cases where the cluster is restarted after shutting it down to save costs. The remaining two control plane (also known as master) nodes must handle the load in order to be highly available which leads to increase in the resource usage. This is also expected during upgrades because the masters are cordoned, drained, and rebooted serially to apply the operating system updates, as well as the control plane Operators update. To avoid cascading failures, keep the overall resource usage on the control plane nodes (also known as the master nodes) to at most half of all available capacity to handle the resource usage spikes. Increase the CPU and memory on the control plane nodes accordingly to avoid potential downtime due to lack of resources.

 [IMPORTANT]
 ====
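
The arithmetic behind the "at most half of all available capacity" recommendation can be sketched roughly as follows. This is a simplified illustration only, not part of the commit: it assumes load is redistributed evenly across the surviving control plane nodes, and the function name `utilization_after_failure` is hypothetical. Real spikes can be larger than this model suggests, which is one reason to leave headroom.

```python
def utilization_after_failure(steady_utilization, nodes=3, failed=1):
    """Estimate per-node utilization (0.0 to 1.0+) on the surviving nodes.

    Assumption (not from the source): the total load stays constant and is
    spread evenly across the nodes that remain.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one control plane node must survive")
    # Total load is steady_utilization * nodes; divide it among the survivors.
    return steady_utilization * nodes / survivors


# Three nodes at 50% each: losing one leaves the other two at about 75%.
at_half = utilization_after_failure(0.5)

# Three nodes at 70% each: losing one pushes the other two past 100%.
at_seventy = utilization_after_failure(0.7)

print(f"50% steady state -> {at_half:.0%} after one node is lost")
print(f"70% steady state -> {at_seventy:.0%} after one node is lost")
```

Under this model, a three-node control plane running at half capacity still has spare room on the two remaining nodes after a single failure, whereas one running at 70% does not.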
