
Commit f805912

Tim Bannister committed
Add advice about control plane resilience for large clusters
1 parent 0600eae commit f805912

File tree

1 file changed

+25
-10
lines changed


content/en/docs/setup/best-practices/cluster-large.md

Lines changed: 25 additions & 10 deletions
@@ -45,6 +45,17 @@ Typically you would run one or two control plane instances per failure zone,
 scaling those instances vertically first and then scaling horizontally after reaching
 the point of diminishing returns for (vertical) scaling.
 
+You should run at least one instance per failure zone to provide fault tolerance. Kubernetes
+nodes do not automatically steer traffic towards control-plane endpoints that are in the
+same failure zone; however, your cloud provider might have its own mechanisms to do this.
+
+For example, with a managed load balancer, you can configure the load balancer to send
+traffic that originates from the kubelet and Pods in failure zone _A_ only to the
+control plane hosts that are also in zone _A_. If a single control-plane host or endpoint
+in failure zone _A_ goes offline, all the control-plane traffic for nodes in zone _A_ is
+then sent between zones. Running multiple control plane hosts in each zone makes that
+outcome less likely.
+
 
 ### etcd storage
 
 To improve performance of large clusters, you can store Event objects in a separate
@@ -57,15 +68,16 @@ When creating a cluster, you can (using custom tooling):
 
 ## Addon resources
 
-To prevent memory leaks or other resource issues in cluster
-{{< glossary_tooltip text="addon" term_id="addons" >}} from consuming all the resources
-available on a node, Kubernetes sets
-[resource limits](/docs/concepts/configuration/manage-resources-containers/) on addon
-Pods to limit the amount of CPU and memory that they can consume.
+Kubernetes [resource limits](/docs/concepts/configuration/manage-resources-containers/)
+help to minimize the impact of memory leaks and other ways that Pods and containers can
+affect other components. These resource limits can and should apply to
+{{< glossary_tooltip text="addons" term_id="addons" >}} just as they apply to application
+workloads.
 
-For example:
+For example, you can set CPU and memory limits for a logging component:
 
 ```yaml
+...
 containers:
 - name: fluentd-cloud-logging
   image: fluent/fluentd-kubernetes-daemonset:v1
@@ -75,10 +87,13 @@ For example:
       memory: 200Mi
 ```
 
-These limits are static and are based on data collected from addons running on
-small clusters. Most addons consume a lot more resources when running on large
-clusters. So, if a large cluster is deployed without adjusting these values, the
-addon(s) may continuously get killed because they keep hitting the limits.
+Addons' default limits are typically based on data collected from experience running
+each addon on small or medium Kubernetes clusters. When running on large
+clusters, addons often consume more of some resources than their default limits.
+If a large cluster is deployed without adjusting these values, the addon(s)
+may continuously get killed because they keep hitting the memory limit.
+Alternatively, the addon may run but with poor performance due to CPU time
+slice restrictions.
 
 To avoid running into cluster addon resource issues, when creating a cluster with
 many nodes, consider the following:
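One way to arrange the zone-aware routing described in the first hunk is to give the nodes in each failure zone a zone-local API server endpoint. A minimal sketch, assuming a kubeadm-managed cluster and a hypothetical per-zone load balancer address (`lb-zone-a.example.com` is made up), could look like this for a node joining in zone A:

```yaml
# Sketch only: JoinConfiguration for a node in failure zone A, assuming kubeadm
# and a hypothetical per-zone load balancer (lb-zone-a.example.com) that fronts
# only the control plane hosts in zone A.
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    # Zone-local API server endpoint; nodes in other zones would use their own
    # zone's address instead.
    apiServerEndpoint: "lb-zone-a.example.com:6443"
    token: "abcdef.0123456789abcdef"
    caCertHashes:
      - "sha256:<hash of the cluster CA certificate>"
nodeRegistration:
  kubeletExtraArgs:
    # Label the node with its zone so that zone-aware features can use it.
    node-labels: "topology.kubernetes.io/zone=zone-a"
```

Whether traffic can actually be steered by the client's zone depends on your provider's load balancer, as the added text notes; the zonal address here stands in for whichever mechanism your provider offers.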

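The trailing context line of the first hunk mentions storing Event objects in a separate store (the rest of that sentence falls outside the hunk). Assuming it refers to the common pattern of a dedicated etcd instance for events, a sketch with hypothetical etcd addresses would use the kube-apiserver `--etcd-servers-overrides` flag:

```yaml
# Sketch only: fragment of a kube-apiserver static Pod manifest. The etcd
# addresses (etcd-main.example, etcd-events.example) are hypothetical.
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Primary etcd cluster for all resources...
    - --etcd-servers=https://etcd-main.example:2379
    # ...except Event objects, which are routed to a dedicated etcd instance.
    - --etcd-servers-overrides=/events#https://etcd-events.example:2379
```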

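Relating to the last hunk's point about raising addon limits on large clusters, here is a sketch of what an adjusted manifest could look like for the fluentd example above. The figures are purely illustrative and should come from observed usage on your own cluster:

```yaml
# Sketch only: the same logging container with limits raised for a large
# cluster. The numbers are illustrative, not recommendations.
containers:
- name: fluentd-cloud-logging
  image: fluent/fluentd-kubernetes-daemonset:v1
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi   # raised from the 200Mi shown in the diff above
```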