@@ -45,6 +45,17 @@ Typically you would run one or two control plane instances per failure zone,
scaling those instances vertically first and then scaling horizontally after reaching
the point of diminishing returns to (vertical) scaling.

+ You should run at least one instance per failure zone to provide fault tolerance. Kubernetes
+ nodes do not automatically steer traffic towards control-plane endpoints that are in the
+ same failure zone; however, your cloud provider might have its own mechanisms to do this.
+
+ For example, using a managed load balancer, you configure the load balancer to send traffic
+ that originates from the kubelet and Pods in failure zone _A_, and direct that traffic only
+ to the control plane hosts that are also in zone _A_. If a single control-plane host, or the
+ control plane endpoint for failure zone _A_, goes offline, all of the control-plane traffic for
+ nodes in zone _A_ is then sent between zones. Running multiple control plane hosts in
+ each zone makes that outcome less likely.
+
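As a rough sketch: the load balancer setup itself is provider-specific, but the intended end state is that kubelets and Pods in zone _A_ reach the API server through an address backed only by the control plane hosts in zone _A_. Assuming a hypothetical zonal DNS name such as `cp-zone-a.example.com`, a kubeconfig for the kubelets in that zone could look like this:

```yaml
# Illustrative sketch only: a kubeconfig for kubelets in failure zone A.
# cp-zone-a.example.com is a hypothetical DNS name for a managed load balancer
# whose backends are only the control plane hosts in zone A.
apiVersion: v1
kind: Config
clusters:
- name: large-cluster
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://cp-zone-a.example.com:6443
users:
- name: kubelet-zone-a
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
contexts:
- name: kubelet-zone-a@large-cluster
  context:
    cluster: large-cluster
    user: kubelet-zone-a
current-context: kubelet-zone-a@large-cluster
```
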
### etcd storage

To improve performance of large clusters, you can store Event objects in a separate
@@ -57,15 +68,16 @@ When creating a cluster, you can (using custom tooling):

## Addon resources

- To prevent memory leaks or other resource issues in cluster
- {{< glossary_tooltip text="addon" term_id="addons" >}} from consuming all the resources
- available on a node, Kubernetes sets
- [resource limits](/docs/concepts/configuration/manage-resources-containers/) on addon
- Pods to limit the amount of CPU and memory that they can consume.
+ Kubernetes [resource limits](/docs/concepts/configuration/manage-resources-containers/)
+ help to minimise the impact of memory leaks and other ways that pods and containers can
+ impact other components. These resource limits can and should apply to
+ {{< glossary_tooltip text="addon" term_id="addons" >}} Pods just as they apply to application
+ workloads.

- For example:
+ For example, you can set CPU and memory limits for a logging component:

```yaml
+ ...
containers:
- name: fluentd-cloud-logging
  image: fluent/fluentd-kubernetes-daemonset:v1
@@ -75,10 +87,13 @@ For example:
      memory: 200Mi
```

- These limits are static and are based on data collected from addons running on
- small clusters. Most addons consume a lot more resources when running on large
- clusters. So, if a large cluster is deployed without adjusting these values, the
- addon(s) may continuously get killed because they keep hitting the limits.
+ Addons' default limits are typically based on data collected from experience running
+ each addon on small or medium Kubernetes clusters. When running on large
+ clusters, addons often consume more of some resources than their default limits.
+ If a large cluster is deployed without adjusting these values, the addon(s)
+ may continuously get killed because they keep hitting the memory limit.
+ Alternatively, the addon may run but with poor performance due to CPU time
+ slice restrictions.
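As a rough sketch, a large cluster might need limits well above the small-cluster defaults shown earlier for the same logging addon; the figures below are hypothetical placeholders to be tuned against measured usage:

```yaml
# Sketch only: scaled-up resource settings for the logging addon on a large
# cluster. The numbers are hypothetical; base real values on observed usage.
containers:
- name: fluentd-cloud-logging
  image: fluent/fluentd-kubernetes-daemonset:v1
  resources:
    requests:
      cpu: 250m
      memory: 500Mi
    limits:
      cpu: 1000m
      memory: 1Gi
```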

To avoid running into cluster addon resource issues, when creating a cluster with
many nodes, consider the following: