Commit 56cf8f5

Merge pull request #24424 from sftim/20201007_large_cluster_guidance
Revise large cluster guidance
2 parents 0ea25ad + f805912 commit 56cf8f5

File tree

1 file changed

+85
-89
lines changed

content/en/docs/setup/best-practices/cluster-large.md

Lines changed: 85 additions & 89 deletions
@@ -2,126 +2,122 @@
22
reviewers:
33
- davidopp
44
- lavalamp
5-
title: Building large clusters
5+
title: Considerations for large clusters
66
weight: 20
77
---
88

9-
## Support
10-
11-
At {{< param "version" >}}, Kubernetes supports clusters with up to 5000 nodes. More specifically, we support configurations that meet *all* of the following criteria:
9+
A cluster is a set of {{< glossary_tooltip text="nodes" term_id="node" >}} (physical
10+
or virtual machines) running Kubernetes agents, managed by the
11+
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}.
12+
Kubernetes {{< param "version" >}} supports clusters with up to 5000 nodes. More specifically,
13+
Kubernetes is designed to accommodate configurations that meet *all* of the following criteria:
1214

15+
* No more than 100 pods per node
1316
* No more than 5000 nodes
1417
* No more than 150000 total pods
1518
* No more than 300000 total containers
16-
* No more than 100 pods per node
17-
18-
19-
## Setup
2019

21-
A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by a "master" (the cluster-level control plane).
20+
You can scale your cluster by adding or removing nodes. The way you do this depends
21+
on how your cluster is deployed.
2222

23-
Normally the number of nodes in a cluster is controlled by the value `NUM_NODES` in the platform-specific `config-default.sh` file (for example, see [GCE's `config-default.sh`](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/gce/config-default.sh)).
23+
## Cloud provider resource quotas {#quota-issues}
2424

25-
Simply changing that value to something very large, however, may cause the setup script to fail for many cloud providers. A GCE deployment, for example, will run in to quota issues and fail to bring the cluster up.
26-
27-
When setting up a large Kubernetes cluster, the following issues must be considered.
28-
29-
### Quota Issues
30-
31-
To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:
32-
33-
* Increase the quota for things like CPU, IPs, etc.
34-
* In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
25+
To avoid running into cloud provider quota issues, when creating a cluster with many nodes,
26+
consider:
27+
* Request a quota increase for cloud resources such as:
28+
* Compute instances
3529
* CPUs
36-
* VM instances
37-
* Total persistent disk reserved
30+
* Storage volumes
3831
* In-use IP addresses
39-
* Firewall Rules
40-
* Forwarding rules
41-
* Routes
42-
* Target pools
43-
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs.
44-
45-
### Etcd storage
32+
* Packet filtering rule sets
33+
* Number of load balancers
34+
* Network subnets
35+
* Log streams
36+
* Gate the cluster scaling actions to brings up new nodes in batches, with a pause
37+
between batches, because some cloud providers rate limit the creation of new instances.
4638

47-
To improve performance of large clusters, we store events in a separate dedicated etcd instance.
39+
## Control plane components
4840

49-
When creating a cluster, existing salt scripts:
41+
For a large cluster, you need a control plane with sufficient compute and other
42+
resources.
5043

51-
* start and configure additional etcd instance
52-
* configure api-server to use it for storing events
53-
54-
### Size of master and master components
44+
Typically you would run one or two control plane instances per failure zone,
45+
scaling those instances vertically first and then scaling horizontally after reaching
46+
the point of diminishing returns to (vertical) scale.
5547

56-
On GCE/Google Kubernetes Engine, and AWS, `kube-up` automatically configures the proper VM size for your master depending on the number of nodes
57-
in your cluster. On other providers, you will need to configure it manually. For reference, the sizes we use on GCE are
48+
You should run at least one instance per failure zone to provide fault-tolerance. Kubernetes
49+
nodes do not automatically steer traffic towards control-plane endpoints that are in the
50+
same failure zone; however, your cloud provider might have its own mechanisms to do this.
5851

59-
* 1-5 nodes: n1-standard-1
60-
* 6-10 nodes: n1-standard-2
61-
* 11-100 nodes: n1-standard-4
62-
* 101-250 nodes: n1-standard-8
63-
* 251-500 nodes: n1-standard-16
64-
* more than 500 nodes: n1-standard-32
52+
For example, using a managed load balancer, you configure the load balancer to send traffic
53+
that originates from the kubelet and Pods in failure zone _A_, and direct that traffic only
54+
to the control plane hosts that are also in zone _A_. If a single control-plane host or
55+
endpoint in failure zone _A_ goes offline, that means that all the control-plane traffic for
56+
nodes in zone _A_ is now being sent between zones. Running multiple control plane hosts in
57+
each zone makes that outcome less likely.
6558

66-
And the sizes we use on AWS are
59+
### etcd storage
6760

68-
* 1-5 nodes: m3.medium
69-
* 6-10 nodes: m3.large
70-
* 11-100 nodes: m3.xlarge
71-
* 101-250 nodes: m3.2xlarge
72-
* 251-500 nodes: c4.4xlarge
73-
* more than 500 nodes: c4.8xlarge
61+
To improve performance of large clusters, you can store Event objects in a separate
62+
dedicated etcd instance.
7463

75-
{{< note >}}
76-
On Google Kubernetes Engine, the size of the master node adjusts automatically based on the size of your cluster. For more information, see [this blog post](https://cloudplatform.googleblog.com/2017/11/Cutting-Cluster-Management-Fees-on-Google-Kubernetes-Engine.html).
64+
When creating a cluster, you can (using custom tooling):
7765

78-
On AWS, master node sizes are currently set at cluster startup time and do not change, even if you later scale your cluster up or down by manually removing or adding nodes or using a cluster autoscaler.
79-
{{< /note >}}
66+
* start and configure an additional etcd instance
67+
* configure the {{< glossary_tooltip term_id="kube-apiserver" text="API server" >}} to use it for storing events
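
As a minimal sketch of the two steps above, the fragment below assumes the API server runs as a static Pod and that a second etcd instance is already reachable. Both hostnames are hypothetical placeholders, and the fragment is not a complete manifest. It relies on the `--etcd-servers-overrides` flag of `kube-apiserver`, which takes entries of the form `group/resource#servers`; the core API group is the empty string, hence `/events`.

```yaml
# Fragment of a kube-apiserver static Pod manifest (illustrative, incomplete).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Main etcd cluster for all other resources; hostname is hypothetical.
    - --etcd-servers=https://etcd-main.example.internal:2379
    # Send core-group Event objects to a dedicated etcd; hostname is hypothetical.
    - "--etcd-servers-overrides=/events#https://etcd-events.example.internal:2379"
```

The override value is quoted only to make it obvious that the `#` is part of the flag and not a YAML comment.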
8068

81-
### Addon Resources
69+
## Addon resources
8270

83-
To prevent memory leaks or other resource issues in [cluster addons](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](https://pr.k8s.io/10653/files) and [#10778](https://pr.k8s.io/10778/files)).
71+
Kubernetes [resource limits](/docs/concepts/configuration/manage-resources-containers/)
72+
help to minimise the impact of memory leaks and other ways that pods and containers can
73+
impact on other components. These resource limits can and should apply to
74+
{{< glossary_tooltip text="addons" term_id="addons" >}} just as they apply to application
75+
workloads.
8476

85-
For example:
77+
For example, you can set CPU and memory limits for a logging component:
8678

8779
```yaml
80+
  ...
8881
  containers:
8982
  - name: fluentd-cloud-logging
90-
    image: k8s.gcr.io/fluentd-gcp:1.16
83+
    image: fluent/fluentd-kubernetes-daemonset:v1
9184
    resources:
9285
      limits:
9386
        cpu: 100m
9487
        memory: 200Mi
9588
```
9689
97-
Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](https://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.
98-
99-
To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:
100-
101-
* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of cluster (there is one replica of each handling the entire cluster so memory and CPU usage tends to grow proportionally with size/load on cluster):
102-
* [InfluxDB and Grafana](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
103-
* [kubedns, dnsmasq, and sidecar](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/dns/kube-dns/kube-dns.yaml.in)
104-
* [Kibana](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml)
105-
* Scale number of replicas for the following addons, if used, along with the size of cluster (there are multiple replicas of each so increasing replicas should help handle increased load, but, since load per replica also increases slightly, also consider increasing CPU/memory limits):
106-
* [elasticsearch](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/fluentd-elasticsearch/es-statefulset.yaml)
107-
* Increase memory and CPU limits slightly for each of the following addons, if used, along with the size of cluster (there is one replica per node but CPU/memory usage increases slightly along with cluster load/size as well):
108-
* [FluentD with ElasticSearch Plugin](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml)
109-
* [FluentD with GCP Plugin](https://releases.k8s.io/{{< param "githubbranch" >}}/cluster/addons/fluentd-gcp/fluentd-gcp-ds.yaml)
110-
111-
Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185)
112-
and [#22940](http://issue.k8s.io/22940)). If you find that Heapster is running
113-
out of resources, you should adjust the formulas that compute heapster memory request (see those PRs for details).
114-
115-
For directions on how to detect if addon containers are hitting resource limits, see the
116-
[Troubleshooting section of Compute Resources](/docs/concepts/configuration/manage-resources-containers/#troubleshooting).
117-
118-
### Allowing minor node failure at startup
119-
120-
For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details) running
121-
`kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly.
122-
Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or before
123-
running `kube-up.sh` set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable
124-
with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` coming up. Depending on the
125-
reason for the failure, those additional nodes may join later or the cluster may remain at a size of
126-
`NUM_NODES - ALLOWED_NOTREADY_NODES`.
127-
90+
Addons' default limits are typically based on data collected from experience running
91+
each addon on small or medium Kubernetes clusters. When running on large
92+
clusters, addons often consume more of some resources than their default limits.
93+
If a large cluster is deployed without adjusting these values, the addon(s)
94+
may continuously get killed because they keep hitting the memory limit.
95+
Alternatively, the addon may run but with poor performance due to CPU time
96+
slice restrictions.
97+
98+
To avoid running into cluster addon resource issues, when creating a cluster with
99+
many nodes, consider the following:
100+
101+
* Some addons scale vertically - there is one replica of the addon for the cluster
102+
or serving a whole failure zone. For these addons, increase requests and limits
103+
as you scale out your cluster.
104+
* Many addons scale horizontally - you add capacity by running more pods - but with
105+
a very large cluster you may also need to raise CPU or memory limits slightly.
106+
The VerticalPodAutoscaler can run in _recommender_ mode to provide suggested
107+
figures for requests and limits.
108+
* Some addons run as one copy per node, controlled by a {{< glossary_tooltip text="DaemonSet"
109+
term_id="daemonset" >}}: for example, a node-level log aggregator. Similar to
110+
the case with horizontally-scaled addons, you may also need to raise CPU or memory
111+
limits slightly.
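
One way to obtain suggested figures without changing running pods is a `VerticalPodAutoscaler` object with its update mode set to `"Off"`, which is assumed here to correspond to the recommendation-only mode mentioned above. This is a sketch that assumes the VerticalPodAutoscaler custom resource definitions and components are already installed in the cluster. The object name and the target Deployment name are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-addon-vpa      # hypothetical name
  namespace: kube-system
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-addon        # hypothetical addon Deployment
  updatePolicy:
    # "Off": compute recommendations only; never evict or resize pods.
    updateMode: "Off"
```

The recommendations then appear in the object's `status.recommendation` field, where you can compare them with the addon's current requests and limits before changing anything.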
112+
113+
## {{% heading "whatsnext" %}}
114+
115+
`VerticalPodAutoscaler` is a custom resource that you can deploy into your cluster
116+
to help you manage resource requests and limits for pods.
117+
Visit [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#readme)
118+
to learn more about `VerticalPodAutoscaler` and how you can use it to scale cluster
119+
components, including cluster-critical addons.
120+
121+
The [cluster autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#readme)
122+
integrates with a number of cloud providers to help you run the right number of
123+
nodes for the level of resource demand in your cluster.
