
Commit e64dd7b

Merge pull request #54983 from ahardin-rh/perfscale-edits
OSDOCS-4320: Updating scaling guidance
2 parents 1ff6ed6 + 50c8376 commit e64dd7b

File tree

2 files changed: +44, -30 lines


modules/openshift-cluster-maximums-major-releases.adoc

Lines changed: 44 additions & 25 deletions
@@ -5,89 +5,108 @@
[id="cluster-maximums-major-releases_{context}"]
= {product-title} tested cluster maximums for major releases

- Tested Cloud Platforms for {product-title} 3.x: {rh-openstack-first}, Amazon Web Services and Microsoft Azure.
- Tested Cloud Platforms for {product-title} 4.x: Amazon Web Services, Microsoft Azure and Google Cloud Platform.
+ [NOTE]
+ ====
+ Red Hat does not provide direct guidance on sizing your {product-title} cluster. This is because determining whether your cluster is within the supported bounds of {product-title} requires careful consideration of all the multidimensional factors that limit the cluster scale.
+ ====
+
+ {product-title} supports tested cluster maximums rather than absolute cluster maximums. Not every combination of {product-title} version, control plane workload, and network plugin is tested, so the following table does not represent an absolute expectation of scale for all deployments. It might not be possible to scale to a maximum on all dimensions simultaneously. The table contains tested maximums for specific workload and deployment configurations, and serves as a scale guide for what can be expected with similar deployments.

- [options="header",cols="3*"]
+ [options="header",cols="2*"]
|===
- | Maximum type |3.x tested maximum |4.x tested maximum
+ | Maximum type |4.x tested maximum

| Number of nodes
- | 2,000
| 2,000 ^[1]^

| Number of pods ^[2]^
| 150,000
- | 150,000

| Number of pods per node
- | 250
| 500 ^[3]^

| Number of pods per core
| There is no default value.
- | There is no default value.

| Number of namespaces ^[4]^
| 10,000
- | 10,000

| Number of builds
- | 10,000 (Default pod RAM 512 Mi) - Pipeline Strategy
| 10,000 (Default pod RAM 512 Mi) - Source-to-Image (S2I) build strategy

| Number of pods per namespace ^[5]^
| 25,000
- | 25,000

| Number of routes and back ends per Ingress Controller
| 2,000 per router
- | 2,000 per router

| Number of secrets
| 80,000
- | 80,000

| Number of config maps
| 90,000
- | 90,000

| Number of services ^[6]^
| 10,000
- | 10,000

| Number of services per namespace
| 5,000
- | 5,000

| Number of back-ends per service
| 5,000
- | 5,000

| Number of deployments per namespace ^[5]^
| 2,000
- | 2,000

| Number of build configs
| 12,000
- | 12,000

| Number of custom resource definitions (CRD)
- | There is no default value.
| 512 ^[7]^

|===
[.small]
--
- 1. Pause pods were deployed to stress the control plane components of {product-title} at 2000 node scale.
+ 1. Pause pods were deployed to stress the control plane components of {product-title} at 2000 node scale. The ability to scale to similar numbers varies depending on specific deployment and workload parameters.
2. The pod count displayed here is the number of test pods. The actual number of pods depends on the application's memory, CPU, and storage requirements.
3. This was tested on a cluster with 100 worker nodes with 500 pods per worker node. The default `maxPods` is still 250. To get to 500 `maxPods`, the cluster must be created with `maxPods` set to `500` using a custom kubelet config (see the sketch after these notes). If you need 500 user pods, you need a `hostPrefix` of `22` because there are 10-15 system pods already running on the node. The maximum number of pods with attached persistent volume claims (PVCs) depends on the storage back end from which the PVCs are allocated. In our tests, only {rh-storage} v4 (OCS v4) was able to satisfy the number of pods per node discussed in this document.
4. When there are a large number of active projects, etcd might suffer from poor performance if the keyspace grows excessively large and exceeds the space quota. Periodic maintenance of etcd, including defragmentation, is highly recommended to free etcd storage.
5. There are a number of control loops in the system that must iterate over all objects in a given namespace as a reaction to some changes in state. Having a large number of objects of a given type in a single namespace can make those loops expensive and slow down the processing of those state changes. The limit assumes that the system has enough CPU, memory, and disk to satisfy the application requirements.
6. Each service port and each service back-end has a corresponding entry in iptables. The number of back-ends of a given service impacts the size of the endpoints objects, which impacts the size of data that is being sent all over the system.
7. {product-title} has a limit of 512 total custom resource definitions (CRDs), including those installed by {product-title}, products integrating with {product-title}, and user-created CRDs. If more than 512 CRDs are created, there is a possibility that `oc` command requests might be throttled.
--
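As an illustration of the custom kubelet configuration and `hostPrefix` value referenced in footnote 3, the following is a minimal sketch, not part of this file, of a `KubeletConfig` custom resource that raises `maxPods` to 500 and of an `install-config.yaml` networking stanza with a `hostPrefix` of `22`. The machine config pool label (`custom-kubelet: large-pods`) and the cluster network CIDR are assumed example values, not values taken from the tested clusters.

[source,yaml]
----
# Sketch: KubeletConfig that raises maxPods to 500.
# The machineConfigPoolSelector label is an assumption; it must match a
# label applied to the worker MachineConfigPool in your cluster.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 500
----

[source,yaml]
----
# Sketch: install-config.yaml networking stanza with hostPrefix 22,
# which leaves address space for 500 user pods plus system pods per node.
# The cidr value is an example, not a recommendation.
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 22
----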
- [NOTE]
- ====
- Red Hat does not provide direct guidance on sizing your {product-title} cluster. This is because determining whether your cluster is within the supported bounds of {product-title} requires careful consideration of all the multidimensional factors that limit the cluster scale.
- ====
+
+ [id="cluster-maximums-major-releases-example-scenario_{context}"]
+ == Example scenario
+
+ As an example, 500 worker nodes (m5.2xl) were tested, and are supported, using {product-title} 4.12, the OVN-Kubernetes network plugin, and the following workload objects:
+
+ * 200 namespaces, in addition to the defaults
+ * 60 pods per node; 30 server and 30 client pods (30k total)
+ * 57 image streams/ns (11.4k total)
+ * 15 services/ns backed by the server pods (3k total)
+ * 15 routes/ns backed by the previous services (3k total)
+ * 20 secrets/ns (4k total)
+ * 10 config maps/ns (2k total)
+ * 6 network policies/ns, including deny-all, allow-from ingress, and intra-namespace rules (a minimal sketch follows this list)
+ * 57 builds/ns
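The following is a minimal sketch of the deny-all kind of policy counted in the scenario above. The object name and namespace are illustrative assumptions, and the allow-from-ingress and intra-namespace rules used in the tests are not shown here.

[source,yaml]
----
# Sketch: deny-all ingress policy, one of the per-namespace policy types
# counted in the example scenario. Name and namespace are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: scale-test-ns-1
spec:
  podSelector: {}    # selects all pods in the namespace
  policyTypes:
  - Ingress          # no ingress rules defined, so all ingress is denied
----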
+
+ The following factors are known to affect cluster workload scaling, positively or negatively, and should be factored into the scale numbers when planning a deployment. For additional information and guidance, contact your sales representative or link:https://access.redhat.com/support/[Red Hat support].
+
+ * Number of pods per node
+ * Number of containers per pod
+ * Type of probes used (for example, liveness/readiness, exec/http)
+ * Number of network policies
+ * Number of projects, or namespaces
+ * Number of image streams per project
+ * Number of builds per project
+ * Number of services/endpoints and type
+ * Number of routes
+ * Number of shards
+ * Number of secrets
+ * Number of config maps
+ * Rate of API calls, or the cluster “churn”, which is an estimation of how quickly things change in the cluster configuration.
+ ** Prometheus query for pod creation requests per second over 5 minute windows: `sum(irate(apiserver_request_count{resource="pods",verb="POST"}[5m]))`
+ ** Prometheus query for all API requests per second over 5 minute windows: `sum(irate(apiserver_request_count{}[5m]))`
+ * Cluster node resource consumption of CPU
+ * Cluster node resource consumption of memory

scalability_and_performance/planning-your-environment-according-to-object-maximums.adoc

Lines changed: 0 additions & 5 deletions
@@ -10,11 +10,6 @@ Consider the following tested object maximums when you plan your {product-title}

These guidelines are based on the largest possible cluster. For smaller clusters, the maximums are lower. There are many factors that influence the stated thresholds, including the etcd version or storage data format.

- [IMPORTANT]
- ====
- These guidelines apply to {product-title} with software-defined networking (SDN), not Open Virtual Network (OVN).
- ====
-
In most cases, exceeding these numbers results in lower overall performance. It does not necessarily mean that the cluster will fail.

include::modules/openshift-cluster-maximums-major-releases.adoc[leveloffset=+1]
