
Commit 7f988bd

Merge pull request #32581 from mburke5678/nodes-restrict-cpu
OSDOCS-1849: workaround to restrict CPU on pause container
2 parents 64132e0 + b017d1c commit 7f988bd

9 files changed: +80 -39 lines changed

modules/cnf-configure_for_irq_dynamic_load_balancing.adoc

Lines changed: 5 additions & 0 deletions
@@ -24,6 +24,11 @@ spec:
     reserved: 0-1
 ...
 ----
++
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
 
 . Create the pod that uses exclusive CPUs, and set `irq-load-balancing.crio.io` and `cpu-quota.crio.io` annotations to `disable`. For example:
 +
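A minimal sketch of such a pod, assuming illustrative names, image, runtime class, and resource values (only the two annotations come from the step above):

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-irq-pod                           # illustrative name
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  runtimeClassName: performance-example-profile   # assumed runtime class for the performance profile
  containers:
  - name: app
    image: registry.example.com/app:latest        # placeholder image
    resources:
      # Equal requests and limits give the pod the Guaranteed QoS class,
      # which is required for exclusive CPUs.
      requests:
        cpu: 2
        memory: "200Mi"
      limits:
        cpu: 2
        memory: "200Mi"
----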

modules/cnf-cpu-infra-container.adoc

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+// Module included in the following assemblies:
+//
+// scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="cnf-cpu-infra-container_{context}"]
+= Restricting CPUs for infra and application containers
+
+You can reserve cores (threads) from a single NUMA node for operating system housekeeping tasks and put your workloads on another NUMA node. Partitioning the CPUs this way can prevent the housekeeping processes from impacting latency-sensitive application processes. By default, CRI-O uses all online CPUs to run infra containers on nodes, which can result in context switches and spikes in latency.
+
+You can ensure that housekeeping tasks and workloads run on separate NUMA nodes by specifying two groups of CPUs in the `spec` section of the performance profile.
+
+* `isolated` - The CPUs for the application container workloads. These CPUs have the lowest latency. Processes in this group have no interruptions and can, for example, reach much higher DPDK zero packet loss bandwidth.
+
+* `reserved` - The CPUs for the cluster and operating system housekeeping duties, including pod infra containers. Threads in the `reserved` group tend to be very busy, so latency-sensitive applications should be run in the `isolated` group. See link:https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed[Create a pod that gets assigned a QoS class of `Guaranteed`].
+
+.Procedure
+
+. Create a performance profile that is appropriate for your hardware and topology.
+
+. Add the `reserved` and `isolated` parameters with the CPUs you want reserved and isolated for the infra and application containers:
++
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: infra-cpus
+spec:
+  cpu:
+    reserved: "0-4,9" <1>
+    isolated: "5-8" <2>
+  nodeSelector: <3>
+    node-role.kubernetes.io/worker: ""
+----
+<1> Specify which CPUs are for infra containers to perform cluster and operating system housekeeping duties.
+<2> Specify which CPUs are for application containers to run workloads.
+<3> Optional: Specify a node selector to apply the performance profile to specific nodes.
+
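A profile like the `infra-cpus` example above is applied with standard `oc` commands; a minimal sketch, assuming the manifest is saved locally with an illustrative file name:

[source,terminal]
----
$ oc apply -f infra-cpus-performance-profile.yaml
$ oc get performanceprofile infra-cpus -o yaml
----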

modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 [id="managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus_{context}"]
 = Managing device interrupt processing for guaranteed pod isolated CPUs
 
-The Performance Addon Operator manages host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, and isolated CPUs for workloads. CPUs that are used for low latency workloads are set as isolated.
+The Performance Addon Operator can manage host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolated CPUs for application containers to run the workloads. This allows you to set CPUs for low latency workloads as isolated.
 
 Device interrupts are load balanced between all isolated and reserved CPUs to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.
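One way to observe the behavior described above is to inspect the default IRQ affinity mask on a node; a minimal sketch, assuming `oc debug` access and a placeholder node name:

[source,terminal]
----
$ oc debug node/<node_name> -- chroot /host cat /proc/irq/default_smp_affinity
----

The mask lists the CPUs that are eligible to receive device interrupts by default; per-interrupt settings appear under `/proc/irq/<irq_number>/smp_affinity_list`.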

modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc

Lines changed: 10 additions & 0 deletions
@@ -476,6 +476,11 @@ spec:
     node-role.kubernetes.io/worker-cnf: ""
 ----
 
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
+
 To override the performance profile used, the manifest must be mounted inside the container and the tests must be instructed by setting the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE` parameter as follows:
 
 [source,terminal]
@@ -586,6 +591,11 @@ spec:
     node-role.kubernetes.io/worker-cnf: ""
 ----
 
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
+
 To override the performance profile, the manifest must be mounted inside the container and the tests must be instructed by setting the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE`:
 
 [source,terminal]
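For orientation, a hedged sketch of how the override can be passed when launching the tests container; the image reference, mount paths, and entry point are assumptions, and only the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE` variable comes from the text above:

[source,terminal]
----
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
  -v $(pwd)/my-profile.yaml:/my-profile.yaml:Z \
  -e PERFORMANCE_PROFILE_MANIFEST_OVERRIDE=/my-profile.yaml \
  <cnf-tests-image> /usr/bin/test-run.sh
----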

modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 Many industries and organizations need extremely high performance computing and might require low and predictable latency, especially in the financial and telecommunications industries. For these industries, with their unique requirements, {product-title} provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance and consistent response time for {product-title} applications.
 
-The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), the CPUs that will be reserved for housekeeping, and the CPUs that are used for running the workloads.
+The cluster administrator can use this performance profile configuration to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.
 
 [id="performance-addon-operator-known-limitations-for-real-time_{context}"]
 == Known limitations for real-time

modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc

Lines changed: 12 additions & 29 deletions
@@ -12,18 +12,9 @@ The performance profile lets you control latency tuning aspects of nodes that be
 * A `KubeletConfig` file that configures the Topology Manager, the CPU Manager, and the {product-title} nodes.
 * The Tuned profile that configures the Node Tuning Operator.
 
-.Procedure
+You can use a performance profile to specify whether to update the kernel to kernel-rt, to allocate huge pages, and to partition the CPUs for performing housekeeping duties or running workloads.
 
-. Prepare a cluster.
-
-. Create a machine config pool.
-
-. Install the Performance Addon Operator.
-
-. Create a performance profile that is appropriate for your hardware and topology. In the performance profile, you can specify whether to update the kernel to kernel-rt, allocation of huge pages, the CPUs that will be reserved for operating system housekeeping processes and CPUs that will be used for running the workloads.
-+
-This is a typical performance profile:
-+
+.Sample performance profile
 [source,yaml]
 ----
 apiVersion: performance.openshift.io/v2
@@ -32,32 +23,24 @@ metadata:
   name: performance
 spec:
   cpu:
-    isolated: "5-15"
-    reserved: "0-4"
+    isolated: "5-15" <1>
+    reserved: "0-4" <2>
   hugepages:
     defaultHugepagesSize: "1G"
     pages:
     - size: "1G"
       count: 16
       node: 0
   realTimeKernel:
-    enabled: true <1>
-  numa: <2>
+    enabled: true <3>
+  numa: <4>
     topologyPolicy: "best-effort"
   nodeSelector:
-    node-role.kubernetes.io/worker-cnf: ""
+    node-role.kubernetes.io/worker-cnf: "" <5>
 ----
+<1> Use this field to isolate specific CPUs to use with application containers for workloads.
+<2> Use this field to reserve specific CPUs to use with infra containers for housekeeping.
+<3> Use this field to install the real-time kernel on the node. Valid values are `true` or `false`. Setting the `true` value installs the real-time kernel.
+<4> Use this field to configure the topology manager policy. Valid values are `none` (default), `best-effort`, `restricted`, and `single-numa-node`. For more information, see link:https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#topology-manager-policies[Topology Manager Policies].
+<5> Use this field to specify a node selector to apply the performance profile to specific nodes.
 
-<1> Valid values are `true` or `false`. Setting the `true` value installs the real-time kernel on the node.
-<2> Use this field to configure the topology manager policy. Valid values are `none` (default), `best-effort`, `restricted`, and `single-numa-node`. For more information, see link:https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#topology-manager-policies[Topology Manager Policies].
-
-[id="cnf-partitioning-the-cpus_{context}"]
-== Partitioning the CPUs
-
-You can reserve cores, or threads, for operating system housekeeping tasks from a single NUMA node and put your workloads on another NUMA node. The reason for this is that the housekeeping processes might be using the CPUs in a way that would impact latency sensitive processes running on those same CPUs. Keeping your workloads on a separate NUMA node prevents the processes from interfering with each other. Additionally, each NUMA node has its own memory bus that is not shared.
-
-Specify two groups of CPUs in the `spec` section:
-
-* `isolated` - Has the lowest latency. Processes in this group have no interruptions and so can, for example, reach much higher DPDK zero packet loss bandwidth.
-
-* `reserved` - The housekeeping CPUs. Threads in the reserved group tend to be very busy, so latency-sensitive applications should be run in the isolated group. See link:https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed[Create a pod that gets assigned a QoS class of `Guaranteed`].
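As a hedged companion to the sample profile, one way to confirm that the sixteen 1G huge pages requested on NUMA node 0 were allocated is to read the sysfs counter on the tuned node; the node name is a placeholder:

[source,terminal]
----
$ oc debug node/<node_name> -- chroot /host \
  cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
----

If the profile is applied, this should return `16`, matching `count: 16` under `pages` in the sample.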

modules/cnf-understanding-low-latency.adoc

Lines changed: 1 addition & 3 deletions
@@ -45,6 +45,4 @@ that, when done manually, is complex and could be prone to mistakes.
 tuning to achieve low latency performance for OpenShift applications.
 The cluster administrator uses this performance profile configuration that makes
 it easier to make these changes in a more reliable way. The administrator can
-specify whether to update the kernel to kernel-rt, the CPUs that will be
-reserved for housekeeping, and the CPUs that will be used for running the
-workloads.
+specify whether to update the kernel to kernel-rt, reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.

modules/configuring_hyperthreading_for_a_cluster.adoc

Lines changed: 7 additions & 2 deletions
@@ -64,7 +64,7 @@ $ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
 0-4
 ----
 
-. Apply the isolated and reserved CPUs in the `PerformanceProfile` YAML. For example, you could set logical cores CPU0 and CPU4 as isolated, and logical cores CPU1 and CPU5 as reserved:
+. Apply the isolated and reserved CPUs in the `PerformanceProfile` YAML. For example, you could set logical cores CPU0 and CPU4 as `isolated`, and logical cores CPU1 and CPU5 as `reserved`. When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
 +
 [source,yaml]
 ----
@@ -87,7 +87,7 @@ When configuring clusters for low latency processing, consider whether you want
 
 . Create a performance profile that is appropriate for your hardware and topology.
 . Set `nosmt` as an additional kernel argument. The following example performance profile illustrates this setting:
-
++
 [source,yaml]
 ----
 apiVersion: performance.openshift.io/v2
@@ -117,3 +117,8 @@ spec:
   realTimeKernel:
     enabled: true
 ----
++
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====

scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc

Lines changed: 5 additions & 3 deletions
@@ -27,11 +27,13 @@ include::modules/cnf-configure_for_irq_dynamic_load_balancing.adoc[leveloffset=+
 
 include::modules/configuring_hyperthreading_for_a_cluster.adoc[leveloffset=+2]
 
-include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+1]
+include::modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc[leveloffset=+1]
 
-include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+1]
+include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+2]
 
-include::modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc[leveloffset=+1]
+include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+2]
+
+include::modules/cnf-cpu-infra-container.adoc[leveloffset=+2]
 
 include::modules/cnf-reducing-netqueues-using-pao.adoc[leveloffset=+1]
 