
Commit 7f988bd

Merge pull request #32581 from mburke5678/nodes-restrict-cpu
OSDOCS-1849: workaround to restrict CPU on pause container
2 parents 64132e0 + b017d1c commit 7f988bd

9 files changed: +80 -39 lines changed

modules/cnf-configure_for_irq_dynamic_load_balancing.adoc

Lines changed: 5 additions & 0 deletions
@@ -24,6 +24,11 @@ spec:
     reserved: 0-1
 ...
 ----
++
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
 
 . Create the pod that uses exclusive CPUs, and set `irq-load-balancing.crio.io` and `cpu-quota.crio.io` annotations to `disable`. For example:
 +
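A minimal sketch of such a pod, assuming illustrative names, image, runtime class, and resource values (only the two annotations come from the step above):

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-irq-pod                           # illustrative name
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  runtimeClassName: performance-example-profile   # assumed runtime class for the performance profile
  containers:
  - name: app
    image: registry.example.com/app:latest        # placeholder image
    resources:
      # Equal requests and limits give the pod the Guaranteed QoS class,
      # which is required for exclusive CPUs.
      requests:
        cpu: 2
        memory: "200Mi"
      limits:
        cpu: 2
        memory: "200Mi"
----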

modules/cnf-cpu-infra-container.adoc

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+// Module included in the following assemblies:
+//
+// scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="cnf-cpu-infra-container_{context}"]
+= Restricting CPUs for infra and application containers
+
+You can reserve cores (threads) from a single NUMA node for operating system housekeeping tasks and put your workloads on another NUMA node. Partitioning the CPUs this way can prevent the housekeeping processes from impacting latency-sensitive application processes. By default, CRI-O uses all online CPUs to run infra containers on nodes, which can result in context switches and spikes in latency.
+
+You can ensure that housekeeping tasks and workloads run on separate NUMA nodes by specifying two groups of CPUs in the `spec` section of the performance profile.
+
+* `isolated` - The CPUs for the application container workloads. These CPUs have the lowest latency. Processes in this group have no interruptions and can, for example, reach much higher DPDK zero packet loss bandwidth.
+
+* `reserved` - The CPUs for the cluster and operating system housekeeping duties, including pod infra containers. Threads in the `reserved` group tend to be very busy, so latency-sensitive applications should be run in the `isolated` group. See link:https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed[Create a pod that gets assigned a QoS class of `Guaranteed`].
+
+.Procedure
+
+. Create a performance profile that is appropriate for your hardware and topology.
+
+. Add the `reserved` and `isolated` parameters with the CPUs you want reserved and isolated for the infra and application containers:
++
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: infra-cpus
+spec:
+  cpu:
+    reserved: "0-4,9" <1>
+    isolated: "5-8" <2>
+  nodeSelector: <3>
+    node-role.kubernetes.io/worker: ""
+----
+<1> Specify which CPUs are for infra containers to perform cluster and operating system housekeeping duties.
+<2> Specify which CPUs are for application containers to run workloads.
+<3> Optional: Specify a node selector to apply the performance profile to specific nodes.
+
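A profile like the `infra-cpus` example above is applied with standard `oc` commands; a minimal sketch, assuming the manifest is saved locally with an illustrative file name:

[source,terminal]
----
$ oc apply -f infra-cpus-performance-profile.yaml
$ oc get performanceprofile infra-cpus -o yaml
----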

modules/cnf-managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 [id="managing-device-interrupt-processing-for-guaranteed-pod-isolated-cpus_{context}"]
 = Managing device interrupt processing for guaranteed pod isolated CPUs
 
-The Performance Addon Operator manages host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, and isolated CPUs for workloads. CPUs that are used for low latency workloads are set as isolated.
+The Performance Addon Operator can manage host CPUs by dividing them into reserved CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolated CPUs for application containers to run the workloads. This allows you to set CPUs for low latency workloads as isolated.
 
 Device interrupts are load balanced between all isolated and reserved CPUs to avoid CPUs being overloaded, with the exception of CPUs where there is a guaranteed pod running. Guaranteed pod CPUs are prevented from processing device interrupts when the relevant annotations are set for the pod.
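One way to observe the behavior described above is to inspect the default IRQ affinity mask on a node; a minimal sketch, assuming `oc debug` access and a placeholder node name:

[source,terminal]
----
$ oc debug node/<node_name> -- chroot /host cat /proc/irq/default_smp_affinity
----

The mask lists the CPUs that are eligible to receive device interrupts by default; per-interrupt settings appear under `/proc/irq/<irq_number>/smp_affinity_list`.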

modules/cnf-performing-end-to-end-tests-for-platform-verification.adoc

Lines changed: 10 additions & 0 deletions
@@ -476,6 +476,11 @@ spec:
     node-role.kubernetes.io/worker-cnf: ""
 ----
 
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
+
 To override the performance profile used, the manifest must be mounted inside the container and the tests must be instructed by setting the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE` parameter as follows:
 
 [source,terminal]
@@ -586,6 +591,11 @@ spec:
     node-role.kubernetes.io/worker-cnf: ""
 ----
 
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====
+
 To override the performance profile, the manifest must be mounted inside the container and the tests must be instructed by setting the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE`:
 
 [source,terminal]
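For orientation, a hedged sketch of how the override can be passed when launching the tests container; the image reference, mount paths, and entry point are assumptions, and only the `PERFORMANCE_PROFILE_MANIFEST_OVERRIDE` variable comes from the text above:

[source,terminal]
----
$ podman run -v $(pwd)/:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig \
  -v $(pwd)/my-profile.yaml:/my-profile.yaml:Z \
  -e PERFORMANCE_PROFILE_MANIFEST_OVERRIDE=/my-profile.yaml \
  <cnf-tests-image> /usr/bin/test-run.sh
----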

modules/cnf-provisioning-real-time-and-low-latency-workloads.adoc

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 Many industries and organizations need extremely high performance computing and might require low and predictable latency, especially in the financial and telecommunications industries. For these industries, with their unique requirements, {product-title} provides a Performance Addon Operator to implement automatic tuning to achieve low latency performance and consistent response time for {product-title} applications.
 
-The cluster administrator uses this performance profile configuration that makes it easier to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), the CPUs that will be reserved for housekeeping, and the CPUs that are used for running the workloads.
+The cluster administrator can use this performance profile configuration to make these changes in a more reliable way. The administrator can specify whether to update the kernel to kernel-rt (real-time), reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.
 
 [id="performance-addon-operator-known-limitations-for-real-time_{context}"]
 == Known limitations for real-time

modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc

Lines changed: 12 additions & 29 deletions
@@ -12,18 +12,9 @@ The performance profile lets you control latency tuning aspects of nodes that be
 * A `KubeletConfig` file that configures the Topology Manager, the CPU Manager, and the {product-title} nodes.
 * The Tuned profile that configures the Node Tuning Operator.
 
-.Procedure
+You can use a performance profile to specify whether to update the kernel to kernel-rt, to allocate huge pages, and to partition the CPUs for performing housekeeping duties or running workloads.
 
-. Prepare a cluster.
-
-. Create a machine config pool.
-
-. Install the Performance Addon Operator.
-
-. Create a performance profile that is appropriate for your hardware and topology. In the performance profile, you can specify whether to update the kernel to kernel-rt, allocation of huge pages, the CPUs that will be reserved for operating system housekeeping processes and CPUs that will be used for running the workloads.
-+
-This is a typical performance profile:
-+
+.Sample performance profile
 [source,yaml]
 ----
 apiVersion: performance.openshift.io/v2
@@ -32,32 +23,24 @@ metadata:
   name: performance
 spec:
   cpu:
-    isolated: "5-15"
-    reserved: "0-4"
+    isolated: "5-15" <1>
+    reserved: "0-4" <2>
   hugepages:
     defaultHugepagesSize: "1G"
     pages:
     - size: "1G"
       count: 16
       node: 0
   realTimeKernel:
-    enabled: true <1>
-  numa: <2>
+    enabled: true <3>
+  numa: <4>
     topologyPolicy: "best-effort"
   nodeSelector:
-    node-role.kubernetes.io/worker-cnf: ""
+    node-role.kubernetes.io/worker-cnf: "" <5>
 ----
+<1> Use this field to isolate specific CPUs to use with application containers for workloads.
+<2> Use this field to reserve specific CPUs to use with infra containers for housekeeping.
+<3> Use this field to install the real-time kernel on the node. Valid values are `true` or `false`. Setting the `true` value installs the real-time kernel.
+<4> Use this field to configure the topology manager policy. Valid values are `none` (default), `best-effort`, `restricted`, and `single-numa-node`. For more information, see link:https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#topology-manager-policies[Topology Manager Policies].
+<5> Use this field to specify a node selector to apply the performance profile to specific nodes.
 
-<1> Valid values are `true` or `false`. Setting the `true` value installs the real-time kernel on the node.
-<2> Use this field to configure the topology manager policy. Valid values are `none` (default), `best-effort`, `restricted`, and `single-numa-node`. For more information, see link:https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#topology-manager-policies[Topology Manager Policies].
-
-[id="cnf-partitioning-the-cpus_{context}"]
-== Partitioning the CPUs
-
-You can reserve cores, or threads, for operating system housekeeping tasks from a single NUMA node and put your workloads on another NUMA node. The reason for this is that the housekeeping processes might be using the CPUs in a way that would impact latency sensitive processes running on those same CPUs. Keeping your workloads on a separate NUMA node prevents the processes from interfering with each other. Additionally, each NUMA node has its own memory bus that is not shared.
-
-Specify two groups of CPUs in the `spec` section:
-
-* `isolated` - Has the lowest latency. Processes in this group have no interruptions and so can, for example, reach much higher DPDK zero packet loss bandwidth.
-
-* `reserved` - The housekeeping CPUs. Threads in the reserved group tend to be very busy, so latency-sensitive applications should be run in the isolated group. See link:https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed[Create a pod that gets assigned a QoS class of `Guaranteed`].
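As a hedged companion to the sample profile, one way to confirm that the sixteen 1G huge pages requested on NUMA node 0 were allocated is to read the sysfs counter on the tuned node; the node name is a placeholder:

[source,terminal]
----
$ oc debug node/<node_name> -- chroot /host \
  cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
----

If the profile is applied, this should return `16`, matching `count: 16` under `pages` in the sample.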

modules/cnf-understanding-low-latency.adoc

Lines changed: 1 addition & 3 deletions
@@ -45,6 +45,4 @@ that, when done manually, is complex and could be prone to mistakes.
 tuning to achieve low latency performance for OpenShift applications.
 The cluster administrator uses this performance profile configuration that makes
 it easier to make these changes in a more reliable way. The administrator can
-specify whether to update the kernel to kernel-rt, the CPUs that will be
-reserved for housekeeping, and the CPUs that will be used for running the
-workloads.
+specify whether to update the kernel to kernel-rt, reserve CPUs for cluster and operating system housekeeping duties, including pod infra containers, and isolate CPUs for application containers to run the workloads.

modules/configuring_hyperthreading_for_a_cluster.adoc

Lines changed: 7 additions & 2 deletions
@@ -64,7 +64,7 @@ $ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
 0-4
 ----
 
-. Apply the isolated and reserved CPUs in the `PerformanceProfile` YAML. For example, you could set logical cores CPU0 and CPU4 as isolated, and logical cores CPU1 and CPU5 as reserved:
+. Apply the isolated and reserved CPUs in the `PerformanceProfile` YAML. For example, you could set logical cores CPU0 and CPU4 as `isolated`, and logical cores CPU1 and CPU5 as `reserved`. When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
 +
 [source,yaml]
 ----
@@ -87,7 +87,7 @@ When configuring clusters for low latency processing, consider whether you want
 
 . Create a performance profile that is appropriate for your hardware and topology.
 . Set `nosmt` as an additional kernel argument. The following example performance profile illustrates this setting:
-
++
 [source,yaml]
 ----
 apiVersion: performance.openshift.io/v2
@@ -117,3 +117,8 @@ spec:
   realTimeKernel:
     enabled: true
 ----
++
+[NOTE]
+====
+When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
+====

scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc

Lines changed: 5 additions & 3 deletions
@@ -27,11 +27,13 @@ include::modules/cnf-configure_for_irq_dynamic_load_balancing.adoc[leveloffset=+
 
 include::modules/configuring_hyperthreading_for_a_cluster.adoc[leveloffset=+2]
 
-include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+1]
+include::modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc[leveloffset=+1]
 
-include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+1]
+include::modules/cnf-configuring-huge-pages.adoc[leveloffset=+2]
 
-include::modules/cnf-tuning-nodes-for-low-latency-via-performanceprofile.adoc[leveloffset=+1]
+include::modules/cnf-allocating-multiple-huge-page-sizes.adoc[leveloffset=+2]
+
+include::modules/cnf-cpu-infra-container.adoc[leveloffset=+2]
 
 include::modules/cnf-reducing-netqueues-using-pao.adoc[leveloffset=+1]
 