Merge pull request #21801 from ahardin-rh/cnf-stephen

ahardin-rh · web-flow · commit 2a53eb776087 · 2020-05-04T15:40:13.000-04:00
New topic files for CNF.
diff --git a/_topic_map.yml b/_topic_map.yml
@@ -1211,6 +1211,9 @@ Topics:
   File: routing-optimization
 - Name: What huge pages do and how they are consumed by apps
   File: what-huge-pages-do-and-how-they-are-consumed-by-apps
+- Name: Performance-addon operator for low latency nodes
+  File: cnf-performance-addon-operator-for-low-latency-nodes
+  Distros: openshift-webscale
 ---
 Name: Backup and restore
 Dir: backup_and_restore
diff --git a/modules/cnf-creating-the-performance-profile-object.adoc b/modules/cnf-creating-the-performance-profile-object.adoc
@@ -0,0 +1,61 @@
+// Module included in the following assemblies:
+// Epic CNF-78
+// * scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="cnf-creating-the-performance-profile-object_{context}"]
+= Creating the PerformanceProfile object
+
+Define the `PerformanceProfile` object and post it to the cluster.
+After you have specified your settings, the `PerformanceProfile` object is compiled into multiple objects:
+
+* A `MachineConfig` file that manipulates the nodes.
+* A `KubeletConfig` file that configures the Topology Manager, the CPU Manager, and the {product-title} nodes.
+* The Tuned profile that configures the Node Tuning Operator.
+
+.Procedure
+
+. Prepare a cluster.
+
+. Create a `MachineConfigPool`.
+
+. Install the performance-addon Operator.
+
+. Create a performance profile that is appropriate for your hardware and topology.
+In the performance profile, you can specify whether to update the kernel to kernel-rt, the CPUs that
+will be reserved for housekeeping, and CPUs that will be used for running the workloads.
++
+This is a typical performance profile:
++
+----
+apiVersion: performance.openshift.io/v1alpha1
+kind: PerformanceProfile
+metadata:
+  name: <unique-name>
+spec:
+  cpu:
+    isolated: "1-3"  # CPUs used for running the latency-sensitive workloads
+    reserved: "0"    # CPUs reserved for housekeeping
+  hugepages:
+    defaultHugepagesSize: "1Gi"
+    pages:
+    - size: "1Gi"
+      count: 4
+      node: 0
+  realTimeKernel:
+    enabled: true
+  numa:
+    topologyPolicy: "best-effort"
+----
+
+. Specify two groups of CPUs in the `spec` section:
++
+`isolated` - Has the lowest latency. Processes in this group are not interrupted, which makes it possible to achieve zero packet loss.
++
+`reserved` - The housekeeping CPUs. Threads in the reserved group tend to be very busy, so latency-sensitive
+applications should be run in the isolated group.
+See link:https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed[Create a Pod that gets assigned a QoS class of `Guaranteed`].
+
+For example, you can reserve cores (threads) from a single NUMA node and put your workloads on another NUMA node.
+This is useful because the housekeeping CPUs might touch the CPU caches that your workloads rely on.
+Keeping your workloads on a separate NUMA node prevents the housekeeping threads and the workloads from interfering with each other.
+Additionally, each NUMA node has its own memory bus that is not shared.
diff --git a/modules/cnf-understanding-low-latency.adoc b/modules/cnf-understanding-low-latency.adoc
@@ -0,0 +1,50 @@
+// Module included in the following assemblies:
+// Epic CNF-78
+// * scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
+
+[id="cnf-understanding-low-latency_{context}"]
+= Understanding low latency
+
+The emergence of Edge computing in the area of Telco / 5G plays a key role in
+reducing latency and congestion problems and improving application performance.
+
+Latency is the time that data (packets) takes to move from the sender to the
+receiver and return to the sender after processing by the receiver.
+Maintaining a network architecture with the lowest possible latency is key
+for meeting the network performance requirements of 5G. Compared
+to 4G technology, with an average latency of 50 ms, 5G is targeted to reach
+latency numbers of 1 ms or less. This reduction in latency boosts wireless
+throughput by a factor of 10.
+
+Many of the deployed applications in the Telco space require low latency and
+cannot tolerate packet loss. Tuning for zero packet loss helps mitigate
+the inherent issues that degrade network performance. For more information, see
+link:https://www.redhat.com/en/blog/tuning-zero-packet-loss-red-hat-openstack-platform-part-1[Tuning
+for Zero Packet Loss in Red Hat OpenStack Platform].
+
+The Edge computing initiative also comes into play for reducing latency.
+Think of it as being on the edge of the cloud and closer to the user.
+This greatly reduces the distance between the user and distant data centers,
+resulting in reduced application response times and performance latency.
+
+Administrators must be able to manage their many Edge sites and local services
+in a centralized way so that all of the deployments can run at the lowest
+possible management cost. They also need an easy way to deploy and configure
+certain nodes of their cluster for real-time low latency and high-performance
+purposes. Low latency nodes are useful for applications such as Cloud-native
+Network Functions (CNF) and Data Plane Development Kit (DPDK).
+
+{product-title} currently provides mechanisms to tune software on an
+{product-title} cluster for real-time running and low latency (a reaction time
+of around 20 microseconds or less). This includes tuning the kernel and {product-title}
+set values, installing a kernel, and reconfiguring the machine. But this method
+requires setting up four different Operators and performing many configurations
+that, when done manually, are complex and prone to mistakes.
+
+{product-title} 4.4 provides a performance-addon Operator to implement automatic
+tuning in order to achieve low latency performance for OpenShift applications.
+The cluster administrator uses a performance profile configuration, which makes
+it easier to make these changes in a more reliable way. The administrator can
+specify whether to update the kernel to kernel-rt, the CPUs that will be
+reserved for housekeeping, and the CPUs that will be used for running the
+workloads.
diff --git a/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc b/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc
@@ -0,0 +1,22 @@
+[id="performance-addon-operator-for-low-latency-nodes"]
+= Performance-addon operator for low latency nodes
+include::modules/common-attributes.adoc[]
+:context: cnf-master
+
+toc::[]
+
+include::modules/cnf-understanding-low-latency.adoc[leveloffset=+1]
+
+include::modules/nw-sriov-installing-operator.adoc[leveloffset=+1]
+
+include::modules/configuring-huge-pages.adoc[leveloffset=+1]
+
+include::modules/cnf-creating-the-performance-profile-object.adoc[leveloffset=+1]
+
+.Additional resources
+
+* For more information about Machine Config and KubeletConfig,
+see xref:../nodes/nodes/nodes-nodes-managing.adoc#nodes-nodes-managing[Managing nodes].
+
+* For more information about the Node Tuning Operator,
+see xref:../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Using the Node Tuning Operator].