Merge pull request #51562 from tmulquee/TELCODOCS-640

jeana-redhat · web-flow · commit faeb1f03faa7 · 2022-12-22T09:52:15.000-05:00
TELCODOCS-640: Optimizing CPU usage
diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml
@@ -2384,6 +2384,9 @@ Topics:
   Distros: openshift-origin,openshift-enterprise
 - Name: Optimizing networking
   File: optimizing-networking
+  Distros: openshift-origin,openshift-enterprise
+- Name: Optimizing CPU usage
+  File: optimizing-cpu-usage
 - Name: Managing bare metal hosts
   File: managing-bare-metal-hosts
   Distros: openshift-origin,openshift-enterprise
diff --git a/images/after-k8s-mount-propagation.png b/images/after-k8s-mount-propagation.png
diff --git a/images/before-k8s-mount-propagation.png b/images/before-k8s-mount-propagation.png
diff --git a/modules/enabling_encapsulation.adoc b/modules/enabling_encapsulation.adoc
@@ -0,0 +1,160 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/optimizing-cpu-usage.adoc
+:_content-type: PROCEDURE
+[id="enabling-encapsulation_{context}"]
+= Configuring mount namespace encapsulation
+
+You can configure mount namespace encapsulation so that a cluster runs with less resource overhead.
+Because mount namespace encapsulation is a Technology Preview feature, it is disabled by default, and must be manually enabled.
+
+.Prerequisites
+
+* You have installed the OpenShift CLI (`oc`).
+
+* You have logged in as a user with `cluster-admin` privileges.
+
+.Procedure
+
+. Create a file called `mount_namespace_config.yaml` with the following YAML to control whether encapsulation is enabled or disabled:
++
+[source,yaml]
+----
+  apiVersion: machineconfiguration.openshift.io/v1
+  kind: MachineConfig
+  metadata:
+    labels:
+      machineconfiguration.openshift.io/role: master
+    name: 99-kubens-master
+  spec:
+    config:
+      ignition:
+        version: 3.2.0
+      systemd:
+        units:
+        - enabled: true <1>
+          name: kubens.service
+  ---
+  apiVersion: machineconfiguration.openshift.io/v1
+  kind: MachineConfig
+  metadata:
+    labels:
+      machineconfiguration.openshift.io/role: worker
+    name: 99-kubens-worker
+  spec:
+    config:
+      ignition:
+        version: 3.2.0
+      systemd:
+        units:
+        - enabled: true <2>
+          name: kubens.service
+----
+<1> If `enabled` is set to `true`, encapsulation is in effect for all control plane nodes (in the role of `master`).
+<2> If `enabled` is set to `true`, encapsulation is in effect for all compute nodes (in the role of `worker`).
+
+. Apply the mount namespace `MachineConfig` CR by running the following command:
++
+[source,terminal]
+----
+$ oc apply -f mount_namespace_config.yaml
+----
++
+.Example output
+[source,terminal]
+----
+machineconfig.machineconfiguration.openshift.io/99-kubens-master created
+machineconfig.machineconfiguration.openshift.io/99-kubens-worker created
+----
+
+. Check the status of the `MachineConfig` CR, which can take up to 30 minutes to be applied across the cluster, by running the following command:
++
+[source,terminal]
+----
+$ oc get mcp
+----
++
+.Example output
+[source,terminal]
+----
+NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
+master   rendered-master-03d4bc4befb0f4ed3566a2c8f7636751   False     True       False      3              0                   0                     0                      45m
+worker   rendered-worker-10577f6ab0117ed1825f8af2ac687ddf   False     True       False      3              1                   1
+----
+
+. Wait for the `MachineConfig` CR to be applied to all control plane and worker nodes by running the following command:
++
+[source,terminal]
+----
+$ oc wait --for=condition=Updated mcp --all --timeout=30m
+----
++
+.Example output
+[source,terminal]
+----
+machineconfigpool.machineconfiguration.openshift.io/master condition met
+machineconfigpool.machineconfiguration.openshift.io/worker condition met
+----
+
+.Verification
+
+To establish whether or not encapsulation is enabled, run the following commands:
+
+. Open a debug shell to the cluster host:
++
+[source,terminal]
+----
+$ oc debug node/<node_name>
+----
+
+. Open a `chroot` session:
++
+[source,terminal]
+----
+sh-4.4# chroot /host
+----
+
+. Check the systemd mount namespace:
++
+[source,terminal]
+----
+sh-4.4# readlink /proc/1/ns/mnt
+----
++
+.Example output
+[source,terminal]
+----
+mnt:[4026531953]
+----
+
+. Check the kubelet mount namespace:
++
+[source,terminal]
+----
+sh-4.4# readlink /proc/$(pgrep kubelet)/ns/mnt
+----
++
+.Example output
+[source,terminal]
+----
+mnt:[4026531840]
+----
+
+. Check the CRI-O mount namespace:
++
+[source,terminal]
+----
+sh-4.4# readlink /proc/$(pgrep crio)/ns/mnt
+----
++
+.Example output
+[source,terminal]
+----
+mnt:[4026531840]
+----
++
+These commands return the mount namespaces associated with systemd, kubelet, and the container runtime. In {product-title}, the container runtime is CRI-O.
++
+Encapsulation is in effect if kubelet and CRI-O are in a different mount namespace from systemd.
+This is the case in the preceding example.
+Encapsulation is not in effect if all three processes are in the same mount namespace.
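Review note on `modules/enabling_encapsulation.adoc`: the three `readlink` checks in the verification steps could be folded into one comparison. The following is a minimal sketch, not part of the patch. The function names `ns_of` and `encapsulation_state` are hypothetical, and the sketch assumes a POSIX shell running as root after `chroot /host`.

```shell
#!/bin/sh
# Print the mount namespace identifier of a PID, for example "mnt:[4026531840]".
ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Compare the systemd namespace ($1) with the kubelet ($2) and CRI-O ($3) namespaces.
# Prints "unknown" if any value is empty, "enabled" if kubelet and CRI-O both
# differ from systemd, "disabled" if all three match, and "inconsistent" otherwise.
encapsulation_state() {
    systemd_ns=$1
    kubelet_ns=$2
    crio_ns=$3
    if [ -z "$systemd_ns" ] || [ -z "$kubelet_ns" ] || [ -z "$crio_ns" ]; then
        echo unknown
    elif [ "$kubelet_ns" != "$systemd_ns" ] && [ "$crio_ns" != "$systemd_ns" ]; then
        echo enabled
    elif [ "$kubelet_ns" = "$systemd_ns" ] && [ "$crio_ns" = "$systemd_ns" ]; then
        echo disabled
    else
        echo inconsistent
    fi
}
```

On a node, one possible call is `encapsulation_state "$(ns_of 1)" "$(ns_of "$(pgrep -o -x kubelet)")" "$(ns_of "$(pgrep -o -x crio)")"`. Using `pgrep -o -x` is an assumption made here to select a single, exactly named process; the module itself uses plain `pgrep`.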
diff --git a/modules/optimizing-by-encapsulation.adoc b/modules/optimizing-by-encapsulation.adoc
@@ -0,0 +1,41 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/optimizing-cpu-usage.adoc
+:_content-type: CONCEPT
+[id="optimizing-cpu-usage_{context}"]
+= Encapsulating mount namespaces
+
+Mount namespaces are used to isolate mount points so that processes in different namespaces cannot view each other's files. Encapsulation is the process of moving Kubernetes mount namespaces to an alternative location where they will not be constantly scanned by the host operating system.
+
+The host operating system uses systemd to constantly scan all mount namespaces: both the standard Linux mounts and the numerous mounts that Kubernetes uses to operate. The current implementations of kubelet and CRI-O both use the top-level namespace for all container runtime and kubelet mount points. However, encapsulating these container-specific mount points in a private namespace reduces systemd overhead with no difference in functionality. Using a separate mount namespace for both CRI-O and kubelet can encapsulate container-specific mounts from any systemd or other host OS interaction.
+
+This ability to potentially achieve major CPU optimization is now available to all {product-title} administrators. Encapsulation can also improve security by storing Kubernetes-specific mount points in a location safe from inspection by unprivileged users.
+
+The following diagrams illustrate a Kubernetes installation before and after encapsulation. Both scenarios show example containers which have mount propagation settings of bidirectional, host-to-container, and none.
+
+image::before-k8s-mount-propagation.png[Before encapsulation]
+
+Here we see systemd, host OS processes, kubelet, and the container runtime sharing a single mount namespace.
+
+* systemd, host OS processes, kubelet, and the container runtime each have access to and visibility of all mount points.
+
+* Container 1, configured with bidirectional mount propagation, can access systemd and host mounts as well as kubelet and CRI-O mounts. A mount originating in Container 1, such as `/run/a`, is visible to systemd, host OS processes, kubelet, the container runtime, and other containers with host-to-container or bidirectional mount propagation configured (as in Container 2).
+
+* Container 2, configured with host-to-container mount propagation, can access systemd and host mounts as well as kubelet and CRI-O mounts. A mount originating in Container 2, such as `/run/b`, is not visible to any other context.
+
+* Container 3, configured with no mount propagation, has no visibility of external mount points. A mount originating in Container 3, such as `/run/c`, is not visible to any other context.
+
+
+The following diagram illustrates the system state after encapsulation.
+
+image::after-k8s-mount-propagation.png[After encapsulation]
+
+* The main systemd process is no longer devoted to unnecessary scanning of Kubernetes-specific mount points. It only monitors systemd-specific and host mountpoints.
+
+* The host OS processes can access only the systemd and host mountpoints.
+
+* Using a separate mount namespace for both CRI-O and kubelet completely separates all container-specific mounts from any systemd or other host OS interaction.
+
+* The behavior of Container 1 is unchanged, except that a mount it creates, such as `/run/a`, is no longer visible to systemd or host OS processes. It is still visible to kubelet, CRI-O, and other containers with host-to-container or bidirectional mount propagation configured (like Container 2).
+
+* The behavior of Container 2 and Container 3 is unchanged.
diff --git a/modules/running_services_with_encapsulation.adoc b/modules/running_services_with_encapsulation.adoc
@@ -0,0 +1,20 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/optimizing-cpu-usage.adoc
+:_content-type: PROCEDURE
+[id="running_services_with_encapsulation_{context}"]
+= Running additional services in the encapsulated namespace
+
+Any monitoring tool that runs in the host OS and relies on visibility of mountpoints created by kubelet, CRI-O, or the containers themselves must enter the container mount namespace to see these mountpoints. The `kubensenter` script that comes with {product-title} executes another command inside the Kubernetes mount namespace and can be used to adapt any existing tools.
+
+The `kubensenter` script detects whether mount namespace encapsulation is enabled, and is safe to run even if it is not. In that case, the script executes the provided command in the default mount namespace.
+
+For example, if a systemd service needs to run inside the new Kubernetes mount namespace, edit the service file and prefix the command in the `ExecStart=` line with `kubensenter`.
+
+[source,ini]
+----
+[Unit]
+Description=Example service
+[Service]
+ExecStart=/usr/bin/kubensenter /path/to/original/command arg1 arg2
+----
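Review note on `modules/running_services_with_encapsulation.adoc`: instead of editing the service file in place, the same change could be expressed as a systemd drop-in. The following is a minimal sketch under stated assumptions, not part of the patch. The function name `write_kubensenter_dropin`, the file name `10-kubensenter.conf`, and the unit name `example.service` are hypothetical; only the `/usr/bin/kubensenter` path comes from the module. How the file reaches a cluster node is outside the scope of this sketch.

```shell
#!/bin/sh
# Write a systemd drop-in that runs a service's command through kubensenter.
# $1: drop-in directory, for example /etc/systemd/system/example.service.d
#     (example.service is a hypothetical unit name)
# Remaining arguments: the original command line of the service.
write_kubensenter_dropin() {
    dropin_dir=$1
    shift
    mkdir -p "$dropin_dir" || return 1
    {
        printf '%s\n' "[Service]"
        # An empty ExecStart= clears the command inherited from the main unit file.
        printf '%s\n' "ExecStart="
        printf '%s\n' "ExecStart=/usr/bin/kubensenter $*"
    } > "$dropin_dir/10-kubensenter.conf"
}
```

After the drop-in is written, `systemctl daemon-reload` followed by a restart of the service would be needed for systemd to pick up the change.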
diff --git a/modules/supporting_encapsulation.adoc b/modules/supporting_encapsulation.adoc
@@ -0,0 +1,46 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/optimizing-cpu-usage.adoc
+:_content-type: PROCEDURE
+[id="supporting-encapsulation_{context}"]
+= Inspecting encapsulated namespaces
+
+As an OpenShift support engineer or developer, you might want to inspect Kubernetes-specific mountpoints.
+
+An administrator who is logged in to the host OS and wants to obtain information about the Kubernetes system for debugging or auditing purposes must be aware of the new mount namespace. A shell originating in a container, such as `oc debug`, is already inside the Kubernetes namespace. A shell originating in an SSH session is in the default namespace, and an administrator must run the `kubensenter` script as root to view or interact with these mountpoints.
+
+The `kubensenter` script detects whether mount namespace encapsulation is enabled, and is safe to run even if it is not. In that case, it executes the shell or requested command in the default mount namespace.
+
+.Procedure
+
+. Open a remote shell to the cluster host, or run `oc debug node/<node_name>`.
+
+. To run a single command inside the Kubernetes namespace, provide the command and any arguments to the `kubensenter` script. For example, to run the `findmnt` command inside the Kubernetes namespace, issue the following command:
++
+[source,terminal]
+----
+$ sudo kubensenter findmnt
+----
++
+.Example output
+[source,terminal]
+----
+  kubensenter: Autodetect: kubens.service namespace found at /run/kubens/mnt
+  TARGET                                                                                                                                                   SOURCE                 FSTYPE     OPTIONS
+  /                                                                                                                                                        /dev/sda4[/ostree/deploy/rhcos/deploy/32074f0e8e5ec453e56f5a8a7bc9347eaa4172349ceab9c22b709d9d71a3f4b0.0]
+  |                                                                                                                                                                               xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
+                             shm                    tmpfs
+  ...
+----
+
+. To start a new interactive shell inside the Kubernetes namespace, run the `kubensenter` script without any arguments:
++
+[source,terminal]
+----
+$ sudo kubensenter
+----
++
+.Example output
+[source,terminal]
+----
+kubensenter: Autodetect: kubens.service namespace found at /run/kubens/mnt
+----
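Review note on `modules/supporting_encapsulation.adoc`: to see which mountpoints encapsulation hides from the default namespace, the `findmnt` output from both namespaces can be compared. The following is a minimal sketch, not part of the patch. The function name `only_in_second` and the file names used in the usage note are hypothetical.

```shell
#!/bin/sh
# Print the lines that appear in the second file but not in the first.
# Each argument is a file containing one mount target per line.
only_in_second() {
    first_sorted=$(mktemp) || return 1
    second_sorted=$(mktemp) || { rm -f "$first_sorted"; return 1; }
    # comm requires sorted input; -u also removes duplicate targets.
    sort -u "$1" > "$first_sorted"
    sort -u "$2" > "$second_sorted"
    # -13 suppresses lines unique to the first file and lines common to both.
    comm -13 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}
```

From an SSH session on the node, the inputs might be collected with `findmnt -n -l -o TARGET > host.txt` and `sudo kubensenter findmnt -n -l -o TARGET > kubens.txt`, then compared with `only_in_second host.txt kubens.txt`. Whether the `kubensenter: Autodetect:` message ends up in the captured file depends on the stream the script writes it to, which this sketch does not assume.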
diff --git a/scalability_and_performance/optimizing-cpu-usage.adoc b/scalability_and_performance/optimizing-cpu-usage.adoc
@@ -0,0 +1,30 @@
+:_content-type: ASSEMBLY
+[id="optimizing-cpu-usage"]
+= Optimizing CPU usage with mount namespace encapsulation
+include::_attributes/common-attributes.adoc[]
+:context: optimizing-cpu-usage
+
+toc::[]
+
+You can optimize CPU usage in {product-title} clusters by using mount namespace encapsulation to provide a private namespace for kubelet and CRI-O processes. This reduces the cluster CPU resources used by systemd with no difference in functionality.
+
+:FeatureName: Mount namespace encapsulation
+include::snippets/technology-preview.adoc[]
+
+include::modules/optimizing-by-encapsulation.adoc[leveloffset=+1]
+
+include::modules/enabling_encapsulation.adoc[leveloffset=+1]
+
+include::modules/supporting_encapsulation.adoc[leveloffset=+1]
+
+include::modules/running_services_with_encapsulation.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+[id="optimizing-cpu-usage-additional-resources"]
+== Additional resources
+
+* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/monitoring_and_managing_system_status_and_performance/setting-limits-for-applications_monitoring-and-managing-system-status-and-performance#what-namespaces-are_setting-limits-for-applications[What are namespaces]
+
+* link:https://www.redhat.com/sysadmin/container-namespaces-nsenter[Manage containers in namespaces by using nsenter]
+
+* xref:../rest_api/machine_apis/machineconfig-machineconfiguration-openshift-io-v1.html[MachineConfig]