Commit faeb1f0

Merge pull request #51562 from tmulquee/TELCODOCS-640
TELCODOCS-640: Optimizing CPU usage
2 parents b4eeda4 + adcd116 commit faeb1f0

8 files changed

+300
-0
lines changed

_topic_maps/_topic_map.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2384,6 +2384,9 @@ Topics:
23842384
Distros: openshift-origin,openshift-enterprise
23852385
- Name: Optimizing networking
23862386
File: optimizing-networking
2387+
Distros: openshift-origin,openshift-enterprise
2388+
- Name: Optimizing CPU usage
2389+
File: optimizing-cpu-usage
23872390
- Name: Managing bare metal hosts
23882391
File: managing-bare-metal-hosts
23892392
Distros: openshift-origin,openshift-enterprise

modules/enabling_encapsulation.adoc

Lines changed: 160 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,160 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/optimizing-cpu-usage.adoc
4+
:_content-type: PROCEDURE
5+
[id="enabling-encapsulation_{context}"]
6+
= Configuring mount namespace encapsulation
7+
8+
You can configure mount namespace encapsulation so that a cluster runs with less resource overhead.
9+
Because mount namespace encapsulation is a Technology Preview feature, it is disabled by default, and must be manually enabled.
10+
11+
.Prerequisites
12+
13+
* You have installed the OpenShift CLI (`oc`).
14+
15+
* You have logged in as a user with `cluster-admin` privileges.
16+
17+
.Procedure
18+
19+
. Create a file called `mount_namespace_config.yaml` with the following YAML to control whether encapsulation is enabled or disabled:
20+
+
21+
[source,yaml]
22+
----
23+
apiVersion: machineconfiguration.openshift.io/v1
24+
kind: MachineConfig
25+
metadata:
26+
  labels:
27+
    machineconfiguration.openshift.io/role: master
28+
  name: 99-kubens-master
29+
spec:
30+
  config:
31+
    ignition:
32+
      version: 3.2.0
33+
    systemd:
34+
      units:
35+
      - enabled: true <1>
36+
        name: kubens.service
37+
---
38+
apiVersion: machineconfiguration.openshift.io/v1
39+
kind: MachineConfig
40+
metadata:
41+
  labels:
42+
    machineconfiguration.openshift.io/role: worker
43+
  name: 99-kubens-worker
44+
spec:
45+
  config:
46+
    ignition:
47+
      version: 3.2.0
48+
    systemd:
49+
      units:
50+
      - enabled: true <2>
51+
        name: kubens.service
52+
----
53+
<1> If `enabled` is set to `true`, encapsulation is in effect for all control plane nodes (in the role of `master`).
54+
<2> If `enabled` is set to `true`, encapsulation is in effect for all compute nodes (in the role of `worker`).
55+
56+
. Apply the mount namespace `MachineConfig` CR by running the following command:
57+
+
58+
[source,terminal]
59+
----
60+
$ oc apply -f mount_namespace_config.yaml
61+
----
62+
+
63+
.Example output
64+
[source,terminal]
65+
----
66+
machineconfig.machineconfiguration.openshift.io/99-kubens-master created
67+
machineconfig.machineconfiguration.openshift.io/99-kubens-worker created
68+
----
69+
70+
. The `MachineConfig` CR can take up to 30 minutes to be applied across the cluster. Check its status by running the following command:
71+
+
72+
[source,terminal]
73+
----
74+
$ oc get mcp
75+
----
76+
+
77+
.Example output
78+
[source,terminal]
79+
----
80+
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
81+
master rendered-master-03d4bc4befb0f4ed3566a2c8f7636751 False True False 3 0 0 0 45m
82+
worker rendered-worker-10577f6ab0117ed1825f8af2ac687ddf False True False 3 1 1
83+
----
84+
85+
. Wait for the `MachineConfig` CR to be applied successfully across all control plane and worker nodes by running the following command:
86+
+
87+
[source,terminal]
88+
----
89+
$ oc wait --for=condition=Updated mcp --all --timeout=30m
90+
----
91+
+
92+
.Example output
93+
[source,terminal]
94+
----
95+
machineconfigpool.machineconfiguration.openshift.io/master condition met
96+
machineconfigpool.machineconfiguration.openshift.io/worker condition met
97+
----
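The status check in the preceding steps can also be scripted. The following is a minimal sketch, not part of the product: the helper name `pending_pools` is hypothetical, and it assumes the column layout shown in the `oc get mcp` example output, with `UPDATED` as the third column and a non-empty `CONFIG` value for every pool.

```shell
# Hypothetical helper: read `oc get mcp` table output on stdin and print
# the name of every pool whose UPDATED column (third field) is not "True".
# Assumes every pool has a non-empty CONFIG column, so fields do not shift.
pending_pools() {
  awk 'NR > 1 && $3 != "True" { print $1 }'
}
```

For example, `oc get mcp | pending_pools` prints nothing once every pool reports `UPDATED` as `True`.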
98+
99+
.Verification
100+
101+
To establish whether or not encapsulation is enabled, run the following commands:
102+
103+
. Open a debug shell to the cluster host:
104+
+
105+
[source,terminal]
106+
----
107+
$ oc debug node/<node_name>
108+
----
109+
110+
. Open a `chroot` session:
111+
+
112+
[source,terminal]
113+
----
114+
sh-4.4# chroot /host
115+
----
116+
117+
. Check systemd mount namespace:
118+
+
119+
[source,terminal]
120+
----
121+
sh-4.4# readlink /proc/1/ns/mnt
122+
----
123+
+
124+
.Example output
125+
[source,terminal]
126+
----
127+
mnt:[4026531953]
128+
----
129+
130+
. Check kubelet mount namespace:
131+
+
132+
[source,terminal]
133+
----
134+
sh-4.4# readlink /proc/$(pgrep kubelet)/ns/mnt
135+
----
136+
+
137+
.Example output
138+
[source,terminal]
139+
----
140+
mnt:[4026531840]
141+
----
142+
143+
. Check CRI-O mount namespace:
144+
+
145+
[source,terminal]
146+
----
147+
sh-4.4# readlink /proc/$(pgrep crio)/ns/mnt
148+
----
149+
+
150+
.Example output
151+
[source,terminal]
152+
----
153+
mnt:[4026531840]
154+
----
155+
+
156+
These commands return the mount namespaces associated with systemd, kubelet and the container runtime. In {product-title}, the container runtime is CRI-O.
157+
+
158+
Encapsulation is in effect if kubelet and CRI-O are in a different mount namespace from systemd.
159+
This is the case in the above example.
160+
Encapsulation is not in effect if all three processes are in the same mount namespace.
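
The comparison described above can be expressed as a small shell function. This is a sketch only: the function name `encapsulation_state` and the `inconsistent` result are illustrative assumptions, not part of {product-title}. It takes the three values printed by the `readlink` commands in the verification steps.

```shell
# Hypothetical helper: classify encapsulation from three mount namespace
# identifiers, in the form printed by `readlink /proc/<pid>/ns/mnt`.
#   $1 = systemd namespace, $2 = kubelet namespace, $3 = CRI-O namespace
encapsulation_state() {
  if [ "$2" != "$1" ] && [ "$3" != "$1" ]; then
    # kubelet and CRI-O both differ from systemd: encapsulation is in effect
    echo "enabled"
  elif [ "$2" = "$1" ] && [ "$3" = "$1" ]; then
    # all three processes share one mount namespace
    echo "disabled"
  else
    # only one of kubelet or CRI-O differs (case not covered by the text above)
    echo "inconsistent"
  fi
}
```

With the values from the example output, `encapsulation_state 'mnt:[4026531953]' 'mnt:[4026531840]' 'mnt:[4026531840]'` prints `enabled`. On a node, the arguments could be filled from the `readlink` commands shown in the verification steps, assuming each `pgrep` call returns a single PID.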

modules/optimizing-by-encapsulation.adoc

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/optimizing-cpu-usage.adoc
4+
:_content-type: CONCEPT
5+
[id="optimizing-cpu-usage_{context}"]
6+
= Encapsulating mount namespaces
7+
8+
Mount namespaces are used to isolate mount points so that processes in different namespaces cannot view each other's files. Encapsulation is the process of moving Kubernetes mount namespaces to an alternative location where they will not be constantly scanned by the host operating system.
9+
10+
The host operating system uses systemd to constantly scan all mount namespaces: both the standard Linux mounts and the numerous mounts that Kubernetes uses to operate. The current implementations of kubelet and CRI-O both use the top-level namespace for all container runtime and kubelet mountpoints. However, encapsulating these container-specific mountpoints in a private namespace reduces systemd overhead with no difference in functionality. Using a separate mount namespace for both CRI-O and kubelet can isolate container-specific mounts from any systemd or other host OS interaction.
11+
12+
This potentially significant CPU optimization is now available to all {product-title} administrators. Encapsulation can also improve security by storing Kubernetes-specific mount points in a location safe from inspection by unprivileged users.
13+
14+
The following diagrams illustrate a Kubernetes installation before and after encapsulation. Both scenarios show example containers which have mount propagation settings of bidirectional, host-to-container, and none.
15+
16+
image::before-k8s-mount-propagation.png[Before encapsulation]
17+
18+
Here we see systemd, host OS processes, kubelet, and the container runtime sharing a single mount namespace.
19+
20+
* systemd, host OS processes, kubelet, and the container runtime each have access to and visibility of all mount points.
21+
22+
* Container 1, configured with bidirectional mount propagation, can access systemd and host mounts as well as kubelet and CRI-O mounts. A mount originating in Container 1, such as `/run/a`, is visible to systemd, host OS processes, kubelet, the container runtime, and other containers with host-to-container or bidirectional mount propagation configured (as in Container 2).
23+
24+
* Container 2, configured with host-to-container mount propagation, can access systemd and host mounts as well as kubelet and CRI-O mounts. A mount originating in Container 2, such as `/run/b`, is not visible to any other context.
25+
26+
* Container 3, configured with no mount propagation, has no visibility of external mount points. A mount originating in Container 3, such as `/run/c`, is not visible to any other context.
27+
28+
29+
The following diagram illustrates the system state after encapsulation.
30+
31+
image::after-k8s-mount-propagation.png[After encapsulation]
32+
33+
* The main systemd process is no longer devoted to unnecessary scanning of Kubernetes-specific mount points. It only monitors systemd-specific and host mountpoints.
34+
35+
* The host OS processes can access only the systemd and host mountpoints.
36+
37+
* Using a separate mount namespace for both CRI-O and kubelet completely separates all container-specific mounts from any systemd or other host OS interaction.
38+
39+
* The behavior of Container 1 is unchanged, except that a mount it creates, such as `/run/a`, is no longer visible to systemd or host OS processes. It is still visible to kubelet, CRI-O, and other containers with host-to-container or bidirectional mount propagation configured (like Container 2).
40+
41+
* The behavior of Container 2 and Container 3 is unchanged.
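
One way to observe the effect described above is to compare how many mount entries are visible to systemd and to kubelet. The following is a minimal sketch under stated assumptions: the helper names `count_mounts` and `mount_delta` are hypothetical, and the sketch relies only on the standard Linux `/proc/<pid>/mountinfo` file, which lists one mount per line for the mount namespace of that process.

```shell
# Hypothetical helper: count the mount entries in a mountinfo-format file,
# which holds one mount per line.
count_mounts() {
  grep -c '' "$1"
}

# Hypothetical helper: print how many more mounts the second file lists
# than the first (second minus first).
mount_delta() {
  echo $(( $(count_mounts "$2") - $(count_mounts "$1") ))
}
```

On a node, `mount_delta /proc/1/mountinfo "/proc/$(pgrep kubelet)/mountinfo"` would be expected to print `0` when systemd and kubelet share a namespace, and typically a positive number when encapsulation hides container-specific mounts from systemd. This assumes `pgrep kubelet` returns a single PID.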

modules/running_services_with_encapsulation.adoc

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/optimizing-cpu-usage.adoc
4+
:_content-type: PROCEDURE
5+
[id="running_services_with_encapsulation_{context}"]
6+
= Running additional services in the encapsulated namespace
7+
8+
Any monitoring tool that relies on the ability to run in the host OS and have visibility of mountpoints created by kubelet, CRI-O, or containers themselves must enter the container mount namespace to see these mountpoints. The `kubensenter` script that comes with {product-title} executes another command inside the Kubernetes mount namespace and can be used to adapt any existing tools.
9+
10+
The `kubensenter` script detects whether mount namespace encapsulation is enabled, and is safe to run even if it is not. In that case, the script executes the provided command in the default mount namespace.
11+
12+
For example, if a systemd service needs to run inside the new Kubernetes mount namespace, edit the service file and prefix the `ExecStart=` command line with `kubensenter`:
13+
14+
[source,terminal]
15+
----
16+
[Unit]
17+
Description=Example service
18+
[Service]
19+
ExecStart=/usr/bin/kubensenter /path/to/original/command arg1 arg2
20+
----
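
The edit shown in the unit file above can be sketched as a text transformation. This is an illustration only: the function name `wrap_execstart` is hypothetical, and it assumes that each `ExecStart=` line starts with a plain absolute path, without systemd prefix characters such as `-` or `@`. Lines that already use `/usr/bin/kubensenter` are left unchanged.

```shell
# Hypothetical helper: read a unit file on stdin and prefix the command on
# each ExecStart= line with /usr/bin/kubensenter.
# Only lines of the form "ExecStart=/absolute/path ..." are rewritten;
# lines that already call kubensenter, and all other lines, pass through.
wrap_execstart() {
  sed -E '/^ExecStart=\/usr\/bin\/kubensenter( |$)/!s|^ExecStart=(/)|ExecStart=/usr/bin/kubensenter \1|'
}
```

For example, piping a unit file through `wrap_execstart` turns `ExecStart=/path/to/original/command arg1 arg2` into the `ExecStart=` line shown in the example service.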

modules/supporting_encapsulation.adoc

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/optimizing-cpu-usage.adoc
4+
:_content-type: PROCEDURE
5+
[id="supporting-encapsulation_{context}"]
6+
= Inspecting encapsulated namespaces
7+
8+
As an OpenShift support engineer or developer, you might want to inspect Kubernetes-specific mountpoints.
9+
10+
.Procedure
11+
An administrator who is logged in to the host OS and wants to obtain information about the Kubernetes system for debugging or auditing purposes must be aware of the new mount namespace. A shell originating in a container, such as one started with `oc debug`, is already inside the Kubernetes namespace. A shell originating in an SSH session is in the default namespace, and the administrator must run the `kubensenter` script as root to view or interact with these mountpoints.
12+
13+
The `kubensenter` script detects whether mount namespace encapsulation is enabled, and is safe to run even if it is not. In that case, it executes the shell or requested command in the default mount namespace.
14+
15+
. Open a remote shell to the cluster host, or run `oc debug node/<node_name>`.
16+
17+
. To run a single command inside the Kubernetes namespace, provide the command and any arguments to the `kubensenter` script. For example, to run the `findmnt` command inside the Kubernetes namespace, issue the following command:
18+
+
19+
[source,terminal]
20+
----
21+
$ sudo kubensenter findmnt
22+
----
23+
+
24+
.Example output
25+
[source,terminal]
26+
----
27+
kubensenter: Autodetect: kubens.service namespace found at /run/kubens/mnt
28+
TARGET SOURCE FSTYPE OPTIONS
29+
/ /dev/sda4[/ostree/deploy/rhcos/deploy/32074f0e8e5ec453e56f5a8a7bc9347eaa4172349ceab9c22b709d9d71a3f4b0.0]
30+
| xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
31+
shm tmpfs
32+
...
33+
----
34+
35+
. To start a new interactive shell inside the Kubernetes namespace, run the `kubensenter` script without any arguments:
36+
+
37+
[source,terminal]
38+
----
39+
$ sudo kubensenter
40+
----
41+
+
42+
.Example output
43+
[source,terminal]
44+
----
45+
kubensenter: Autodetect: kubens.service namespace found at /run/kubens/mnt
46+
----
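
To see which mount points exist only inside the Kubernetes namespace, the two `findmnt` listings can be compared. The following is a minimal sketch under stated assumptions: the helper name `mounts_only_in_second` and the file paths in the comments are hypothetical, and if `kubensenter` writes its `Autodetect` message to standard output, that line also appears in the second listing.

```shell
# Hypothetical helper: print the lines of the second file that do not
# appear as whole lines in the first file.
#   $1 = mount targets seen in the default namespace
#   $2 = mount targets seen in the Kubernetes namespace
mounts_only_in_second() {
  grep -Fxv -f "$1" "$2"
}

# Possible usage on a node (file paths are examples only):
#   findmnt -rn -o TARGET > /tmp/host_mounts.txt
#   sudo kubensenter findmnt -rn -o TARGET > /tmp/kube_mounts.txt
#   mounts_only_in_second /tmp/host_mounts.txt /tmp/kube_mounts.txt
```

If encapsulation is not enabled, both listings come from the same namespace, so the function is expected to print no mount points.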

scalability_and_performance/optimizing-cpu-usage.adoc

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
:_content-type: ASSEMBLY
2+
[id="optimizing-cpu-usage"]
3+
= Optimizing CPU usage with mount namespace encapsulation
4+
include::_attributes/common-attributes.adoc[]
5+
:context: optimizing-cpu-usage
6+
7+
toc::[]
8+
9+
You can optimize CPU usage in {product-title} clusters by using mount namespace encapsulation to provide a private namespace for kubelet and CRI-O processes. This reduces the cluster CPU resources used by systemd with no difference in functionality.
10+
11+
:FeatureName: Mount namespace encapsulation
12+
include::snippets/technology-preview.adoc[]
13+
14+
include::modules/optimizing-by-encapsulation.adoc[leveloffset=+1]
15+
16+
include::modules/enabling_encapsulation.adoc[leveloffset=+1]
17+
18+
include::modules/supporting_encapsulation.adoc[leveloffset=+1]
19+
20+
include::modules/running_services_with_encapsulation.adoc[leveloffset=+1]
21+
22+
[role="_additional-resources"]
23+
[id="optimizing-cpu-usage-additional-resources"]
24+
== Additional resources
25+
26+
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/monitoring_and_managing_system_status_and_performance/setting-limits-for-applications_monitoring-and-managing-system-status-and-performance#what-namespaces-are_setting-limits-for-applications[What are namespaces]
27+
28+
* link:https://www.redhat.com/sysadmin/container-namespaces-nsenter[Manage containers in namespaces by using nsenter]
29+
30+
* xref:../rest_api/machine_apis/machineconfig-machineconfiguration-openshift-io-v1.html[MachineConfig]
