
Commit 0839347

Merge pull request #60840 from kquinn1204/TELCODOCS-1310
TELCODOCS-1310 Rootless DPDK
2 parents 8100753 + 8bad4c9 commit 0839347

4 files changed

+322
-3
lines changed


modules/nw-multus-tap-object.adoc

Lines changed: 75 additions & 2 deletions
@@ -81,7 +81,80 @@ The following example configures an additional network named `mynet`:
8181
}
8282
----
8383

84+
[id="nw-multus-enable-container_use_devices_{context}"]
85+
86+
== Setting SELinux boolean for the TAP CNI plugin
87+
88+
To create the tap device with the `container_t` SELinux context, enable the `container_use_devices` boolean on the host by using the Machine Config Operator (MCO).
89+
90+
.Prerequisites
91+
92+
* You have installed the OpenShift CLI (`oc`).
93+
94+
.Procedure
95+
96+
. Create a new YAML file, such as `setsebool-container-use-devices.yaml`, with the following details:
97+
+
98+
[source, yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-setsebool
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: setsebool.service
        contents: |
          [Unit]
          Description=Set SELinux boolean for the TAP CNI plugin
          Before=kubelet.service

          [Service]
          Type=oneshot
          ExecStart=/usr/sbin/setsebool container_use_devices=on
          RemainAfterExit=true

          [Install]
          WantedBy=multi-user.target graphical.target
----
127+
+
128+
129+
. Create the new `MachineConfig` object by running the following command:
130+
+
131+
[source,terminal]
132+
----
133+
$ oc apply -f setsebool-container-use-devices.yaml
134+
----
135+
+
84136
[NOTE]
85137
====
86-
To create the tap device with the `container_t` SELinux context, enable the `container_use_devices` boolean on the host by using the Machine Config Operator (MCO).
87-
====
138+
Applying any change to the `MachineConfig` object causes all affected nodes to reboot gracefully. The update can take some time to complete.
139+
====
140+
+
141+
. Verify the change is applied by running the following command:
142+
+
143+
[source,terminal]
144+
----
145+
$ oc get machineconfigpools
146+
----
147+
+
148+
.Expected output
149+
+
150+
[source,terminal,options="nowrap",role="white-space-pre"]
----
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-e5e0c8e8be9194e7c5a882e047379cfa   True      False      False      3              3                   3                     0                      7d2h
worker   rendered-worker-d6c9ca107fba6cd76cdcbfcedcafa0f2   True      False      False      3              3                   3                     0                      7d
----
156+
+
157+
[NOTE]
158+
====
159+
All nodes should be in the updated and ready state.
160+
====
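
The check described in the note can be scripted. The following Python sketch is illustrative only and is not part of `oc` or the product documentation: the function name `pools_not_ready` and the `SAMPLE` constant are hypothetical, and the code assumes that the text output of `oc get machineconfigpools` has the whitespace-separated columns shown in the expected output, with every column populated.

```python
# Minimal sketch: find machine config pools that are not fully updated.
# Assumption: the input is the plain-text table printed by
# `oc get machineconfigpools`, with no empty cells. A row with an empty
# cell shifts columns and is likely to be reported as not ready.

SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-e5e0c8e8be9194e7c5a882e047379cfa   True      False      False      3              3                   3                     0                      7d2h
worker   rendered-worker-d6c9ca107fba6cd76cdcbfcedcafa0f2   True      False      False      3              3                   3                     0                      7d
"""


def pools_not_ready(table):
    """Return the names of pools that are not updated, or not fully ready."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    not_ready = []
    for line in lines[1:]:
        # Map each column name to the value in this row.
        row = dict(zip(header, line.split()))
        ok = (
            row.get("UPDATED") == "True"
            and row.get("UPDATING") == "False"
            and row.get("DEGRADED") == "False"
            and row.get("READYMACHINECOUNT") == row.get("MACHINECOUNT")
        )
        if not ok:
            not_ready.append(row.get("NAME", "<unknown>"))
    return not_ready


if __name__ == "__main__":
    pending = pools_not_ready(SAMPLE)
    print("All pools updated" if not pending else "Pending pools: %s" % pending)
```

To use the sketch against a live cluster, you could pass it the captured output of the `oc get machineconfigpools` command instead of `SAMPLE`.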

modules/nw-running-dpdk-rootless-tap.adoc

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * networking/hardware_networks/using-dpdk-and-rdma.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="nw-running-dpdk-rootless-tap_{context}"]
7+
= Using the TAP CNI to run a rootless DPDK workload with kernel access
8+
9+
DPDK applications can use `virtio-user` as an exception path to inject certain types of packets, such as log messages, into the kernel for processing. For more information about this feature, see link:https://doc.dpdk.org/guides/howto/virtio_user_as_exception_path.html[Virtio_user as Exception Path].
10+
11+
In OpenShift Container Platform version 4.14 and later, you can use non-privileged pods to run DPDK applications alongside the tap CNI plugin. To enable this functionality, you need to mount the `vhost-net` device by setting the `needVhostNet` parameter to `true` within the `SriovNetworkNodePolicy` object.
12+
13+
.DPDK and TAP example configuration
14+
image::348_OpenShift_rootless_DPDK_0923.png[DPDK and TAP plugin]
15+
16+
.Prerequisites
17+
18+
* You have installed the OpenShift CLI (`oc`).
19+
* You have installed the SR-IOV Network Operator.
20+
* You are logged in as a user with `cluster-admin` privileges.
21+
* You have set the `container_use_devices` SELinux boolean to `on` on all nodes, for example by running `setsebool container_use_devices=on` as root.
22+
+
23+
[NOTE]
24+
====
25+
Use the Machine Config Operator to set this SELinux boolean.
26+
====
27+
28+
.Procedure
29+
30+
. Create a file, such as `test-namespace.yaml`, with content like the following example:
31+
+
32+
[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: test-namespace
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
    security.openshift.io/scc.podSecurityLabelSync: "false"
----
44+
45+
. Create the new `Namespace` object by running the following command:
46+
+
47+
[source,terminal]
48+
----
49+
$ oc apply -f test-namespace.yaml
50+
----
51+
52+
. Create a file, such as `sriov-node-network-policy.yaml`, with content like the following example:
53+
+
54+
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnic
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice <1>
  isRdma: true <2>
  needVhostNet: true <3>
  nicSelector:
    vendor: "15b3" <4>
    deviceID: "101b" <5>
    rootDevices: ["00:05.0"]
  numVfs: 10
  priority: 99
  resourceName: sriovnic
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
----
75+
<1> Setting `deviceType` to `netdevice` indicates that the profile is tailored specifically for Mellanox Network Interface Controllers (NICs).
76+
<2> Setting `isRdma` to `true` is only required for a Mellanox NIC.
77+
<3> This mounts the `/dev/net/tun` and `/dev/vhost-net` devices into the container so the application can create a tap device and connect the tap device to the DPDK workload.
78+
<4> The vendor hexadecimal code of the SR-IOV network device. The value 15b3 is associated with a Mellanox NIC.
79+
<5> The device hexadecimal code of the SR-IOV network device.
80+
81+
. Create the `SriovNetworkNodePolicy` object by running the following command:
82+
+
83+
[source,terminal]
84+
----
85+
$ oc create -f sriov-node-network-policy.yaml
86+
----
87+
88+
. Create the following `SriovNetwork` object, and then save the YAML in the `sriov-network-attachment.yaml` file:
89+
+
90+
[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-network
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: test-namespace
  resourceName: sriovnic
  spoofChk: "off"
  trust: "on"
----
103+
+
104+
[NOTE]
105+
=====
106+
See the "Configuring SR-IOV additional network" section for a detailed explanation of each option in the `SriovNetwork` object.
107+
=====
108+
+
109+
An optional library, `app-netutil`, provides several API methods for gathering network information about a container's parent pod.
110+
111+
. Create the `SriovNetwork` object by running the following command:
112+
+
113+
[source,terminal]
114+
----
115+
$ oc create -f sriov-network-attachment.yaml
116+
----
117+
118+
. Create a file, such as `tap-example.yaml`, that defines a network attachment definition, with content like the following example:
119+
+
120+
[source,yaml]
----
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: tap-one
  namespace: test-namespace <1>
spec:
  config: '{
    "cniVersion": "0.4.0",
    "name": "tap",
    "plugins": [
      {
        "type": "tap",
        "multiQueue": true,
        "selinuxcontext": "system_u:system_r:container_t:s0"
      },
      {
        "type":"tuning",
        "capabilities":{
          "mac":true
        }
      }
    ]
  }'
----
146+
<1> Specify the same `target_namespace` in which the `SriovNetwork` object is created.
147+
148+
. Create the `NetworkAttachmentDefinition` object by running the following command:
149+
+
150+
[source,terminal]
151+
----
152+
$ oc apply -f tap-example.yaml
153+
----
154+
155+
. Create a file, such as `dpdk-pod-rootless.yaml`, with content like the following example:
156+
+
157+
[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  namespace: test-namespace <1>
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {"name": "sriov-network", "namespace": "test-namespace"},
      {"name": "tap-one", "interface": "ext0", "namespace": "test-namespace"}]'
spec:
  nodeSelector:
    kubernetes.io/hostname: "worker-0"
  securityContext:
    fsGroup: 1001 <2>
    runAsGroup: 1001 <3>
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: testpmd
    image: <DPDK_image> <4>
    securityContext:
      capabilities:
        drop: ["ALL"] <5>
        add: <6>
          - IPC_LOCK
          - NET_RAW #for mlx only <7>
      runAsUser: 1001 <8>
      privileged: false <9>
      allowPrivilegeEscalation: true <10>
      runAsNonRoot: true <11>
    volumeMounts:
    - mountPath: /mnt/huge <12>
      name: hugepages
    resources:
      limits:
        openshift.io/sriovnic: "1" <13>
        memory: "1Gi"
        cpu: "4" <14>
        hugepages-1Gi: "4Gi" <15>
      requests:
        openshift.io/sriovnic: "1"
        memory: "1Gi"
        cpu: "4"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  runtimeClassName: performance-cnf-performanceprofile <16>
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
----
210+
+
211+
--
212+
<1> Specify the same `target_namespace` in which the `SriovNetwork` object is created. If you want to create the pod in a different namespace, change `target_namespace` in both the `Pod` spec and the `SriovNetwork` object.
213+
<2> Sets the group ownership of volume-mounted directories and files created in those volumes.
214+
<3> Specify the primary group ID used for running the container.
215+
<4> Specify the DPDK image that contains your application and the DPDK library used by the application.
216+
<5> Removing all capabilities (`ALL`) from the container's securityContext means that the container has no special privileges beyond what is necessary for normal operation.
217+
<6> Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access. These capabilities must also be set in the binary file by using the `setcap` command.
218+
<7> A Mellanox network interface controller (NIC) requires the `NET_RAW` capability.
219+
<8> Specify the user ID used for running the container.
220+
<9> This setting indicates that the container or containers within the pod should not be granted privileged access to the host system.
221+
<10> This setting allows a container to escalate its privileges beyond the initial non-root privileges it might have been assigned.
222+
<11> This setting ensures that the container runs with a non-root user. This helps enforce the principle of least privilege, limiting the potential impact of compromising the container and reducing the attack surface.
223+
<12> Mount a hugepage volume to the DPDK pod under `/mnt/huge`. The hugepage volume is backed by the `emptyDir` volume type with the medium set to `HugePages`.
224+
<13> Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by SR-IOV Operator. It is enabled by default and can be disabled by setting the `enableInjector` option to `false` in the default `SriovOperatorConfig` CR.
225+
<14> Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting CPU Manager policy to `static` and creating a pod with `Guaranteed` QoS.
226+
<15> Specify the hugepage size, `hugepages-1Gi` or `hugepages-2Mi`, and the quantity of hugepages to allocate to the DPDK pod. Configure `2Mi` and `1Gi` hugepages separately. Configuring `1Gi` hugepages requires adding kernel arguments to nodes. For example, adding the kernel arguments `default_hugepagesz=1GB`, `hugepagesz=1G`, and `hugepages=16` results in `16*1Gi` hugepages being allocated during system boot.
227+
<16> If your performance profile is not named `cnf-performanceprofile`, replace that string with the correct performance profile name.
228+
--
229+
+
230+
. Create the DPDK pod by running the following command:
231+
+
232+
[source,terminal]
233+
----
234+
$ oc create -f dpdk-pod-rootless.yaml
235+
----
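
The callout about CPUs notes that the DPDK pod needs the `Guaranteed` QoS class to receive exclusive CPUs. As a rough illustration of that condition, the following Python sketch checks that every container in a pod sets CPU and memory limits and that any requests match them. The function name `is_guaranteed` and the `EXAMPLE_POD` dictionary are hypothetical and mirror only the `resources` stanza of the example pod. The sketch compares quantities as plain strings, so it treats equivalent values such as `4` and `4000m` as different, which the Kubernetes API does not.

```python
# Minimal sketch: does a pod manifest meet the Guaranteed QoS criteria?
# Assumption: `pod` is the manifest already loaded into a Python dict
# (for example from YAML), and quantities are compared as strings.

EXAMPLE_POD = {
    "spec": {
        "containers": [
            {
                "name": "testpmd",
                "resources": {
                    "limits": {
                        "openshift.io/sriovnic": "1",
                        "memory": "1Gi",
                        "cpu": "4",
                        "hugepages-1Gi": "4Gi",
                    },
                    "requests": {
                        "openshift.io/sriovnic": "1",
                        "memory": "1Gi",
                        "cpu": "4",
                        "hugepages-1Gi": "4Gi",
                    },
                },
            }
        ]
    }
}


def is_guaranteed(pod):
    """Return True if all containers have equal cpu/memory requests and limits."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in ("cpu", "memory"):
            if name not in limits:
                return False  # a limit is mandatory for Guaranteed QoS
            # An omitted request defaults to the limit, so only an
            # explicit, different request breaks the condition.
            if requests.get(name, limits[name]) != limits[name]:
                return False
    return True


if __name__ == "__main__":
    print("Guaranteed QoS:", is_guaranteed(EXAMPLE_POD))
```

A check like this only covers the pod side. Exclusive CPUs also depend on the CPU Manager policy being set to `static` on the node, as the callout describes.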

networking/hardware_networks/using-dpdk-and-rdma.adoc

Lines changed: 12 additions & 1 deletion
@@ -8,12 +8,23 @@ toc::[]
88

99
The containerized Data Plane Development Kit (DPDK) application is supported on {product-title}. You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).
1010

11-
For information on supported devices, refer to xref:../../networking/hardware_networks/about-sriov.adoc#supported-devices_about-sriov[Supported devices].
11+
For information about supported devices, see xref:../../networking/hardware_networks/about-sriov.adoc#supported-devices_about-sriov[Supported devices].
1212

1313
include::modules/nw-sriov-dpdk-example-intel.adoc[leveloffset=+1]
1414

1515
include::modules/nw-sriov-dpdk-example-mellanox.adoc[leveloffset=+1]
1616

17+
include::modules/nw-running-dpdk-rootless-tap.adoc[leveloffset=+1]
18+
19+
[role="_additional-resources"]
20+
.Additional resources
21+
22+
* xref:../../networking/multiple_networks/configuring-additional-network.adoc#nw-multus-enable-container_use_devices_configuring-additional-network[Enabling the container_use_devices boolean]
23+
24+
* xref:../../scalability_and_performance/cnf-create-performance-profiles.adoc#cnf-create-performance-profiles[Creating a performance profile]
25+
26+
* xref:../../networking/hardware_networks/configuring-sriov-device.adoc#configuring-sriov-device[Configuring an SR-IOV network device]
27+
1728
include::modules/nw-sriov-concept-dpdk-line-rate.adoc[leveloffset=+1]
1829

1930
include::modules/nw-sriov-example-dpdk-line-rate.adoc[leveloffset=+1]
