Commit e9ecaf5

Merge pull request #46316 from tmulquee/TELCODOCS-395
TELCODOCS-395: D/S Docs & RN: CNF-1153 DPDK deployment guide and best practices WIP
2 parents 95335fd + c7a660e commit e9ecaf5

10 files changed (+371 −26 lines)

images/261_OpenShift_DPDK_0722.png (58.7 KB)

images/dpdk_line_rate.png (46.6 KB)
Lines changed: 31 additions & 0 deletions

// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc

:_content-type: CONCEPT
[id="nw-sriov-example-dpdk-line-rate_{context}"]
= Overview of achieving a specific DPDK line rate

To achieve a specific Data Plane Development Kit (DPDK) line rate, deploy the Node Tuning Operator and configure Single Root I/O Virtualization (SR-IOV). You must also tune the DPDK settings for the following resources:

- Isolated CPUs
- Hugepages
- The topology scheduler

[NOTE]
====
In previous versions of {product-title}, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for {product-title} applications. In {product-title} 4.11 and later, this functionality is part of the Node Tuning Operator.
====

.DPDK test environment
The following diagram shows the components of a traffic-testing environment:

image::261_OpenShift_DPDK_0722.png[DPDK test environment]

- **Traffic generator**: An application that can generate high-volume packet traffic.
- **SR-IOV-supporting NIC**: A network interface card compatible with SR-IOV. The card runs a number of virtual functions on a physical interface.
- **Physical Function (PF)**: A PCI Express (PCIe) function of a network adapter that supports the SR-IOV interface.
- **Virtual Function (VF)**: A lightweight PCIe function on a network adapter that supports SR-IOV. The VF is associated with the PCIe PF on the network adapter. The VF represents a virtualized instance of the network adapter.
- **Switch**: A network switch. Nodes can also be connected back-to-back.
- **`testpmd`**: An example application included with DPDK. The `testpmd` application can be used to test DPDK in packet-forwarding mode. The `testpmd` application is also an example of how to build a fully fledged application using the DPDK Software Development Kit (SDK).
- **worker 0** and **worker 1**: {product-title} nodes.

modules/nw-sriov-create-object.adoc

Lines changed: 40 additions & 0 deletions

// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc

:_content-type: REFERENCE
[id="nw-sriov-create-object_{context}"]
= Example SR-IOV network operator

The following is an example definition of an `SriovNetwork` object. In this case, the Intel and Mellanox configurations are identical:

[source,yaml]
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-1
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}' <1>
  networkNamespace: dpdk-test <2>
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_1 <3>
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-2
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}'
  networkNamespace: dpdk-test
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_2
----
<1> You can use a different IP Address Management (IPAM) implementation, such as Whereabouts. For more information, see _Dynamic IP address assignment configuration with Whereabouts_.
<2> Specify the `networkNamespace` where the network attachment definition is created. You must create the `SriovNetwork` CR in the `openshift-sriov-network-operator` namespace.
<3> The `resourceName` value must match the `resourceName` created under the `SriovNetworkNodePolicy` object.
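
After you create these objects, the SR-IOV Network Operator renders a network attachment definition for each network in the target namespace. A minimal verification sketch, assuming the `dpdk-test` namespace from the example above:

[source,terminal]
----
$ oc get network-attachment-definitions -n dpdk-test
----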
Lines changed: 82 additions & 0 deletions

// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc

:_content-type: REFERENCE
[id="nw-sriov-dpdk-base-workload_{context}"]
= Example DPDK base workload

The following is an example of a Data Plane Development Kit (DPDK) container:

[source,yaml]
----
apiVersion: v1
kind: Namespace
metadata:
  name: dpdk-test
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[ <1>
     {
      "name": "dpdk-network-1",
      "namespace": "dpdk-test"
     },
     {
      "name": "dpdk-network-2",
      "namespace": "dpdk-test"
     }
    ]'
    irq-load-balancing.crio.io: "disable" <2>
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: dpdk
  name: testpmd
  namespace: dpdk-test
spec:
  runtimeClassName: performance-performance <3>
  containers:
  - command:
    - /bin/bash
    - -c
    - sleep INF
    image: registry.redhat.io/openshift4/dpdk-base-rhel8
    imagePullPolicy: Always
    name: dpdk
    resources: <4>
      limits:
        cpu: "16"
        hugepages-1Gi: 8Gi
        memory: 2Gi
      requests:
        cpu: "16"
        hugepages-1Gi: 8Gi
        memory: 2Gi
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - SYS_RESOURCE
        - NET_RAW
        - NET_ADMIN
      runAsUser: 0
    volumeMounts:
    - mountPath: /mnt/huge
      name: hugepages
  terminationGracePeriodSeconds: 5
  volumes:
  - emptyDir:
      medium: HugePages
    name: hugepages
----
<1> Request the SR-IOV networks that you need. Resources for the devices are injected automatically.
<2> Disable CPU and IRQ load balancing. See _Disabling interrupt processing for individual pods_ for more information.
<3> Set the `runtimeClass` to `performance-performance`. Do not set the `runtimeClass` to `HostNetwork` or `privileged`.
<4> Request an equal number of resources for requests and limits to start the pod with the `Guaranteed` Quality of Service (QoS) class.

[NOTE]
====
Do not start the pod with `SLEEP` and then exec into the pod to start `testpmd` or the DPDK workload. This can add additional interrupts because the `exec` process is not pinned to any CPU.
====
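
Because the resource requests equal the limits, the pod is admitted with the `Guaranteed` QoS class. A minimal sketch of verifying this after the pod starts, assuming the example `testpmd` pod in the `dpdk-test` namespace:

[source,terminal]
----
$ oc get pod testpmd -n dpdk-test -o jsonpath='{.status.qosClass}{"\n"}'
----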

modules/nw-sriov-dpdk-example-mellanox.adoc

Lines changed: 25 additions & 23 deletions
@@ -6,15 +6,17 @@
 [id="example-vf-use-in-dpdk-mode-mellanox_{context}"]
 = Using a virtual function in DPDK mode with a Mellanox NIC
 
+You can create a network node policy and create a Data Plane Development Kit (DPDK) pod using a virtual function in DPDK mode with a Mellanox NIC.
+
 .Prerequisites
 
-* Install the OpenShift CLI (`oc`).
-* Install the SR-IOV Network Operator.
-* Log in as a user with `cluster-admin` privileges.
+* You have installed the OpenShift CLI (`oc`).
+* You have installed the Single Root I/O Virtualization (SR-IOV) Network Operator.
+* You have logged in as a user with `cluster-admin` privileges.
 
 .Procedure
 
-. Create the following `SriovNetworkNodePolicy` object, and then save the YAML in the `mlx-dpdk-node-policy.yaml` file.
+. Save the following `SriovNetworkNodePolicy` YAML configuration to an `mlx-dpdk-node-policy.yaml` file:
 +
 [source,yaml]
 ----
@@ -37,16 +39,16 @@ spec:
   deviceType: netdevice <2>
   isRdma: true <3>
 ----
-<1> Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are `1015`, `1017`.
-<2> Specify the driver type for the virtual functions to `netdevice`. Mellanox SR-IOV VF can work in DPDK mode without using the `vfio-pci` device type. VF device appears as a kernel network interface inside a container.
-<3> Enable RDMA mode. This is required by Mellanox cards to work in DPDK mode.
+<1> Specify the device hex code of the SR-IOV network device. The only allowed values for Mellanox cards are `1015` and `1017`.
+<2> Specify the driver type for the virtual functions to `netdevice`. A Mellanox SR-IOV Virtual Function (VF) can work in DPDK mode without using the `vfio-pci` device type. The VF device appears as a kernel network interface inside a container.
+<3> Enable Remote Direct Memory Access (RDMA) mode. This is required for Mellanox cards to work in DPDK mode.
 +
 [NOTE]
 =====
-See the `Configuring SR-IOV network devices` section for detailed explanation on each option in `SriovNetworkNodePolicy`.
+See _Configuring an SR-IOV network device_ for a detailed explanation of each option in the `SriovNetworkNodePolicy` object.
 
-When applying the configuration specified in a `SriovNetworkNodePolicy` object, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes.
-It may take several minutes for a configuration change to apply.
+When applying the configuration specified in an `SriovNetworkNodePolicy` object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.
+It might take several minutes for a configuration change to apply.
 Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.
 
 After the configuration update is applied, all the pods in the `openshift-sriov-network-operator` namespace will change to a `Running` status.
@@ -59,7 +61,7 @@ After the configuration update is applied, all the pods in the `openshift-sriov-
 $ oc create -f mlx-dpdk-node-policy.yaml
 ----
 
-. Create the following `SriovNetwork` object, and then save the YAML in the `mlx-dpdk-network.yaml` file.
+. Save the following `SriovNetwork` YAML configuration to an `mlx-dpdk-network.yaml` file:
 +
 [source,yaml]
 ----
@@ -71,27 +73,27 @@ metadata:
 spec:
   networkNamespace: <target_namespace>
   ipam: |- <1>
-# ...
+...
   vlan: <vlan>
   resourceName: mlxnics
 ----
-<1> Specify a configuration object for the ipam CNI plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
+<1> Specify a configuration object for the IP Address Management (IPAM) Container Network Interface (CNI) plug-in as a YAML block scalar. The plug-in manages IP address assignment for the attachment definition.
 +
 [NOTE]
 =====
-See the "Configuring SR-IOV additional network" section for a detailed explanation on each option in `SriovNetwork`.
+See _Configuring an SR-IOV network device_ for a detailed explanation of each option in the `SriovNetwork` object.
 =====
 +
-An optional library, app-netutil, provides several API methods for gathering network information about a container's parent pod.
+The `app-netutil` optional library provides several API methods for gathering network information about the parent pod of a container.
 
-. Create the `SriovNetworkNodePolicy` object by running the following command:
+. Create the `SriovNetwork` object by running the following command:
 +
 [source,terminal]
 ----
 $ oc create -f mlx-dpdk-network.yaml
 ----
+. Save the following `Pod` YAML configuration to an `mlx-dpdk-pod.yaml` file:
 
-. Create the following `Pod` spec, and then save the YAML in the `mlx-dpdk-pod.yaml` file.
 +
 [source,yaml]
 ----
@@ -130,13 +132,13 @@ spec:
     emptyDir:
       medium: HugePages
 ----
-<1> Specify the same `target_namespace` where `SriovNetwork` object `mlx-dpdk-network` is created. If you would like to create the pod in a different namespace, change `target_namespace` in both `Pod` spec and `SriovNetowrk` object.
-<2> Specify the DPDK image which includes your application and the DPDK library used by application.
+<1> Specify the same `target_namespace` where the `SriovNetwork` object `mlx-dpdk-network` is created. To create the pod in a different namespace, change `target_namespace` in both the `Pod` spec and the `SriovNetwork` object.
+<2> Specify the DPDK image which includes your application and the DPDK library used by the application.
 <3> Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
-<4> Mount the hugepage volume to the DPDK pod under `/dev/hugepages`. The hugepage volume is backed by the emptyDir volume type with the medium being `Hugepages`.
-<5> Optional: Specify the number of DPDK devices allocated to the DPDK pod. This resource request and limit, if not explicitly specified, will be automatically added by SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by SR-IOV Operator. It is enabled by default and can be disabled by setting the `enableInjector` option to `false` in the default `SriovOperatorConfig` CR.
-<6> Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs be allocated from kubelet. This is achieved by setting CPU Manager policy to `static` and creating a pod with `Guaranteed` QoS.
-<7> Specify hugepage size `hugepages-1Gi` or `hugepages-2Mi` and the quantity of hugepages that will be allocated to DPDK pod. Configure `2Mi` and `1Gi` hugepages separately. Configuring `1Gi` hugepage requires adding kernel arguments to Nodes.
+<4> Mount the hugepage volume to the DPDK pod under `/dev/hugepages`. The hugepage volume is backed by the `emptyDir` volume type with the medium being `Hugepages`.
+<5> Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting the `enableInjector` option to `false` in the default `SriovOperatorConfig` CR.
+<6> Specify the number of CPUs. The DPDK pod usually requires that exclusive CPUs be allocated from the kubelet. To do this, set the CPU Manager policy to `static` and create a pod with `Guaranteed` Quality of Service (QoS).
+<7> Specify the hugepage size `hugepages-1Gi` or `hugepages-2Mi` and the quantity of hugepages that will be allocated to the DPDK pod. Configure `2Mi` and `1Gi` hugepages separately. Configuring `1Gi` hugepages requires adding kernel arguments to nodes.
 
 . Create the DPDK pod by running the following command:
 +
Lines changed: 21 additions & 0 deletions

// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc

:_content-type: REFERENCE
[id="nw-sriov-dpdk-running-testpmd_{context}"]
= Example testpmd script

The following is an example script for running `testpmd`:

[source,terminal]
----
#!/bin/bash
set -ex
export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
echo ${CPU}

dpdk-testpmd -l ${CPU} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02
----

This example uses two different `sriovNetwork` CRs. Each `PCIDEVICE_...` environment variable contains the PCI address of the Virtual Function (VF) that was allocated for the pod. If you use the same network in the pod definition, you must split the `pciAddress`. It is important to configure the correct MAC addresses of the traffic generator. This example uses custom MAC addresses.
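
To check which VF PCI addresses were injected into the container, you can list the `PCIDEVICE_*` environment variables. A minimal sketch, assuming the example `testpmd` pod in the `dpdk-test` namespace; the exact variable names depend on the `resourceName` values in your `SriovNetworkNodePolicy` objects:

[source,terminal]
----
$ oc exec testpmd -n dpdk-test -- env | grep PCIDEVICE
----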
Lines changed: 58 additions & 0 deletions

// Module included in the following assemblies:
//
// * networking/hardware_networks/using-dpdk-and-rdma.adoc

:_content-type: PROCEDURE
[id="nw-example-dpdk-line-rate_{context}"]
= Using SR-IOV and the Node Tuning Operator to achieve a DPDK line rate

You can use the Node Tuning Operator to configure isolated CPUs, hugepages, and a topology scheduler. You can then use the Node Tuning Operator with Single Root I/O Virtualization (SR-IOV) to achieve a specific Data Plane Development Kit (DPDK) line rate.

.Prerequisites

* You have installed the OpenShift CLI (`oc`).
* You have installed the SR-IOV Network Operator.
* You have logged in as a user with `cluster-admin` privileges.
* You have deployed a standalone Node Tuning Operator.
+
[NOTE]
====
In previous versions of {product-title}, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for {product-title} applications. In {product-title} 4.11 and later, this functionality is part of the Node Tuning Operator.
====

.Procedure
. Create a `PerformanceProfile` object based on the following example:
+
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  globallyDisableIrqLoadBalancing: true
  cpu:
    isolated: 21-51,73-103 <1>
    reserved: 0-20,52-72 <2>
  hugepages:
    defaultHugepagesSize: 1G <3>
    pages:
      - count: 32
        size: 1G
  net:
    userLevelNetworking: true
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
----
<1> If hyperthreading is enabled on the system, allocate the relevant sibling threads to the `isolated` and `reserved` CPU groups. If the system contains multiple non-uniform memory access (NUMA) nodes, allocate CPUs from both NUMA nodes to both groups. You can also use the Performance Profile Creator for this task. For more information, see _Creating a performance profile_.
<2> You can also specify a list of devices that will have their queues set to the reserved CPU count. For more information, see _Reducing NIC queues using the Node Tuning Operator_.
<3> Allocate the number and size of hugepages needed. You can specify the NUMA configuration for the hugepages. By default, the system allocates an even number of hugepages to every NUMA node on the system. If needed, you can request the use of a realtime kernel for the nodes. See _Provisioning a worker with real-time capabilities_ for more information.
. Save the YAML file as `mlx-dpdk-perfprofile-policy.yaml`.
. Apply the performance profile by running the following command:
+
[source,terminal]
----
$ oc create -f mlx-dpdk-perfprofile-policy.yaml
----
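
The Node Tuning Operator rolls the profile out through a machine config, which can reboot the matching nodes. A minimal sketch of checking the rollout, assuming a `worker-cnf` machine config pool that matches the `nodeSelector` in the example above:

[source,terminal]
----
$ oc get performanceprofile performance
$ oc get mcp worker-cnf
----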
