# DPDK (input plugin)

DPDK (Data Plane Development Kit) is used for high-performance packet processing. It enables
direct access to network interfaces, bypassing the kernel, and is suitable for use in environments
requiring high throughput, low latency, and high packet processing rates.

## Example configuration

```yaml
input_plugin:
  dpdk:
    allowed_nics: "0000:ca:00.0"
    ### Optional parameters
    burst_size: 64
    mempool_size: 8192
    rx_queues: 8
    workers_cpu_list: []
    eal_opts: null
    mtu: 1518
```

## Parameters

**Mandatory parameters:**

| Parameter | Description |
|---|---|
| __allowed_nics__ | List of allowed NICs in PCI address format `0000:XX:YY.Z`, separated by `,` |

**Optional parameters:**

| Parameter | Default | Description |
|---|---|---|
| __burst_size__ | 64 | Number of packets processed in each burst cycle. Affects batch processing efficiency. |
| __mempool_size__ | 8192 | Size of the memory pool used for buffering incoming packets. Must be a power of 2. |
| __rx_queues__ | 1 | Number of RX queue workers. Increasing this can help distribute load across multiple CPU cores. |
| __workers_cpu_list__ | [] (autofill) | List of CPU cores assigned to RX queues (must match the number of `rx_queues`). |
| __eal_opts__ | null | Extra options passed to the DPDK EAL (Environment Abstraction Layer). Can be used for fine-tuning DPDK behavior. |
| __mtu__ | 1518 | Maximum Transmission Unit size for the interface. Defines the maximum packet size that can be received. |
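
For example, a configuration that spreads the load over four RX queues pinned to specific cores might look like the sketch below. The PCI address and core numbers are placeholders, and the `eal_opts` value is only an assumed example of passing a raw EAL flag — adjust all of them to your system:

```yaml
input_plugin:
  dpdk:
    allowed_nics: "0000:ca:00.0"      # placeholder PCI address
    rx_queues: 4
    workers_cpu_list: [2, 3, 4, 5]    # one core per RX queue, ideally on the NIC's NUMA node
    eal_opts: "--file-prefix capture" # assumed example of an extra EAL option
```

Keeping `workers_cpu_list` on the same NUMA node as the NIC (see step 3 below) avoids cross-node memory traffic.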

## How to use

To use the DPDK input plugin, you must ensure that your system is properly configured for DPDK operation. This includes the following steps:

### 1. Install DPDK Tools

To begin with, you will need to install DPDK and its associated tools. Follow the installation instructions for your operating system:

- **On RHEL/CentOS**:

```sh
dnf install dpdk-tools
```

- **On Debian/Ubuntu**:

```sh
apt-get install dpdk
```

### 2. Identify the PCI Address of the Network Interface

DPDK operates directly with network interfaces identified by their PCI addresses, not by traditional interface names like `eth0` or `ens3`.

A PCI address looks like this:

```
0000:ca:00.0
```

This format includes:
- Domain: `0000` – typically 0000 on most systems
- Bus: `ca`
- Device: `00`
- Function: `0`

Each network interface has a unique PCI address, and this is how DPDK identifies which interface to bind and use.
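
For a quick cross-check with standard Linux utilities (assuming `lspci` and `ethtool` are available), you can also look the address up directly; the interface name `ens3` below is a placeholder:

```sh
# List all Ethernet controllers with full (domain-qualified) PCI addresses
lspci -D | grep -i ethernet

# Or map a known interface name to its PCI address (the "bus-info" line)
ethtool -i ens3 | grep bus-info
```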

**🔍 How to find the PCI address of a network interface?**

The recommended way to identify the PCI address is the DPDK helper tool `dpdk-devbind.py`. It lists all the NICs in the system with their PCI addresses and shows which drivers are currently bound to them:

```
dpdk-devbind.py --status
```

This shows the PCI addresses of all detected NICs along with the drivers they're bound to. If the NIC is not already bound to a DPDK-compatible driver (e.g., `vfio-pci`), you can bind it using this tool, as shown below.
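
A typical rebind looks like the following sketch. The PCI address is a placeholder, and it assumes the generic `vfio-pci` driver is appropriate for your NIC (some NICs use their regular kernel driver with DPDK and must not be rebound — check your NIC's DPDK driver documentation first):

```sh
# Load the generic userspace I/O driver
modprobe vfio-pci

# Bind the NIC (placeholder PCI address) to it
dpdk-devbind.py --bind=vfio-pci 0000:ca:00.0
```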

### 3. Identify the NUMA Node of the Network Interface

DPDK operates efficiently on systems with multiple **NUMA (Non-Uniform Memory Access)** nodes, and it is essential to know which NUMA node a network interface belongs to, as this affects memory locality and performance.

Each physical device (such as a network interface card) is associated with a specific NUMA node, which influences memory access and CPU affinity during packet processing. You can use this information to optimize performance by keeping the NIC, its packet buffers, and the processing CPU cores on the same NUMA node.

**🔍 How to find the NUMA node of a network interface?**

You can identify the NUMA node to which a specific network interface is attached by reading the `numa_node` file under `/sys/bus/pci/devices/{pci_address}/`. Once you know the PCI address of your network interface (e.g., `0000:ca:00.0`), check the file:

```
cat /sys/bus/pci/devices/0000:ca:00.0/numa_node
```

This outputs the NUMA node number where the NIC is located. For example:

```
0
```

This indicates that the NIC is attached to NUMA node 0.

If the output is `-1`, the device does not have an associated NUMA node or the system does not have NUMA support.
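
When filling in `workers_cpu_list`, it helps to pick cores local to that NUMA node. One way to list them — assuming the usual Linux sysfs layout, with the PCI address as a placeholder — is the device's `local_cpulist` file:

```sh
# CPU cores attached to the same NUMA node as the NIC
cat /sys/bus/pci/devices/0000:ca:00.0/local_cpulist
```

The output is a core range such as `0-15,32-47`; choosing `workers_cpu_list` entries from that range keeps the RX queue workers next to the NIC's memory.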

### 4. Allocate Hugepages

DPDK requires hugepages for optimal performance, as they provide large, contiguous memory blocks that reduce overhead and improve data throughput. Hugepages are critical in high-speed networking environments, such as those used by DPDK, where low latency and high throughput are required.

**🛠️ Configuring Hugepages via Kernel Parameters [Recommended]**

You can configure hugepages directly at the kernel level using the `grubby` tool. This approach is recommended if you want the hugepages configuration to persist across system reboots.

To configure hugepages via `grubby`, use the following command:

```
grubby --update-kernel ALL --args "default_hugepagesz=1G hugepagesz=1G hugepages=4"
```

This command will:

- Set the default hugepage size to 1 GB (`default_hugepagesz=1G`).
- Set the hugepage size to 1 GB (`hugepagesz=1G`).
- Allocate 4 hugepages (`hugepages=4`).

**⚠️ Important Note:** When using this method, it is not possible to specify a particular NUMA node. The hugepages will be distributed evenly across all available NUMA nodes, without considering any specific NUMA affinity for your application.

After running the command, reboot the system for the changes to take effect.
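
After the reboot, you can confirm that the arguments were applied and the pages were reserved:

```sh
# The hugepage arguments should appear on the kernel command line
cat /proc/cmdline

# HugePages_Total / Hugepagesize show what was actually reserved
grep -i huge /proc/meminfo
```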

---

**📌 Allocating Hugepages Using dpdk-hugepages.py**

Alternatively, you can use the `dpdk-hugepages.py` script to allocate hugepages at runtime. This method allows you to allocate hugepages dynamically and to specify NUMA nodes.

To allocate hugepages using `dpdk-hugepages.py`, run the following command:

```
dpdk-hugepages.py -p 1G --setup 2G --node 0
```

This command allocates 2GB of hugepages with a 1GB page size on NUMA node 0. You can adjust these values based on your system's memory requirements and the NUMA node of your NIC.

If you require more hugepages, increase the amount by modifying the `--setup` parameter, as follows:

```
dpdk-hugepages.py -p 1G --setup 4G --node 0
```

This will allocate 4GB of hugepages with a 1GB page size on NUMA node 0.

**📌 Recommended Hugepages Configuration for High-Speed Links (100G, 200G, 400G)**

For high-speed links such as 100G, 200G, or 400G, it is crucial to allocate enough hugepages to handle the massive packet processing and memory requirements. The following values are recommended based on typical usage for each type of link:

**100G link:**
For 100G links, allocate at least 4GB of hugepages with a 1GB page size. This is generally sufficient for moderate traffic and packet processing needs.

Recommended command:
```
dpdk-hugepages.py -p 1G --setup 4G --node 0
```

---

**200G link:**
For 200G links, you will likely need 8GB of hugepages with a 1GB page size, as the traffic and memory bandwidth requirements will be higher.

Recommended command:
```
dpdk-hugepages.py -p 1G --setup 8G --node 0
```

---

**400G link:**
For 400G links, a significant increase in hugepages is required. Allocate 16GB of hugepages with a 1GB page size to handle the higher throughput and memory demands.

Recommended command:
```
dpdk-hugepages.py -p 1G --setup 16G --node 0
```

**🖥️ Verify the Allocated Hugepages**

To verify that the hugepages were allocated successfully, use the following command:

```
dpdk-hugepages.py -s
```

This command shows the status of the hugepages and confirms how much memory has been allocated.
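
If you want to confirm that the pages landed on the intended NUMA node, you can also read the per-node counters from sysfs (the node number and page-size directory below are examples — adjust them to your allocation):

```sh
# Number of 1GB hugepages reserved on NUMA node 0
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
```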

---

### 5. Configure the DPDK Driver

TODO Mellanox, broadcom, intel
TODO Lukas

### 6. Isolate CPUs (optional)

Isolating specific CPUs can enhance the performance of DPDK applications by dedicating certain processors to networking tasks, reducing interference from other system processes. This isolation minimizes context switching and ensures that the CPUs are dedicated to packet processing.

**🛠️ How to Isolate CPUs?**

To isolate CPUs, you can use the `tuned` package and adjust kernel parameters. The steps below show how to set up CPU isolation using tuned profiles and kernel boot parameters; they apply to both Intel and AMD systems, with specific configuration examples for each.

1. **Install the necessary package**

   First, install the `tuned-profiles-cpu-partitioning` package, which contains the CPU isolation profiles:

   ```bash
   dnf install tuned-profiles-cpu-partitioning
   ```

2. **Enable IOMMU Using GRUB**

   **For Intel systems**: enable the Intel IOMMU for direct device access in DPDK by adding the following kernel arguments:

   ```
   grubby --update-kernel ALL --args "iommu=pt intel_iommu=on"
   ```

   **For AMD systems**: enable the AMD IOMMU for direct device access in DPDK by adding the following kernel arguments:

   ```
   grubby --update-kernel ALL --args "iommu=pt amd_iommu=on"
   ```

3. **Isolate CPUs for DPDK**

   Once the IOMMU is enabled, you can isolate specific CPUs for DPDK using the `isolcpus` kernel parameter. This ensures that the isolated CPUs are reserved for networking tasks. A sketch showing how to activate the matching tuned profile follows this list.

   To isolate CPUs 2-19 and 22-39 on an Intel system, use the following command:

   ```
   grubby --update-kernel ALL --args "isolcpus=2-19,22-39"
   ```
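
The `isolcpus` parameter only keeps the scheduler from placing ordinary tasks on those cores. To also move IRQs and housekeeping work away from them, you can activate the `cpu-partitioning` tuned profile installed in step 1. The following is a minimal sketch — the core list is an example and should match your `isolcpus` setting, and a reboot is required afterwards:

```sh
# Tell the profile which cores to isolate
echo "isolated_cores=2-19,22-39" > /etc/tuned/cpu-partitioning-variables.conf

# Activate the profile; the kernel arguments it adds take effect after a reboot
tuned-adm profile cpu-partitioning
```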

### 7. Validate with dpdk-testpmd

TODO
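
Until this section is written, a minimal smoke test might look like the sketch below. It assumes hugepages are allocated and the NIC is usable by DPDK; the core list and PCI address are placeholders:

```sh
# Receive-only forwarding on cores 2-3 using the example NIC;
# port statistics are printed every second, stop with Ctrl+C
dpdk-testpmd -l 2-3 -a 0000:ca:00.0 -- --forward-mode=rxonly --stats-period 1
```

If the port initializes and the RX counters increase while traffic is sent to the link, the host is ready for the plugin.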

## FAQ

|Q: | How many `rx_queues` should I configure? |
|---|---|
|A: | TODO |

|Q: | ??? |
|---|---|
|A: | TODO |