Commit 2f67b8f

MANA DPDK: document rdma-core mismatch for backported kernels
1 parent 1afd060 commit 2f67b8f

File tree

1 file changed

+82
-2
lines changed

articles/virtual-network/setup-dpdk-mana.md

Lines changed: 82 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -132,15 +132,15 @@ dpdk-testpmd -l 1-9 --vdev="$BUS_INFO,mac=$MANA_MAC" -- --forward-mode=txonly --
132132
### Fail to set interface down.
133133
Failure to set the MANA bound device to DOWN can result in low or zero packet throughput.
134134
Failing to release the device can result in an EAL error message related to transmit queues.
135-
```
135+
```log
136136
mana_start_tx_queues(): Failed to create qp queue index 0
137137
mana_dev_start(): failed to start tx queues -19
138138
```
139139

140140
### Failure to enable huge pages.
141141

142142
Try enabling huge pages and ensuring the information is visible in meminfo.
143-
```
143+
```log
144144
EAL: No free 2048 kB hugepages reported on node 0
145145
EAL: FATAL: Cannot get hugepage information.
146146
EAL: Cannot get hugepage information.
@@ -151,3 +151,83 @@ Cause: Cannot init EAL: Permission denied
151151
### Low throughput with use of --vdev="net_vdev_netvsc0,iface=eth1"
152152

153153
Failover configuration of either the `net_failsafe` or `net_vdev_netvsc` poll-mode-drivers isn't recommended for high performance on Azure. The netvsc configuration with DPDK version 20.11 or higher may give better results. For optimal performance, ensure your Linux kernel, rdma-core, and DPDK packages meet the listed requirements for DPDK and MANA.
154+
155+
### Version mismatch for rdma-core
156+
Mismatches between rdma-core and the Linux kernel can occur whenever you build some combination of rdma-core, DPDK, and the Linux kernel from source. A mismatch can cause a number of issues; on MANA, it will likely result in a failed probe of the MANA virtual function (VF).
157+
158+
```log
159+
EAL: Probe PCI driver: net_mana (1414:ba) device: 7870:00:00.0 (socket 0)
160+
mana_arg_parse_callback(): key=mac value=00:0d:3a:76:3b:d0 index=0
161+
mana_init_once(): MP INIT PRIMARY
162+
mana_pci_probe_mac(): Probe device name mana_0 dev_name uverbs0 ibdev_path /sys/class/infiniband/mana_0
163+
mana_probe_port(): device located port 2 address 00:0D:3A:76:3B:D0
164+
mana_probe_port(): ibv_alloc_parent_domain failed port 2
165+
mana_pci_probe_mac(): Probe on IB port 2 failed -12
166+
EAL: Requested device 7870:00:00.0 cannot be used
167+
EAL: Bus (pci) probe failed.
168+
hn_vf_attach(): Couldn't find port for VF
169+
hn_vf_add(): RNDIS reports VF but device not found, retrying
170+
171+
```
172+
This failure likely results from pairing a kernel that carries backported patches for mana_ib with a newer version of rdma-core. The root cause is an interaction between the kernel RDMA drivers and the userspace rdma-core libraries.
173+
174+
The Linux kernel uapi for RDMA defines a list of RDMA provider IDs. In backported versions of the kernel, the ID value for a provider can differ from the value in the rdma-core libraries.
175+
> [!NOTE]
176+
> The example snippets are from [Ubuntu 5.15.0-1045 linux-azure](https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/focal/tree/include/uapi/rdma/ib_user_ioctl_verbs.h?h=azure-5.15-next) and [rdma-core v46.0](https://github.com/linux-rdma/rdma-core/blob/4cce53f5be035137c9d31d28e204502231a56382/kernel-headers/rdma/ib_user_ioctl_verbs.h#L220).
177+
```c
178+
// Linux kernel header
179+
// include/uapi/rdma/ib_user_ioctl_verbs.h
180+
enum rdma_driver_id {
181+
RDMA_DRIVER_UNKNOWN,
182+
RDMA_DRIVER_MLX5,
183+
RDMA_DRIVER_MLX4,
184+
RDMA_DRIVER_CXGB3,
185+
RDMA_DRIVER_CXGB4,
186+
RDMA_DRIVER_MTHCA,
187+
RDMA_DRIVER_BNXT_RE,
188+
RDMA_DRIVER_OCRDMA,
189+
RDMA_DRIVER_NES,
190+
RDMA_DRIVER_I40IW,
191+
RDMA_DRIVER_IRDMA = RDMA_DRIVER_I40IW,
192+
RDMA_DRIVER_VMW_PVRDMA,
193+
RDMA_DRIVER_QEDR,
194+
RDMA_DRIVER_HNS,
195+
RDMA_DRIVER_USNIC,
196+
RDMA_DRIVER_RXE,
197+
RDMA_DRIVER_HFI1,
198+
RDMA_DRIVER_QIB,
199+
RDMA_DRIVER_EFA,
200+
RDMA_DRIVER_SIW,
201+
RDMA_DRIVER_MANA, //<- Note MANA added as last member of enum
202+
};
203+
204+
// Example mismatched rdma-core ioctl verbs header
205+
// on github: kernel-headers/rdma/ib_user_ioctl_verbs.h
206+
// or in release tar.gz: include/rdma/ib_user_ioctl_verbs.h
207+
enum rdma_driver_id {
208+
RDMA_DRIVER_UNKNOWN,
209+
RDMA_DRIVER_MLX5,
210+
RDMA_DRIVER_MLX4,
211+
RDMA_DRIVER_CXGB3,
212+
RDMA_DRIVER_CXGB4,
213+
RDMA_DRIVER_MTHCA,
214+
RDMA_DRIVER_BNXT_RE,
215+
RDMA_DRIVER_OCRDMA,
216+
RDMA_DRIVER_NES,
217+
RDMA_DRIVER_I40IW,
218+
RDMA_DRIVER_IRDMA = RDMA_DRIVER_I40IW,
219+
RDMA_DRIVER_VMW_PVRDMA,
220+
RDMA_DRIVER_QEDR,
221+
RDMA_DRIVER_HNS,
222+
RDMA_DRIVER_USNIC,
223+
RDMA_DRIVER_RXE,
224+
RDMA_DRIVER_HFI1,
225+
RDMA_DRIVER_QIB,
226+
RDMA_DRIVER_EFA,
227+
RDMA_DRIVER_SIW,
228+
RDMA_DRIVER_ERDMA, // <- Upstream adds the ERDMA provider here, ahead of MANA
229+
RDMA_DRIVER_MANA, // <- So MANA's ID in the enum does not match
230+
};
231+
```
232+
233+
This mismatch causes the MANA provider code to fail to load. If you trace execution with `gdb` in this example, you'll find that the erdma provider is loaded instead. Either removing the erdma provider from the rdma-core source or forcing the ordering of the provider IDs so that they match the kernel's allows the MANA provider to load correctly.
