I found, by accident, that setting UCX_IB_MLX5_DEVX=no gives a ~2x perf improvement for one customer code on Azure Genoa HBv4 VMs (https://learn.microsoft.com/en-us/azure/virtual-machines/hbv4-series-overview). Each VM has a single 400 Gb/s Mellanox ConnectX-7 NDR NIC.
So far, none of the other codes I've tried shows any sensitivity to this setting.
A very similar ~2x perf boost is seen with this code on a bare-metal Genoa IB cluster (also 400 Gb/s, 4X NDR).
However, on the same bare-metal cluster, Turin nodes show no sensitivity to this setting.
The code mostly uses allreduce and alltoallv. The alltoallv calls are "sparse": 4 ranks either send to all other ranks, or receive from all other ranks. The typical scale of my jobs is ~1k ranks, with no OpenMP.
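For anyone trying to reproduce: one way to apply the setting is to export the variable to the ranks with Open MPI's -x option (a minimal sketch -- the rank count and executable name here are just placeholders; exporting it in the job environment also works, provided the launcher forwards it to the remote ranks):

mpiexec -x UCX_IB_MLX5_DEVX=no --bind-to core -n 1024 ${exe}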
Recently, I observed another behaviour affected by setting UCX_IB_MLX5_DEVX=no.
This is with the NOAA HAFS code (https://github.com/HAFS-community/HAFS).
I have only tried this code on the Azure HBv4 platform.
I'm using recent master branches of PMIx, PRRTE and Open MPI.
I start the job with:
mpiexec --display-map --bind-to core --map-by ppr:88:node:pe=2 \
-n 3072 /usr/bin/env OMP_NUM_THREADS=2 ${exe} : \
-n 32 /usr/bin/env OMP_NUM_THREADS=2 ${exe} : \
-n 240 /usr/bin/env OMP_NUM_THREADS=2 ${exe}
The job hangs somewhere in the UCX layers.
However, if I set UCX_IB_MLX5_DEVX=no, the job runs to completion, with performance comparable to Intel MPI.
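For completeness, the working invocation is essentially the same command with the variable exported (a sketch -- -x is one way to pass it; the added option is the only change relative to the command above):

mpiexec -x UCX_IB_MLX5_DEVX=no --display-map --bind-to core --map-by ppr:88:node:pe=2 \
-n 3072 /usr/bin/env OMP_NUM_THREADS=2 ${exe} : \
-n 32 /usr/bin/env OMP_NUM_THREADS=2 ${exe} : \
-n 240 /usr/bin/env OMP_NUM_THREADS=2 ${exe}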
If I try to use 4 OpenMP threads per rank, i.e.:
mpiexec --display-map --bind-to core --map-by ppr:44:node:pe=4 \
-n 3072 /usr/bin/env OMP_NUM_THREADS=4 ${exe} : \
-n 32 /usr/bin/env OMP_NUM_THREADS=4 ${exe} : \
-n 240 /usr/bin/env OMP_NUM_THREADS=4 ${exe}
the job hangs regardless of whether UCX_IB_MLX5_DEVX is set to yes or no.
I'm not sure what to make of these observations.