Description
Starting a simple program with Open MPI is very slow compared to MPICH (hundreds of times slower: about 3.7 s vs 5 ms in the timings below). This is very annoying when developing and running tests.
$ time mpirun.openmpi -n 4 hostname
uqam-TECRA-A50-K
uqam-TECRA-A50-K
uqam-TECRA-A50-K
uqam-TECRA-A50-K
real 0m3,659s
user 0m0,020s
sys 0m0,374s
$ time mpirun.mpich -n 4 hostname
uqam-TECRA-A50-K
uqam-TECRA-A50-K
uqam-TECRA-A50-K
uqam-TECRA-A50-K
real 0m0,005s
user 0m0,001s
sys 0m0,006s
Using strace, I saw that the slowdown is related to PCI bus scanning, which seems to be done by the mca_ess_hnp component. Here is the stack trace:
strace -k -o out -e trace=openat mpirun -n 4 hostname
...
> /usr/lib/x86_64-linux-gnu/libc.so.6(openat64+0x42) [0x11b2e2]
> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.7.0(hwloc_linux_get_tid_last_cpu_location+0xe38d) [0x4558d]
> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.7.0(hwloc_linux_get_tid_last_cpu_location+0x63eb) [0x3d5eb]
> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.7.0(hwloc_linux_get_tid_last_cpu_location+0xda46) [0x44c46]
> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.7.0(hwloc_topology_load+0xfed) [0x106cd]
> /usr/lib/x86_64-linux-gnu/libopen-pal.so.40.30.3(opal_hwloc_base_get_topology+0x12a6) [0x7b976]
> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_hnp.so() [0x4c56]
> /usr/lib/x86_64-linux-gnu/libopen-rte.so.40.30.3(orte_init+0x2aa) [0x9815a]
> /usr/lib/x86_64-linux-gnu/libopen-rte.so.40.30.3(orte_submit_init+0x911) [0x420e1]
> /usr/bin/orterun() [0x11e8]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x2a1ca]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x2a28b]
> /usr/bin/orterun() [0x1415]
openat(-1, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 10
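Since the stack trace points at hwloc_topology_load, the same cost should be reproducible with hwloc's own tools, independent of Open MPI. A sketch, assuming the hwloc utilities are installed (hwloc / hwloc-nox packages on Ubuntu); --no-io skips the PCI/GPU discovery, so the difference between the two runs should roughly match the mpirun overhead:
$ time lstopo-no-graphics > /dev/null
$ time lstopo-no-graphics --no-io > /dev/null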
The scan even seems to repeat the same device query multiple times:
$ grep "/sys/bus/pci/devices/0000:00:07.1/config" out
openat(-1, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 10
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 12
openat(-1, "/sys/bus/pci/devices/0000:00:07.1/config", O_RDONLY) = 15
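Tallying the trace gives a rough idea of how much rescanning happens across the whole bus (a sketch with standard tools against the strace output file out from above; counts will vary by machine):
$ grep -o '"/sys/bus/pci/devices/[^"]*/config"' out | sort | uniq -c | sort -rn | head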
I didn't find any MCA parameter to disable this component. I found that disabling some hwloc components via HWLOC_COMPONENTS (#11783) drastically reduced the startup time, with pci and opencl making the biggest difference. The linux component seems to be essential; without it the launch fails.
export HWLOC_COMPONENTS=-pci,-opencl,-x86,-no_os,-gl
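To double-check which plugins are actually excluded by this, hwloc can report component registration and selection via its documented HWLOC_COMPONENTS_VERBOSE variable (the report goes to stderr; lstopo-no-graphics assumed installed as above):
$ HWLOC_COMPONENTS=-pci,-opencl,-x86,-no_os,-gl HWLOC_COMPONENTS_VERBOSE=1 lstopo-no-graphics > /dev/null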
I wonder what the ess hnp component actually does (I found no mention of it in the documentation) and whether it can be disabled with an MCA parameter (I didn't find a way).
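For what it's worth, ompi_info (shipped in openmpi-bin) should at least list the ess components in this build and their parameters, though I still don't see a switch for hnp there (grep pattern is approximate):
$ ompi_info | grep "MCA ess"
$ ompi_info --param ess all --level 9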
Thanks
Background information
OpenMPI version
Simply the standard package from Ubuntu 24.04:
$ dpkg -l | grep openmpi-bin
ii openmpi-bin 4.1.6-7ubuntu2 amd64 high performance message passing library -- binaries
Please describe the system on which you are running
- Operating system/version: Ubuntu 24.04
- Computer hardware: generic x86_64 laptop, 12th Gen Intel(R) Core(TM) i7-1270P
- Network type: none; the slowdown occurs on localhost