# Build a CUDA-aware version of Open MPI

Open MPI is an open source Message Passing Interface (MPI) implementation widely used on parallel computing architectures. For HPC workloads with heavy data traffic between the CPU and the GPU, making Open MPI CUDA-aware lets applications pass GPU memory buffers directly to MPI calls instead of staging them through host memory first, which can reduce communication overhead.

## Prerequisites

In this example, we use a VM.GPU.A100.80G.1 instance, a virtual machine with one NVIDIA A100 80 GB GPU, running a standard Ubuntu 22.04 image. On this instance, we will install:
* NVIDIA drivers
* CUDA Toolkit
* GDRCopy
* UCX
* Open MPI
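
Since the package names and repository URLs below target Ubuntu 22.04, it can help to confirm the release first. The following is a minimal bash sketch, not part of the provided scripts: the function name `is_ubuntu_2204` is hypothetical, and it relies only on the standard `ID` and `VERSION_ID` fields of `/etc/os-release`.

```bash
# Hypothetical helper (not from the provided scripts): succeed only if the
# os-release file describes Ubuntu 22.04. Defaults to /etc/os-release.
is_ubuntu_2204() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  # Source the file in a subshell so ID, VERSION_ID, etc. do not leak
  # into the caller's environment.
  (
    unset ID VERSION_ID
    . "$file"
    [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "22.04" ]
  )
}
```

For example: `is_ubuntu_2204 && echo "Ubuntu 22.04 detected"`.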

## Configuration walkthrough

For the sake of simplicity, installation scripts can be found in the [scripts](assets/scripts) folder.

### Installing NVIDIA drivers and CUDA

The first step is to install the `ubuntu-drivers-common` package, which provides the `ubuntu-drivers` tool used to install the NVIDIA drivers:
```
sudo apt-get install -y ubuntu-drivers-common
```
Available drivers can be found using the `sudo ubuntu-drivers --gpgpu list` command. If you want to install the 535 server driver version, run:
```
sudo ubuntu-drivers --gpgpu install nvidia:535-server
```
The `nvidia-smi` command requires an additional package:
```
sudo apt-get install -y nvidia-utils-535-server
```
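
Once the machine has been rebooted, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints one driver version per GPU. The bash sketch below, with a hypothetical function name `driver_branch_is`, checks that every GPU reports the expected branch (535 here), which helps confirm that the intended driver series is the one actually loaded.

```bash
# Hypothetical helper: read driver versions (one per line, e.g. "535.183.01")
# on stdin and succeed only if every line starts with the expected major branch.
driver_branch_is() {
  local expected="$1" version seen=0
  # The `|| [ -n ... ]` clause also handles a last line without a trailing newline.
  while IFS= read -r version || [ -n "$version" ]; do
    version="${version//[[:space:]]/}"   # drop stray spaces or carriage returns
    [ -z "$version" ] && continue
    seen=1
    # Everything before the first dot is the branch: 535.183.01 -> 535
    if [ "${version%%.*}" != "$expected" ]; then
      echo "unexpected driver version: $version" >&2
      return 1
    fi
  done
  [ "$seen" -eq 1 ]   # fail if no version was read at all
}
```

Usage: `nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_branch_is 535 && echo "535 driver loaded"`.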

CUDA and its compiler `nvcc` can be installed with the following commands:
```
# Remove the outdated signing key
sudo apt-key del 7fa2af80

# Install the network repository keyring
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Update the apt repository cache
sudo apt-get update

# Install the CUDA Toolkit
sudo apt-get install -y cuda-toolkit
sudo apt-get install -y nvidia-gds # to include all GDS (GPUDirect Storage) packages

# Reboot the system
sudo reboot
```
Verify that `nvcc` is available with `nvcc --version`. The CUDA installation guide lists adding `/usr/local/cuda/bin` to `PATH` as a post-installation step; if `nvcc` is not found, add that directory and check the result with `echo $PATH`.
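
One way to do this is sketched below in bash. The helper name `path_prepend` is hypothetical; it skips directories that are already listed, so re-sourcing a shell profile does not keep growing the variable. It assumes CUDA is reachable through the `/usr/local/cuda` symlink, which the `--with-cuda=/usr/local/cuda` options used later also rely on.

```bash
# Hypothetical helper: prepend a directory to a colon-separated variable
# (PATH, LD_LIBRARY_PATH, ...) unless it is already listed.
path_prepend() {
  local var="$1" dir="$2" current
  current="${!var:-}"          # indirect expansion: value of the variable named $var
  case ":$current:" in
    *":$dir:"*) ;;             # already present: leave the variable unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

# Assumes CUDA is reachable through the /usr/local/cuda symlink.
path_prepend PATH /usr/local/cuda/bin
```

Adding the function and the call to `~/.bashrc` makes the setting persistent, so `nvcc --version` also works in new shells.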

### Installing GDRCopy

Clone the official repository with `git clone https://github.com/NVIDIA/gdrcopy.git`, then build and install the Debian packages:
```
cd gdrcopy
sudo apt-get install -y nvidia-dkms-535-server
sudo apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
cd packages
# CUDA must point to the toolkit version installed under /usr/local
CUDA=/usr/local/cuda-12.8 ./build-deb-packages.sh
# Package names encode the GDRCopy, Ubuntu and CUDA versions; adjust them if yours differ
sudo dpkg -i gdrdrv-dkms_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i libgdrapi_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i gdrcopy-tests_2.5-1_amd64.Ubuntu22_04+cuda12.8.deb
sudo dpkg -i gdrcopy_2.5-1_amd64.Ubuntu22_04.deb
```
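
The `CUDA=/usr/local/cuda-12.8` value must match the toolkit actually installed. As an alternative to hard-coding it, the bash sketch below defines a hypothetical `latest_cuda_dir` function that picks the highest versioned `cuda-X.Y` directory under `/usr/local`, relying on GNU `sort -V` for version ordering.

```bash
# Hypothetical helper: print the newest versioned CUDA directory under a base
# directory (default /usr/local), e.g. /usr/local/cuda-12.8.
latest_cuda_dir() {
  local base="${1:-/usr/local}" latest
  # -maxdepth 1: only direct children; `sort -V` orders 12.10 after 12.8.
  latest=$(find "$base" -maxdepth 1 -type d -name 'cuda-[0-9]*' 2>/dev/null \
             | sort -V | tail -n 1)
  [ -n "$latest" ] || return 1   # no versioned CUDA directory found
  printf '%s\n' "$latest"
}
```

From the `packages` directory, the build step then becomes `CUDA="$(latest_cuda_dir)" ./build-deb-packages.sh`.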

### Building UCX with GDRCopy and CUDA support

As in the previous step, clone the official repository with `git clone https://github.com/openucx/ucx.git`. A git checkout does not include a `configure` script, so generate it with `./autogen.sh` before building:
```
cd ucx
sudo apt-get install -y autoconf automake libtool
./autogen.sh
./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j8
sudo make install
```
Additionally, you can check that UCX was built with CUDA and GDRCopy support:
```
ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep cuda
# Memory domain: cuda_cpy
# Component: cuda_cpy
# memory types: host (reg), cuda (access,alloc,reg,detect), cuda-managed (access,alloc,reg,cache,detect)
# Transport: cuda_copy
# Device: cuda
# Memory domain: cuda_ipc
# Component: cuda_ipc
# memory types: cuda (access,reg,cache)
# Transport: cuda_ipc
# Device: cuda
# memory types: cuda (access,reg)
# Device: cuda
ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep gdr_copy
# Memory domain: gdr_copy
# Component: gdr_copy
# Transport: gdr_copy
```
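
The same check can be scripted, for example in a provisioning script. The bash sketch below (hypothetical function `ucx_has_transports`) reads `ucx_info -d` output on stdin and fails if any requested transport, such as `cuda_copy`, `cuda_ipc` or `gdr_copy`, is missing.

```bash
# Hypothetical helper: read `ucx_info -d` output on stdin and check that every
# transport named as an argument appears on a "Transport:" line.
ucx_has_transports() {
  local info t missing=0
  info=$(cat)
  for t in "$@"; do
    if ! grep -Eq "Transport:[[:space:]]+${t}[[:space:]]*\$" <<<"$info"; then
      echo "missing UCX transport: $t" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `/usr/local/ucx/bin/ucx_info -d | ucx_has_transports cuda_copy cuda_ipc gdr_copy && echo "CUDA and GDRCopy transports available"`.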

### Building Open MPI with CUDA and UCX

To install Open MPI, download the source package from the [download page](https://www.open-mpi.org/software/ompi/v4.1/) and run the following commands:
```
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.8.tar.bz2

tar xf openmpi-4.1.8.tar.bz2
cd openmpi-4.1.8

# Configure with CUDA and UCX support
./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda --with-ucx=/usr/local/ucx

# Build Open MPI with 8 parallel jobs
make -j 8 all
sudo make install
```
If several Open MPI installations are present on the same machine, check which `mpirun` is picked up first from your `PATH`:
```
which mpirun
```
To make sure the custom build is used, call `mpirun` by its full path, `/opt/openmpi/bin/mpirun`.
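
Rather than typing the full path every time, one option is to prepend `/opt/openmpi/bin` to `PATH` and confirm which `mpirun` the shell resolves. The bash sketch below is a hypothetical helper, `mpirun_is_from`, that compares the result of `command -v mpirun` with the expected installation prefix.

```bash
# Hypothetical helper: succeed only if the `mpirun` found first on PATH is the
# one installed under the given prefix (default /opt/openmpi).
mpirun_is_from() {
  local prefix="${1:-/opt/openmpi}" found
  found=$(command -v mpirun) || { echo "mpirun not found on PATH" >&2; return 1; }
  if [ "$found" != "$prefix/bin/mpirun" ]; then
    echo "mpirun resolves to $found, not $prefix/bin/mpirun" >&2
    return 1
  fi
}
```

Usage: `export PATH=/opt/openmpi/bin:$PATH` followed by `mpirun_is_from /opt/openmpi && echo "using the custom build"`.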

One can verify that Open MPI has been built with CUDA support by running either of the commands below:
```
ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info | grep "MPI extensions"
 MPI extensions: affinity, cuda, pcollreq
ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
```
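
For automated checks, the parsable output is the more convenient of the two. The bash sketch below, with a hypothetical function name `ompi_built_with_cuda`, reads `ompi_info --parsable --all` output on stdin and succeeds only when the `mpi_built_with_cuda_support` line reports `true`. It reports how the library was built; it does not by itself exercise a GPU transfer.

```bash
# Hypothetical helper: read `ompi_info --parsable --all` output on stdin and
# succeed only if the build-time CUDA support parameter is true.
ompi_built_with_cuda() {
  # -x: the whole line must match exactly, so ":value:false" is rejected.
  if grep -qx 'mca:mpi:base:param:mpi_built_with_cuda_support:value:true'; then
    echo "Open MPI was built with CUDA support"
  else
    echo "Open MPI was built WITHOUT CUDA support" >&2
    return 1
  fi
}
```

Usage: `/opt/openmpi/bin/ompi_info --parsable --all | ompi_built_with_cuda`.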

## Sources

Here are some useful links:
* [CUDA Installation Guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu)
* [Building CUDA-aware Open MPI](https://www.open-mpi.org/faq/?category=buildcuda)
* [GDRCopy GitHub repo](https://github.com/NVIDIA/gdrcopy)
* [UCX GitHub repo](https://github.com/openucx/ucx)
* [Open MPI download](https://www.open-mpi.org/software/ompi/v4.1/)