Commit 94406f7

Merge pull request #1571 from oracle-devrel/cuda-aware-openmpi
Building a CUDA-aware Open MPI
2 parents 55054a7 + 4c60175 commit 94406f7

5 files changed: +183 -0 lines changed

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
# Build a CUDA-aware version of Open MPI

Open MPI is an open source Message Passing Interface (MPI) implementation that is widely used on parallel computing architectures. A CUDA-aware build lets MPI calls operate directly on GPU buffers instead of staging them through host memory, which can bring modest gains for HPC workloads with a lot of traffic between the CPU and the GPU.

## Prerequisites

In this example, we use a VM.GPU.A100.80G.1 instance, a virtual machine featuring an NVIDIA A100 80 GB GPU, with a standard Ubuntu 22.04 image. On this instance, we will install:
* NVIDIA drivers
* CUDA Toolkit
* GDRCopy
* UCX
* Open MPI

## Configuration walkthrough

For convenience, the installation scripts can be found in the [scripts](assets/scripts) folder.

### Installing NVIDIA drivers and CUDA

The first step is to install the NVIDIA drivers using the `ubuntu-drivers-common` package:
```
sudo apt-get install -y ubuntu-drivers-common
```
Available drivers can be listed with the `sudo ubuntu-drivers --gpgpu list` command. To install the 535 server driver, run:
```
sudo ubuntu-drivers --gpgpu install nvidia:535-server
```
The `nvidia-smi` command requires an additional package:
```
sudo apt-get install -y nvidia-utils-535-server
```
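After the driver packages are installed (a reboot may be needed before the kernel module is loaded), a check along the following lines can confirm that the expected driver branch is active. This is only a minimal sketch: `driver_major` and `EXPECTED_BRANCH` are names made up for this example, and the 535 branch is assumed because it is the one installed above.

```shell
#!/bin/bash
# Sketch: confirm that the active NVIDIA driver belongs to the expected branch.
# driver_major and EXPECTED_BRANCH are illustrative names, not part of any NVIDIA tool.

EXPECTED_BRANCH=535

# Extract the major version from a string such as "535.183.01".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query only the driver version, one line per GPU, and keep the first one.
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$(driver_major "$version")" = "$EXPECTED_BRANCH" ]; then
        echo "Driver $version matches branch $EXPECTED_BRANCH"
    else
        echo "Driver '$version' does not match branch $EXPECTED_BRANCH"
    fi
else
    echo "nvidia-smi not found; install nvidia-utils-535-server first"
fi
```

On a machine without the driver, the script only prints a hint and does not fail.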
32+
CUDA and its compiler `nvcc` can be installed with the following commands:
33+
```
34+
# Remove outdated signing key
35+
sudo apt-key del 7fa2af80
36+
37+
# Network repository installation
38+
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
39+
sudo dpkg -i cuda-keyring_1.1-1_all.deb
40+
41+
# Update apt repository cache
42+
sudo apt-get update
43+
44+
# install CUDA SDK
45+
sudo apt-get install cuda-toolkit
46+
sudo apt-get install nvidia-gds # to include all GDS (GPUDirect Storage) packages
47+
48+
# Reboot the system
49+
sudo reboot
50+
```
51+
Verify that NVCC is available with `nvcc --version`. One can also verify that cuda has been correctly installed and added to your path with `echo $PATH`.
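The CUDA apt packages do not change the shell environment themselves, so `nvcc` may not be found until the toolkit's `bin` directory is added to `PATH`. The sketch below assumes the `/usr/local/cuda` prefix used elsewhere in this guide; `CUDA_HOME` is used here only as a conventional variable name, and `path_contains` is a helper invented for this example.

```shell
#!/bin/bash
# Sketch: make the CUDA toolkit visible in the current shell session.
# Assumes the toolkit lives under /usr/local/cuda unless CUDA_HOME says otherwise.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

# Return 0 if directory $1 is an entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prepend the toolkit directories only if they are not already present.
if ! path_contains "$CUDA_HOME/bin" "$PATH"; then
    export PATH="$CUDA_HOME/bin:$PATH"
fi
if ! path_contains "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}"; then
    export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
```

To keep the setting across sessions, the two `export` lines can be added to `~/.bashrc`.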
52+
53+
### Installing GDRCopy
54+
55+
This step is pretty straightforward. One can simply download the project from the official repo `git clone https://github.com/NVIDIA/gdrcopy.git` and then use the following commands to build the packages:
56+
```
57+
cd gdrcopy
58+
sudo apt-get install -y nvidia-dkms-535-server
59+
sudo apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
60+
cd packages
61+
CUDA=/usr/local/cuda-12.8 ./build-deb-packages.sh
62+
sudo dpkg -i gdrdrv-dkms_2.5-1_amd64.Ubuntu22_04.deb
63+
sudo dpkg -i libgdrapi_2.5-1_amd64.Ubuntu22_04.deb
64+
sudo dpkg -i gdrcopy-tests_2.5-1_amd64.Ubuntu22_04+cuda12.8.deb
65+
sudo dpkg -i gdrcopy_2.5-1_amd64.Ubuntu22_04.deb
66+
```
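Once the packages are installed, it can be worth checking that the GDRCopy kernel module is loaded before building UCX against it. The following is a minimal sketch that assumes the module is named `gdrdrv` (as the `gdrdrv-dkms` package name suggests) and that the `gdrcopy-tests` package provides a `gdrcopy_sanity` binary; `module_loaded` is a helper name chosen for this example.

```shell
#!/bin/bash
# Sketch: check that the GDRCopy kernel module is loaded.

# Return 0 if a module named $1 appears in the first column of the
# /proc/modules-style text read from stdin.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ] && module_loaded gdrdrv < /proc/modules; then
    echo "gdrdrv module is loaded"
    # Run the sanity test only if the tests package put it on the PATH (assumption).
    if command -v gdrcopy_sanity >/dev/null 2>&1; then
        gdrcopy_sanity
    fi
else
    echo "gdrdrv module is not loaded; try: sudo modprobe gdrdrv"
fi
```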

### Building UCX with GDRCopy and CUDA support

As in the previous step, clone the project from the official repository with `git clone https://github.com/openucx/ucx.git`, then build it:
```
cd ucx
# A git checkout ships no configure script; generate it first
./autogen.sh
./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j8
sudo make install
```
Additionally, one can check the UCX build info:
```
ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep cuda
# Memory domain: cuda_cpy
# Component: cuda_cpy
# memory types: host (reg), cuda (access,alloc,reg,detect), cuda-managed (access,alloc,reg,cache,detect)
# Transport: cuda_copy
# Device: cuda
# Memory domain: cuda_ipc
# Component: cuda_ipc
# memory types: cuda (access,reg,cache)
# Transport: cuda_ipc
# Device: cuda
# memory types: cuda (access,reg)
# Device: cuda
ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep gdr_copy
# Memory domain: gdr_copy
# Component: gdr_copy
# Transport: gdr_copy
```
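The same check can be scripted so that a missing transport is reported explicitly rather than spotted by eye. This is a sketch under a few assumptions: UCX is installed under `/usr/local/ucx` as above, the transports of interest are the three shown in the output (`cuda_copy`, `cuda_ipc`, `gdr_copy`), and `missing_transports` and `UCX_INFO` are names introduced only for this example.

```shell
#!/bin/bash
# Sketch: report which CUDA-related UCX transports are missing from `ucx_info -d`.
UCX_INFO=${UCX_INFO:-/usr/local/ucx/bin/ucx_info}

# Print each transport given as an argument that does NOT appear on a
# "Transport:" line of the text read from stdin.
missing_transports() {
    local report tl
    report=$(cat)
    for tl in "$@"; do
        if ! printf '%s\n' "$report" | grep -Eq "Transport:[[:space:]]+$tl([[:space:]]|\$)"; then
            echo "$tl"
        fi
    done
}

if [ -x "$UCX_INFO" ]; then
    missing=$("$UCX_INFO" -d | missing_transports cuda_copy cuda_ipc gdr_copy)
    if [ -z "$missing" ]; then
        echo "All expected CUDA transports are available"
    else
        echo "Missing transports:" $missing
    fi
else
    echo "ucx_info not found at $UCX_INFO"
fi
```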

### Building Open MPI with CUDA and UCX

To install Open MPI, download the release tarball (also listed on the [download page](https://www.open-mpi.org/software/ompi/v4.1/)) and build it with the following commands:
```
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.8.tar.bz2

tar xf openmpi-4.1.8.tar.bz2
cd openmpi-4.1.8

# Configure with CUDA and UCX support
./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda --with-ucx=/usr/local/ucx

# Build Open MPI with 8 parallel jobs
make -j 8 all
sudo make install
```
If several MPI installations are present on the same machine, check which `mpirun` is found first on the `$PATH`:
```
which mpirun
```
To make sure that the custom build is used, call `mpirun` with its full path, `/opt/openmpi/bin/mpirun`.
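As an alternative to typing the full path every time, the custom build can be placed first on the search path for the current session. The sketch below assumes the `/opt/openmpi` prefix used above; `OMPI_PREFIX` and `first_on_path` are names introduced for this example only.

```shell
#!/bin/bash
# Sketch: prefer the custom Open MPI build in the current shell session.
OMPI_PREFIX=${OMPI_PREFIX:-/opt/openmpi}

# Print the first executable file named $1 found in the colon-separated list $2.
first_on_path() {
    local name=$1 dir
    local IFS=:
    for dir in $2; do
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Put the custom build ahead of any distribution-provided MPI.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

resolved=$(first_on_path mpirun "$PATH") || resolved="(none)"
echo "mpirun resolves to: $resolved"
```

If the printed path is not under `/opt/openmpi/bin`, the custom build has not been installed there yet.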

One can verify that Open MPI has been built with CUDA support by running either of the commands below:
```
ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info | grep "MPI extensions"
MPI extensions: affinity, cuda, pcollreq
ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
```
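For use in a provisioning script, the second check can be turned into a pass/fail test. This is a minimal sketch that relies on the parsable line shown above; `built_with_cuda` and `OMPI_INFO` are names chosen for this example, and the `/opt/openmpi` prefix is assumed.

```shell
#!/bin/bash
# Sketch: pass/fail check for CUDA support in the custom Open MPI build.
OMPI_INFO=${OMPI_INFO:-/opt/openmpi/bin/ompi_info}

# Return 0 if the parsable ompi_info text on stdin reports CUDA support.
built_with_cuda() {
    grep -q '^mca:mpi:base:param:mpi_built_with_cuda_support:value:true$'
}

if [ -x "$OMPI_INFO" ]; then
    if "$OMPI_INFO" --parsable --all | built_with_cuda; then
        echo "Open MPI was built with CUDA support"
    else
        echo "Open MPI was built WITHOUT CUDA support"
    fi
else
    echo "ompi_info not found at $OMPI_INFO"
fi
```

When launching an application with this build, the UCX layer can be requested explicitly with `mpirun --mca pml ucx`.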

## Sources

Here are some useful links:
* [CUDA Installation Guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu)
* [Building CUDA-aware Open MPI](https://www.open-mpi.org/faq/?category=buildcuda)
* [GDRCopy GitHub repo](https://github.com/NVIDIA/gdrcopy)
* [UCX GitHub repo](https://github.com/openucx/ucx)
* [Open MPI download](https://www.open-mpi.org/software/ompi/v4.1/)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/bin/bash

# Remove outdated signing key
sudo apt-key del 7fa2af80

# Network installation of the repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Update apt repository cache
sudo apt-get update

# Install the CUDA Toolkit
sudo apt-get install -y cuda-toolkit
sudo apt-get install -y nvidia-gds # to include all GDS (GPUDirect Storage) packages

# Reboot the system
sudo reboot
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
#!/bin/bash

git clone https://github.com/NVIDIA/gdrcopy.git
cd gdrcopy
sudo apt-get install -y nvidia-dkms-535-server
sudo apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
cd packages
CUDA=/usr/local/cuda-12.8 ./build-deb-packages.sh
sudo dpkg -i gdrdrv-dkms_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i libgdrapi_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i gdrcopy-tests_2.5-1_amd64.Ubuntu22_04+cuda12.8.deb
sudo dpkg -i gdrcopy_2.5-1_amd64.Ubuntu22_04.deb
@@ -0,0 +1,13 @@
#!/bin/bash

wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.8.tar.bz2

tar xf openmpi-4.1.8.tar.bz2
cd openmpi-4.1.8

# Configure with CUDA and UCX support
./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda --with-ucx=/usr/local/ucx

# Build Open MPI with 8 parallel jobs
make -j 8 all
sudo make install
@@ -0,0 +1,6 @@
#!/bin/bash

git clone https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j8
sudo make install
