Commit 7cca83b (parent bd148f5): first commit. 5 files changed, 156 additions and 0 deletions.

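# Build a CUDA-aware version of Open MPI

Open MPI is an open source Message Passing Interface (MPI) implementation that is heavily used on parallel computing architectures. For HPC workloads with a lot of traffic between the CPU and the GPU, making Open MPI CUDA-aware lets applications pass GPU buffers directly to MPI calls. The library (here through UCX and GDRCopy) then handles those transfers itself instead of the application staging the data through host memory, which can reduce communication overhead.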
## Prerequisites

In this example, we use a VM.GPU.A100.1 instance, a virtual machine featuring an NVIDIA A100 80 GB GPU, with a standard Ubuntu 22.04 image. On this instance, we will install:

* NVIDIA drivers
* CUDA toolkit
* GDRCopy
* UCX
* Open MPI
## Configuration walkthrough

For the sake of simplicity, the installation scripts can be found in the assets > scripts folder.
### Installing NVIDIA drivers and CUDA

The first step is to install the NVIDIA drivers using the `ubuntu-drivers-common` package:
```
sudo apt-get install -y ubuntu-drivers-common
```
Available drivers can be listed with the `sudo ubuntu-drivers --gpgpu list` command. If you want to install the 535 server driver version, run:
```
sudo ubuntu-drivers --gpgpu install nvidia:535-server
```
The `nvidia-smi` command requires an additional package:
```
sudo apt-get install -y nvidia-utils-535-server
```
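Once the driver and `nvidia-utils-535-server` are installed (a reboot may be needed before the new kernel module is loaded), `nvidia-smi` should list the A100. The helper below is a minimal sketch of that check: `check_driver` is a hypothetical name, and it relies only on the standard `--query-gpu` and `--format` options of `nvidia-smi`. Run `check_driver` after sourcing it; on the instance used here it should print the GPU model and the 535 driver version on one line.

```shell
#!/bin/bash
# Minimal sketch: confirm that the NVIDIA driver is usable.
# check_driver is a hypothetical helper name, not part of any NVIDIA package.
check_driver() {
  # nvidia-smi is shipped by the nvidia-utils-535-server package
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: install nvidia-utils-535-server" >&2
    return 1
  fi
  # Print one "name, driver_version" line per GPU; a non-zero exit status
  # typically means nvidia-smi cannot reach the driver (e.g. before a reboot)
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```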
CUDA and its compiler `nvcc` can be installed with the following commands:
```
# Remove the outdated signing key
sudo apt-key del 7fa2af80

# Install the network repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Update the apt repository cache
sudo apt-get update

# Install the CUDA SDK (the GDRCopy step below expects CUDA 12.8 in
# /usr/local/cuda-12.8; install cuda-toolkit-12-8 instead to pin that version)
sudo apt-get install -y cuda-toolkit
sudo apt-get install -y nvidia-gds # to include all GDS (GPUDirect Storage) packages

# Reboot the system
sudo reboot
```
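NVIDIA's installation guide lists adding the toolkit's `bin` directory to `PATH` as a post-installation step, so `nvcc` may not be found right after the reboot. The snippet below is a minimal sketch of that step, for instance to append to `~/.bashrc`. It assumes the toolkit is reachable through the `/usr/local/cuda` path that the UCX and Open MPI configure commands in this guide also use, and `add_to_path` is a hypothetical helper name. Skipping directories that are already listed keeps `PATH` clean when the file is sourced several times.

```shell
#!/bin/bash
# Minimal sketch: put the CUDA compiler on PATH, e.g. from ~/.bashrc.
# Assumption: the toolkit is reachable through /usr/local/cuda; set CUDA_HOME
# to a versioned directory such as /usr/local/cuda-12.8 to pin a release.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Hypothetical helper: prepend a directory to PATH unless it is already listed
add_to_path() {
  case ":${PATH}:" in
    *":$1:"*) ;;                      # already present, nothing to do
    *) PATH="$1${PATH:+:${PATH}}" ;;  # prepend without a stray ':' if PATH was empty
  esac
  export PATH
}

add_to_path "${CUDA_HOME}/bin"

# Report which nvcc is now picked up, if any
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc not found under ${CUDA_HOME}/bin" >&2
fi
```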
After the reboot, verify that `nvcc` is available with `nvcc --version`, and check with `echo $PATH` that the CUDA `bin` directory (for example `/usr/local/cuda/bin`) is in your path.
### Installing GDRCopy

This step is straightforward: clone the project from the official repository with `git clone https://github.com/NVIDIA/gdrcopy.git`, then build and install its Debian packages with the following commands:
```
cd gdrcopy
# GDRCopy's Debian packaging expects the DKMS flavour of the NVIDIA driver
sudo apt-get install -y nvidia-dkms-535-server
# Build and packaging dependencies
sudo apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
cd packages
CUDA=/usr/local/cuda-12.8 ./build-deb-packages.sh
# Package names encode the GDRCopy version (2.5), the Ubuntu release and, for the tests, the CUDA version
sudo dpkg -i gdrdrv-dkms_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i libgdrapi_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i gdrcopy-tests_2.5-1_amd64.Ubuntu22_04+cuda12.8.deb
sudo dpkg -i gdrcopy_2.5-1_amd64.Ubuntu22_04.deb
```
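The `gdrdrv-dkms` package builds the `gdrdrv` kernel module, which GDRCopy (and therefore UCX's `gdr_copy` transport) needs at runtime. Below is a minimal sketch of a check for it, using a hypothetical `check_gdrdrv` helper that looks for the module in `/proc/modules`; the optional argument only exists to point it at another file. Once the module is reported as loaded, the `gdrcopy_sanity` program from the `gdrcopy-tests` package can be run to exercise the library on the GPU.

```shell
#!/bin/bash
# Minimal sketch: check that the gdrdrv kernel module is loaded.
# check_gdrdrv is a hypothetical helper name; pass another file than
# /proc/modules as first argument if needed (e.g. for testing).
check_gdrdrv() {
  local modules="${1:-/proc/modules}"
  # Each line of /proc/modules starts with the module name followed by a space
  if grep -q '^gdrdrv ' "$modules"; then
    echo "gdrdrv is loaded"
  else
    echo "gdrdrv is not loaded (try: sudo modprobe gdrdrv)" >&2
    return 1
  fi
}
```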
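### Building UCX with GDRCopy and CUDA support

As in the previous step, clone the project from the official repository with `git clone https://github.com/openucx/ucx.git`, then build it. A git checkout does not ship a `configure` script, so it has to be generated first with `./autogen.sh`, which requires `autoconf`, `automake` and `libtool`:
```
cd ucx
./autogen.sh
./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j8
# /usr/local/ucx is usually not writable by a regular user
sudo make install
```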
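To check that UCX actually picked up CUDA and GDRCopy, its `ucx_info` tool can list the available transports with `/usr/local/ucx/bin/ucx_info -d`, while `ucx_info -v` shows the configure flags. The sketch below filters the `-d` output for the `cuda_copy`, `cuda_ipc` and `gdr_copy` transports. The helper name is hypothetical, and the `Transport: <name>` line layout it matches is an assumption that may differ between UCX versions; `gdr_copy` is typically only reported when the `gdrdrv` module is loaded. Usage: `/usr/local/ucx/bin/ucx_info -d | check_ucx_transports`.

```shell
#!/bin/bash
# Minimal sketch: read `ucx_info -d` output on stdin and report whether the
# CUDA-related transports are present. check_ucx_transports is hypothetical;
# the "Transport: <name>" line format it matches is an assumption.
check_ucx_transports() {
  local out tl missing=0
  out="$(cat)"
  for tl in cuda_copy cuda_ipc gdr_copy; do
    # Match the transport name as a whole word after "Transport:"
    if grep -Eq "Transport:[[:space:]]+${tl}([[:space:]]|\$)" <<<"$out"; then
      echo "found   ${tl}"
    else
      echo "missing ${tl}"
      missing=1
    fi
  done
  return "$missing"
}
```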
### Building Open MPI with CUDA and UCX

To install Open MPI, download the release archive from the [download page](https://www.open-mpi.org/software/ompi/v4.1/) and run the following commands:
```
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.8.tar.bz2

tar xf openmpi-4.1.8.tar.bz2
cd openmpi-4.1.8

# Configure with CUDA and UCX support
./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda --with-ucx=/usr/local/ucx

# Build Open MPI with 8 parallel jobs, then install it under /opt/openmpi
make -j 8 all
sudo make install
```
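The Open MPI FAQ suggests checking CUDA support with `ompi_info --parsable --all | grep mpi_built_with_cuda_support:value`, which should end in `true` for a CUDA-aware build. The sketch below wraps that check and also looks for the UCX point-to-point component (`pml ucx`). The function name is hypothetical, and matching the `mca:pml:ucx:` prefix of the parsable output is an assumption about its format. Usage: `/opt/openmpi/bin/ompi_info --parsable --all | check_ompi_build`.

```shell
#!/bin/bash
# Minimal sketch: read `ompi_info --parsable --all` output on stdin and report
# whether Open MPI was built with CUDA and UCX support.
# check_ompi_build is a hypothetical helper name.
check_ompi_build() {
  local out status=0
  out="$(cat)"
  # Parameter documented in the Open MPI FAQ for CUDA-aware builds
  if grep -q 'mpi_built_with_cuda_support:value:true' <<<"$out"; then
    echo "CUDA support: yes"
  else
    echo "CUDA support: no"
    status=1
  fi
  # Assumption: parsable lines for the UCX pml component start with "mca:pml:ucx:"
  if grep -q '^mca:pml:ucx:' <<<"$out"; then
    echo "UCX pml: yes"
  else
    echo "UCX pml: no"
    status=1
  fi
  return "$status"
}
```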
If several Open MPI installations coexist on the same machine, check which `mpirun` is used by default, that is, the first one found in the directories listed in `$PATH`:
```
which mpirun
```
To make sure that the CUDA-aware build is used, call `mpirun` with its full path, `/opt/openmpi/bin/mpirun`.
## Sources

Here are useful links:

* [CUDA Installation Guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#ubuntu)
* [Building CUDA-aware Open MPI](https://www.open-mpi.org/faq/?category=buildcuda)
* [GDRCopy GitHub repo](https://github.com/NVIDIA/gdrcopy)
* [UCX GitHub repo](https://github.com/openucx/ucx)
* [Open MPI download](https://www.open-mpi.org/software/ompi/v4.1/)
```
#!/bin/bash

# Remove the outdated signing key
sudo apt-key del 7fa2af80

# Network installation of the repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Update the apt repository cache
sudo apt-get update

# Install the CUDA SDK
sudo apt-get install -y cuda-toolkit
sudo apt-get install -y nvidia-gds # to include all GDS (GPUDirect Storage) packages

# Reboot the system
sudo reboot
```
```
#!/bin/bash

git clone https://github.com/NVIDIA/gdrcopy.git
cd gdrcopy
sudo apt-get install -y nvidia-dkms-535-server
sudo apt-get install -y build-essential devscripts debhelper fakeroot pkg-config dkms
cd packages
CUDA=/usr/local/cuda-12.8 ./build-deb-packages.sh
sudo dpkg -i gdrdrv-dkms_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i libgdrapi_2.5-1_amd64.Ubuntu22_04.deb
sudo dpkg -i gdrcopy-tests_2.5-1_amd64.Ubuntu22_04+cuda12.8.deb
sudo dpkg -i gdrcopy_2.5-1_amd64.Ubuntu22_04.deb
```
```
#!/bin/bash

wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.8.tar.bz2

tar xf openmpi-4.1.8.tar.bz2
cd openmpi-4.1.8

# Configure with CUDA and UCX support
./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda --with-ucx=/usr/local/ucx

# Build Open MPI with 8 parallel jobs
make -j 8 all
sudo make install
```
```
#!/bin/bash

git clone https://github.com/openucx/ucx.git
cd ucx
# A git checkout has no configure script yet: generate it first
./autogen.sh
./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j8
sudo make install
```
