Commit fa98a08

adding build verifications
1 parent 7cca83b commit fa98a08

File tree

1 file changed

+29
-2
lines changed
  • cloud-infrastructure/ai-infra-gpu/ai-infrastructure/cuda-aware-mpi

cloud-infrastructure/ai-infra-gpu/ai-infrastructure/cuda-aware-mpi/README.md

Lines changed: 29 additions & 2 deletions
@@ -4,7 +4,7 @@ Open MPI is an open source Message Passing Interface (MPI) implementation that i
 
 ## Prerequisites
 
-In this example, we are using a type VM.GPU.A100.1 instance, a virtual machine featuring a NVIDIA A100 80 GB GPU and a standard Ubuntu 22.04 image. On this instance, we will install:
+In this example, we are using a type VM.GPU.A100.80G.1 instance, a virtual machine featuring a NVIDIA A100 80 GB GPU and a standard Ubuntu 22.04 image. On this instance, we will install:
 * NVIDIA drivers
 * CUDA Container toolkit
 * GDRCOPY
@@ -13,7 +13,7 @@ In this example, we are using a type VM.GPU.A100.1 instance, a virtual machine f
 
 ## Configuration walkthrough
 
-For the sake of simplicity, installation scripts can be found in the assets > scripts folder.
+For the sake of simplicity, installation scripts can be found in the [scripts](cloud-infrastructure/ai-infra-gpu/ai-infrastructure/cuda-aware-mpi/assets/scripts) folder.
 
 ### Installing NVIDIA drivers and CUDA

@@ -73,6 +73,26 @@ cd ucx
 ./configure --prefix=/usr/local/ucx --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
 make -j8 install
 ```
+Additionally, one can check the UCX build info:
+```
+ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep cuda
+# Memory domain: cuda_cpy
+# Component: cuda_cpy
+# memory types: host (reg), cuda (access,alloc,reg,detect), cuda-managed (access,alloc,reg,cache,detect)
+# Transport: cuda_copy
+# Device: cuda
+# Memory domain: cuda_ipc
+# Component: cuda_ipc
+# memory types: cuda (access,reg,cache)
+# Transport: cuda_ipc
+# Device: cuda
+# memory types: cuda (access,reg)
+# Device: cuda
+ubuntu@<hostname>:~$ /usr/local/ucx/bin/ucx_info -d | grep gdr_copy
+# Memory domain: gdr_copy
+# Component: gdr_copy
+# Transport: gdr_copy
+```
 
 ### Building Open MPI with CUDA and UCX

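The UCX check added in this hunk could also be scripted for unattended builds. The following is a minimal sketch, assuming the `ucx_info -d` output format shown in the hunk; the function name `missing_ucx_transports` is hypothetical and not part of UCX. It reads the output on stdin, so it can be exercised without a GPU.

```shell
# Hypothetical helper (not part of UCX): reads `ucx_info -d` output on stdin
# and prints each expected GPU transport that does not appear in it.
# Returns 0 when all three transports are present, 1 otherwise.
missing_ucx_transports() {
    _ucx_info=$(cat)
    _ucx_status=0
    for _t in cuda_copy cuda_ipc gdr_copy; do
        # Match lines such as "# Transport: cuda_copy".
        if ! printf '%s\n' "$_ucx_info" |
            grep -Eq "Transport:[[:space:]]*${_t}([[:space:]]|\$)"; then
            echo "missing: $_t"
            _ucx_status=1
        fi
    done
    return "$_ucx_status"
}

# Usage (install prefix taken from the configure line; adjust if yours differs):
#   /usr/local/ucx/bin/ucx_info -d | missing_ucx_transports
```

Reading from stdin rather than calling `ucx_info` directly keeps the helper independent of the install prefix.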
@@ -96,6 +116,13 @@ which mpirun
 ```
 To make sure that the custom one is used, call `mpirun` with its full path `/opt/openmpi/bin/mpirun`.
 
+One can verify that Open MPI has been successfully built with CUDA support by running either of the commands below:
+```
+ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info | grep "MPI extensions"
+MPI extensions: affinity, cuda, pcollreq
+ubuntu@<hostname>:~$ /opt/openmpi/bin/ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
+mca:mpi:base:param:mpi_built_with_cuda_support:value:true
+```
 
 ## Sources

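The second verification command lends itself to a scripted pass/fail check. This is a minimal sketch, assuming the parsable output line shown in the hunk; the function name `has_cuda_support` is hypothetical and not part of Open MPI.

```shell
# Hypothetical helper (not part of Open MPI): reads the output of
# `ompi_info --parsable --all` on stdin and succeeds only if the build
# reports CUDA support as true.
has_cuda_support() {
    grep -q '^mca:mpi:base:param:mpi_built_with_cuda_support:value:true'
}

# Usage (path taken from the README; adjust if your prefix differs):
#   if /opt/openmpi/bin/ompi_info --parsable --all | has_cuda_support; then
#       echo "Open MPI was built with CUDA support"
#   else
#       echo "Open MPI was NOT built with CUDA support" >&2
#   fi
```

The exit status makes the check usable as a gate in a provisioning script, instead of relying on someone reading the `grep` output.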
0 commit comments
