Description
Please submit all the information below so that we can understand the working environment that is the context for your question.
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
OpenMPI 5
Details of the problem
What is the recommended way for Open MPI 5 applications to assign CUDA devices to ranks?
From the docs:
https://www.open-mpi.org/faq/?category=runcuda#mpi-cuda-dev-opa
In addition, each process of the application should select a specific GPU card to use before calling MPI_Init(), by using cudaChooseDevice(), cudaSetDevice() and similar.
But an example MPI application cannot get the rank until after MPI_Init() is called:
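#include <mpi.h>

int main(int argc, char** argv) {
    // The docs require the GPU to be selected here, before MPI_Init(),
    // but the rank is not known yet.
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // The rank only becomes available here, after MPI_Init()
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Finalize();
    return 0;
}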
I didn't see any mpirun flags for masking which GPUs are exposed to which process, although this should be possible with a custom launch script.
Given these constraints, what is the golden path for a user developing a CUDA-aware Open MPI application?
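For context, one workaround I have seen suggested for this chicken-and-egg problem is to read the node-local rank from the environment before MPI_Init(). The sketch below assumes that mpirun exports OMPI_COMM_WORLD_LOCAL_RANK to each launched process. The helper names env_to_int and pick_device_index are hypothetical, and in a real program the device count would come from cudaGetDeviceCount(). I am not claiming this is the officially recommended method, only illustrating the kind of approach I am asking about:

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a non-negative integer from an environment variable.
 * Returns -1 if the variable is unset, empty, or not a valid number. */
static int env_to_int(const char *name) {
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') {
        return -1;
    }
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > INT_MAX) {
        return -1;
    }
    return (int)v;
}

/* Hypothetical helper: map this process to a device index in
 * [0, device_count) using the node-local rank.
 * Assumption: the launcher sets OMPI_COMM_WORLD_LOCAL_RANK.
 * Returns -1 if there are no devices to choose from. */
int pick_device_index(int device_count) {
    if (device_count <= 0) {
        return -1;
    }
    int local_rank = env_to_int("OMPI_COMM_WORLD_LOCAL_RANK");
    if (local_rank < 0) {
        local_rank = 0; /* variable missing or malformed: fall back to device 0 */
    }
    /* Round-robin when there are more local ranks than devices. */
    return local_rank % device_count;
}
```

In main(), the returned index would be passed to cudaSetDevice() before calling MPI_Init(). The modulo means several ranks share a GPU when a node runs more ranks than it has devices, which may or may not be what the application wants.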