doc: added details about cuda-aware mpi in doc

mrava87SW · mrava87SW · commit 02ba45bd8726 · 2025-10-15T21:50:19.000Z
diff --git a/docs/source/gpu.rst b/docs/source/gpu.rst
@@ -11,7 +11,7 @@ This library must be installed *before* PyLops-mpi is installed.
 
 .. note::
 
-   Set environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
+   Set the environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
    This can be also used if a previous (or faulty) version of ``cupy`` is installed in your system,
    otherwise you will get an error when importing PyLops.
 
@@ -22,6 +22,14 @@ can handle both scenarios. Note that, since most operators in PyLops-mpi are thi
 some of the operators in PyLops that lack a GPU implementation cannot be used also in PyLops-mpi when working with
 cupy arrays.
 
+.. note::
+
+   By default, when using ``cupy`` arrays, PyLops-MPI will try to use methods in MPI4Py that communicate memory buffers.
+   However, this requires a CUDA-Aware MPI installation. If your MPI installation is not CUDA-Aware, set the
+   environment variable ``PYLOPS_MPI_CUDA_AWARE=0`` to force PyLops-MPI to use methods in MPI4Py that communicate
+   general Python objects (this will incur a loss of performance!).
+
+
 Moreover, PyLops-MPI also supports the Nvidia's Collective Communication Library (NCCL) for highly-optimized
 collective operations, such as AllReduce, AllGather, etc. This allows PyLops-MPI users to leverage the
 proprietary technology like NVLink that might be available in their infrastructure for fast data communication.
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -15,7 +15,13 @@ The minimal set of dependencies for the PyLops-MPI project is:
 * `MPI4py <https://mpi4py.readthedocs.io/en/stable/>`_
 * `PyLops <https://pylops.readthedocs.io/en/stable/>`_
 
-Additionally, to use the NCCL engine, the following additional 
+Additionally, to use the CUDA-aware MPI engine, the following
+dependencies are required:
+
+* `CuPy <https://cupy.dev/>`_
+* CUDA-aware MPI
+
+Similarly, to use the NCCL engine, the following additional 
 dependencies are required:
 
 * `CuPy <https://cupy.dev/>`_
@@ -27,12 +33,18 @@ if this is not possible, some of the dependencies must be installed prior to ins
 
 Download and Install MPI
 ========================
-Visit the official MPI website to download an appropriate MPI implementation for your system.
-Follow the installation instructions provided by the MPI vendor.
+Visit the official website of your MPI vendor of choice to download an appropriate MPI 
+implementation for your system:
+
+* `Open MPI <https://docs.open-mpi.org/>`_
+* `MPICH <https://www.mpich.org/>`_
+* `Intel MPI <https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html>`_
+* ...
 
-* `Open MPI <https://www.open-mpi.org/software/ompi/v1.10/>`_
-* `MPICH <https://www.mpich.org/downloads/>`_
-* `Intel MPI <https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.10j8fx>`_
+Alternatively, the conda-forge community provides ready-to-use binary packages for four MPI implementations
+(see the `MPI4Py documentation <https://mpi4py.readthedocs.io/en/stable/install.html#conda-packages>`_ for
+details). In this case, you can defer the installation of MPI until the conda environment for your project
+is created (see below).
 
 Verify MPI Installation
 =======================
@@ -42,6 +54,17 @@ After installing MPI, verify its installation by opening a terminal and running
 
    >> mpiexec --version
 
+Install CUDA-Aware MPI (optional)
+=================================
+To achieve the best performance when using PyLops-MPI with CuPy arrays, a CUDA-Aware version of
+MPI must be installed.
+
+For `Open MPI`, the conda-forge package has built-in CUDA support, as long as an existing CUDA installation is detected.
+To verify this, run the diagnostic `commands <https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html#how-do-i-verify-that-open-mpi-has-been-built-with-cuda-support>`_
+listed in the Open MPI documentation.
+
+For the other MPI implementations, refer to their specific documentation.
+
 Install NCCL (optional)
 =======================
 To obtain highly-optimized performance on GPU clusters, PyLops-MPI also supports the Nvidia's collective communication calls
@@ -103,6 +126,15 @@ For a ``conda`` environment, run
 This will create and activate an environment called ``pylops_mpi``, with all 
 required and optional dependencies.
 
+If you also want to install MPI as part of the creation process of the conda environment,
+modify the ``environment-dev.yml`` file by adding ``openmpi``, ``mpich``, ``impi_rt``, or ``msmpi``
+just above ``mpi4py``. Note that only ``openmpi`` provides a CUDA-Aware MPI installation.
+
+If you want to leverage CUDA-Aware MPI but prefer to use another MPI installation, you must
+either switch to a `Pip`-based installation (see below), or move ``mpi4py`` into the ``pip``
+section of the ``environment-dev.yml`` file and export the variable ``MPICC`` pointing to
+the ``mpicc`` compiler wrapper of your CUDA-Aware MPI installation.
+
 If you want to enable `NCCL <https://developer.nvidia.com/nccl>`_ in PyLops-MPI, run this instead
 
 .. code-block:: bash