add nccl to README, installation guides, Makefile, and index

tharittk · tharittk · commit b192faf1f872 · 2025-05-29T21:58:11.000+07:00
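
Note for readers on a different CUDA version: the installation guide in this commit asks you to swap ``cupy-cuda12x`` and ``nvidia-nccl-cu12`` for their CUDA 11.x counterparts by hand. A minimal sketch of that mapping follows. The helper name `nccl_pip_packages` and the sample `nvidia-smi` header line are illustrative assumptions, not part of this commit, and only the two CUDA major versions named in the guide are handled.

```python
import re


def nccl_pip_packages(nvidia_smi_output: str) -> list[str]:
    """Return the CuPy and NCCL wheel names for the reported CUDA major version.

    Hypothetical helper, not part of PyLops-MPI or of this commit.
    """
    # The nvidia-smi header contains a field such as "CUDA Version: 12.2".
    match = re.search(r"CUDA Version:\s*(\d+)\.\d+", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' field found in the nvidia-smi output")
    major = int(match.group(1))
    # Assumption: only the two major versions mentioned in the guide are mapped.
    if major not in (11, 12):
        raise ValueError(f"unsupported CUDA major version: {major}")
    return [f"cupy-cuda{major}x", f"nvidia-nccl-cu{major}"]


# Sample header line (assumed for illustration; real output spans several lines).
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
print(" ".join(nccl_pip_packages(sample)))  # prints "cupy-cuda12x nvidia-nccl-cu12"
```

The returned names are the ones that would replace the two packages in the `dev-install_nccl` target. Keep in mind that `nvidia-smi` reports the highest CUDA version the driver supports, which may differ from the installed toolkit.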
diff --git a/Makefile b/Makefile
@@ -24,6 +24,10 @@ dev-install:
 	make pipcheck
 	$(PIP) install -r requirements-dev.txt && $(PIP) install -e .
 
+dev-install_nccl:
+	make pipcheck
+	$(PIP) install cupy-cuda12x nvidia-nccl-cu12 && $(PIP) install -r requirements-dev.txt && $(PIP) install -e .
+
 install_conda:
 	conda env create -f environment.yml && conda activate pylops_mpi && pip install .
 
diff --git a/README.md b/README.md
@@ -34,6 +34,8 @@ and running the following command:
       ```
       make install_conda
       ```
+      Optionally, if you work in a multi-GPU environment and want to enable Nvidia's Collective Communication Library
+      [(NCCL)](https://developer.nvidia.com/nccl), please visit the [installation guide](https://pylops.github.io/pylops-mpi/installation.html) for further details.
    
 ## Run Pylops-MPI
 Once you have installed the prerequisites and pylops-mpi, you can run pylops-mpi using the `mpiexec` command. 
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -14,6 +14,10 @@ By integrating MPI (Message Passing Interface), PyLops-MPI optimizes the collabo
 computing nodes, enabling large and intricate tasks to be divided, solved, and aggregated in an efficient and
 parallelized manner.
 
+PyLops-MPI also supports Nvidia's Collective Communication Library `(NCCL) <https://developer.nvidia.com/nccl>`_ for high-performance
+GPU-to-GPU communication. The NCCL engine in PyLops-MPI works in tandem with MPI: it delegates GPU-to-GPU communication to the
+highly optimized NCCL, while leveraging MPI for CPU-side coordination and orchestration.
+
 Get started by :ref:`installing PyLops-MPI <Installation>` and following our quick tour.
 
 Terminology
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -45,6 +45,14 @@ Fork the `PyLops-MPI repository <https://github.com/PyLops/pylops-mpi>`_ and clo
 We recommend installing dependencies into a separate environment.
 For that end, we provide a `Makefile` with useful commands for setting up the environment.
 
+Enable Nvidia Collective Communication Library
+=======================================================
+To obtain highly optimized performance on GPU clusters, PyLops-MPI also supports Nvidia's Collective Communication Library
+`(NCCL) <https://developer.nvidia.com/nccl>`_. Two additional dependencies are required: CuPy and NCCL.
+
+* `CuPy with NCCL <https://docs.cupy.dev/en/stable/install.html>`_
+
+
 Step-by-step installation for users
 ***********************************
 
@@ -89,6 +97,12 @@ For a ``conda`` environment, run
 
 This will create and activate an environment called ``pylops_mpi``, with all required and optional dependencies.
 
+If you want to enable `NCCL <https://developer.nvidia.com/nccl>`_ in PyLops-MPI, run this instead:
+
+.. code-block:: bash
+
+   >> make dev-install_conda_nccl
+
 Pip
 ---
 If you prefer a ``pip`` installation, we provide the following command
@@ -100,15 +114,22 @@ If you prefer a ``pip`` installation, we provide the following command
 Note that, differently from the  ``conda`` command, the above **will not** create a virtual environment.
 Make sure you create and activate your environment previously.
 
-Enable Nvidia Collective Communication Library (NCCL)
-=======================================================
-To obtain highly-optimized performance on GPU clusters, PyLops-MPI also supports the Nvidia's collective communication calls (NCCL).
-`NCCL <https://developer.nvidia.com/nccl>
+Similarly, if you want to enable `NCCL <https://developer.nvidia.com/nccl>`_ but prefer using pip,
+you must first check the CUDA version of your system:
+
+.. code-block:: bash
+
+   >> nvidia-smi
+
+The `Makefile` is pre-configured for CUDA 12.x. If you use this version, run:
 
 .. code-block:: bash
 
-   >> make dev-install_conda_nc
+   >> make dev-install_nccl
 
+Otherwise, change the command in the `Makefile` to match your CUDA version.
+For example, if you use CUDA 11.x, change ``cupy-cuda12x`` and ``nvidia-nccl-cu12`` to ``cupy-cuda11x`` and ``nvidia-nccl-cu11``,
+then run the command.
 
 Run tests
 =========