
Commit 8d654ea

update gpu.rst for simple nccl example and NCCL-supported op table
1 parent b192faf


docs/source/gpu.rst

Lines changed: 75 additions & 2 deletions
@@ -22,6 +22,15 @@ can handle both scenarios. Note that, since most operators in PyLops-mpi are thi
some of the operators in PyLops that lack a GPU implementation cannot be used in PyLops-mpi either when working with
cupy arrays.

Moreover, PyLops-MPI also supports NVIDIA's Collective Communications Library (NCCL) for highly-optimized
collective operations, such as AllReduce, AllGather, etc. This allows PyLops-MPI users to leverage
proprietary technologies like NVLink that may be available in their infrastructure for fast data communication.

.. note::

   Set the environment variable ``NCCL_PYLOPS_MPI=0`` to explicitly force PyLops-MPI to ignore the ``NCCL`` backend.
   However, this is optional, as users may also opt out of NCCL by simply not passing a ``cupy.cuda.nccl.NcclCommunicator``
   to the :class:`pylops_mpi.StackedDistributedArray`.
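
For instance, here is a minimal sketch of disabling the NCCL backend from within a Python script. It assumes that
``NCCL_PYLOPS_MPI`` is read when ``pylops_mpi`` is imported, so the variable must be set beforehand; launching the
script with the variable already set in the environment (e.g., ``NCCL_PYLOPS_MPI=0 mpiexec -n 2 python script.py``)
achieves the same effect.

.. code-block:: python

    import os

    # Assumption: the variable is read at import time, so set it before
    # importing pylops_mpi
    os.environ["NCCL_PYLOPS_MPI"] = "0"

    import pylops_mpi  # noqa: E402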

Example
-------
@@ -79,7 +88,71 @@ your GPU:
The code is almost unchanged apart from the fact that we now use ``cupy`` arrays;
PyLops-mpi will figure this out!

If NCCL is available, a ``cupy.cuda.nccl.NcclCommunicator`` can be initialized and passed to :class:`pylops_mpi.DistributedArray`
as follows:

.. code-block:: python

    import numpy as np
    import cupy as cp
    import pylops
    import pylops_mpi

    from pylops_mpi.utils._nccl import initialize_nccl_comm

    # Initialize NCCL Communicator
    nccl_comm = initialize_nccl_comm()

    # Create distributed data (broadcast)
    nxl, nt = 20, 20
    dtype = np.float32
    d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                         base_comm_nccl=nccl_comm,
                                         partition=pylops_mpi.Partition.BROADCAST,
                                         engine="cupy", dtype=dtype)
    d_dist[:] = cp.ones(d_dist.local_shape, dtype=dtype)

    # Create and apply VStack operator
    Sop = pylops.MatrixMult(cp.ones((nxl, nxl)), otherdims=(nt, ))
    HOp = pylops_mpi.MPIVStack(ops=[Sop, ])
    y_dist = HOp @ d_dist

Under the hood, PyLops-MPI uses both the MPI communicator and the NCCL communicator to manage distributed operations. Each GPU is logically bound to
one MPI process. Generally speaking, small operations such as those involving array shapes and sizes are still handled via MPI, while collective calls
like AllReduce are carried out through NCCL.
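
As a rough illustration, the calls below (continuing the example above, and assuming the ``asarray`` and ``dot``
methods of :class:`pylops_mpi.DistributedArray`) are the kind of operations that gather or reduce data across ranks
and are therefore carried out through NCCL when it is enabled:

.. code-block:: python

    # Gather the distributed result into a single global array on each rank
    # (an AllGather-style collective)
    y = y_dist.asarray()

    # Inner product of the distributed array with itself
    # (an AllReduce-style collective)
    yy = y_dist.dot(y_dist)

If NCCL is not enabled, the same calls are expected to fall back to the MPI backend.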

.. note::

   The CuPy and NCCL backends are in active development, with many examples not yet in the docs.
   You can find many `other examples <https://github.com/PyLops/pylops_notebooks/tree/master/developement-mpi/Cupy_MPI>`_ from the `PyLops Notebooks repository <https://github.com/PyLops/pylops_notebooks>`_.

Support for NCCL Backend
------------------------

In the following, we provide a list of modules that operate on :class:`pylops_mpi.DistributedArray`
and can leverage the NCCL backend:

.. list-table::
   :widths: 50 25
   :header-rows: 1

   * - Modules
     - NCCL supported
   * - :class:`pylops_mpi.DistributedArray`
     - ✓
   * - :class:`pylops_mpi.basicoperators.MPIVStack`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPIHStack`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPIBlockDiag`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPIGradient`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPIFirstDerivative`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPISecondDerivative`
     - Ongoing
   * - :class:`pylops_mpi.basicoperators.MPILaplacian`
     - Ongoing
   * - :class:`pylops_mpi.optimization.basic.cg`
     - Ongoing
   * - :class:`pylops_mpi.optimization.basic.cgls`
     - Ongoing
   * - ISTA Solver
     - Planned
   * - Complex Numeric Data Type for NCCL
     - Planned
