
Commit 51bc70c

Fix write-up as suggested by PR
1 parent 006ed1d commit 51bc70c

2 files changed: +8 -7 lines changed

2 files changed

+8
-7
lines changed

tutorials_nccl/lsm_nccl.py

Lines changed: 6 additions & 5 deletions
@@ -93,10 +93,11 @@
 # We create a :py:class:`pylops.waveeqprocessing.LSM` at each rank and then push them
 # into a :py:class:`pylops_mpi.basicoperators.MPIVStack` to perform a matrix-vector
 # product with the broadcasted reflectivity at every location on the subsurface.
-# Also, we must pass `nccl_comm` to `refl` in order to use NCCL for communications.
-# Noted that we allocate some arrays (wav, lsm.Demop.trav_srcs, and lsm.Demop.trav.recs)
-# to GPU upfront. Because we want a fair performace comparison, we avoid having
-# LSM internally copying arrays.
+# Note that we must use :code:`engine="cuda"` and move the wavelet :code:`wav` to the GPU prior to creating the operator.
+# Moreover, we allocate the traveltime tables (:code:`lsm.Demop.trav_srcs` and :code:`lsm.Demop.trav_recs`)
+# to the GPU prior to applying the operator to avoid incurring the penalty of performing
+# host-to-device memory copies every time the operator is applied. Finally, we must pass :code:`nccl_comm`
+# to the DistributedArray constructor used to create :code:`refl_dist` in order to use NCCL for communications.
 
 # Wavelet
 nt = 651
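
For context, the workflow these comments describe looks roughly like the
sketch below. This is a minimal illustration, not the tutorial's exact code:
the geometry variables (z, x, t, sources, recs, v0, wav, wavc, refl) and the
base_comm_nccl keyword used to hand nccl_comm to DistributedArray are
assumptions inferred from the comments above.

import cupy as cp
import pylops
import pylops_mpi

# engine="cuda" expects the wavelet to already live on the GPU
wav = cp.asarray(wav)
lsm = pylops.waveeqprocessing.LSM(z, x, t, sources, recs, v0, wav, wavc,
                                  mode="analytic", engine="cuda")

# Copy the traveltime tables to the GPU once, so that applying the operator
# does not incur host-to-device copies on every matrix-vector product
lsm.Demop.trav_srcs = cp.asarray(lsm.Demop.trav_srcs)
lsm.Demop.trav_recs = cp.asarray(lsm.Demop.trav_recs)

VStack = pylops_mpi.basicoperators.MPIVStack(ops=[lsm.Demop])

# Broadcast the reflectivity to all ranks; passing the NCCL communicator
# (keyword name assumed) makes DistributedArray use NCCL for collectives
refl_dist = pylops_mpi.DistributedArray(global_shape=refl.size,
                                        partition=pylops_mpi.Partition.BROADCAST,
                                        base_comm_nccl=nccl_comm,
                                        engine="cupy")
refl_dist[:] = cp.asarray(refl.ravel())
d_dist = VStack @ refl_dist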
@@ -139,7 +140,7 @@
 
 ###############################################################################
 # We calculate the inverse using the :py:func:`pylops_mpi.optimization.basic.cgls`
-# solver. Here, we pass the `nccl_comm` to `x0` to use NCCL as a communicator.
+# solver. Here, we pass the :code:`nccl_comm` to :code:`x0` to use NCCL as a communicator.
 # In this particular case, the local computation will be done in GPU.
 # Collective communication calls will be carried through NCCL GPU-to-GPU.
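
Sketched in code (same caveats as above; the cgls keyword arguments beyond
x0 are assumptions):

# x0 carries the NCCL communicator and the CuPy engine; the solver inherits
# both from it, so its collective calls run GPU-to-GPU through NCCL
x0 = pylops_mpi.DistributedArray(global_shape=refl.size,
                                 partition=pylops_mpi.Partition.BROADCAST,
                                 base_comm_nccl=nccl_comm,
                                 engine="cupy")
x0[:] = 0
minv_dist = pylops_mpi.optimization.basic.cgls(VStack, d_dist, x0=x0,
                                               niter=100, show=True)[0]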

tutorials_nccl/mdd_nccl.py

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@
 
 ###############################################################################
 # Let's start by defining all the parameters required by the
-# :py:func:`pylops.waveeqprocessing.MPIMDC` operator.
+# :py:class:`pylops.waveeqprocessing.MPIMDC` operator.
 # Note that this section is exactly the same as the one in the MPI example as
 # we will keep using MPI for transfering metadata (i.e., shapes, dims, etc.)

@@ -106,7 +106,7 @@
 # And now, we define the distributed operator MPIMDC and model as well as compute the data.
 # Both the model and data have to live in GPU. We also define the DistributedArray `m`
 # with `nccl_comm`` and engine="cupy" to use NCCL for communications (the data `d` will be set the same).
-# Noted that fftengine must be set to "numpy" in MDCop operator when running with CuPy
+# Note that fftengine must be set to "numpy" in MDCop operator when running with CuPy
 
 # Move operator kernel to GPU
 G = cp.asarray(G)
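
A rough sketch of the step these comments describe (the import path and
MPIMDC argument list are assumed to mirror pylops' MDC, and nt, dt, nr,
nccl_comm and the model values mwav are stand-ins for the tutorial's actual
variables):

import cupy as cp
import pylops_mpi
from pylops_mpi.waveeqprocessing import MPIMDC

# Move operator kernel to GPU
G = cp.asarray(G)

# fftengine must be "numpy" when the kernel G is a CuPy array
MDCop = MPIMDC(G, nt=2 * nt - 1, nv=1, dt=dt, dr=1.0,
               twosided=True, fftengine="numpy")

# Model m lives on the GPU; the NCCL communicator (keyword name assumed)
# routes collective communications through NCCL. The data d, computed from
# m, inherits the same engine and communicator.
m = pylops_mpi.DistributedArray(global_shape=(2 * nt - 1) * nr,  # illustrative shape
                                base_comm_nccl=nccl_comm,
                                engine="cupy")
m[:] = cp.asarray(mwav.ravel())
d = MDCop @ m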
