2 files changed: +8 −7 lines changed
# We create a :py:class:`pylops.waveeqprocessing.LSM` at each rank and then push them
# into a :py:class:`pylops_mpi.basicoperators.MPIVStack` to perform a matrix-vector
# product with the broadcasted reflectivity at every location on the subsurface.
- # Also, we must pass `nccl_comm` to `refl` in order to use NCCL for communications.
- # Noted that we allocate some arrays (wav, lsm.Demop.trav_srcs, and lsm.Demop.trav.recs)
- # to GPU upfront. Because we want a fair performace comparison, we avoid having
- # LSM internally copying arrays.
+ # Note that we must use :code:`engine="cuda"` and move the wavelet :code:`wav` to the GPU prior to creating the operator.
+ # Moreover, we allocate the traveltime tables (:code:`lsm.Demop.trav_srcs` and :code:`lsm.Demop.trav_recs`)
+ # to the GPU prior to applying the operator, to avoid incurring the penalty of performing
+ # host-to-device memory copies every time the operator is applied. Finally, we must pass :code:`nccl_comm`
+ # to the DistributedArray constructor used to create :code:`refl_dist` in order to use NCCL for communications.
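As a rough sketch of the setup described above (assuming, as in the surrounding tutorial, that the :code:`lsm` operators have already been created on each rank with :code:`engine="cuda"`, that :code:`wav`, :code:`refl`, :code:`nx`, :code:`nz`, and :code:`nccl_comm` are defined earlier, and that the NCCL communicator is passed to DistributedArray via a :code:`base_comm_nccl` keyword, which may differ across pylops-mpi versions):

import cupy as cp
from pylops_mpi import DistributedArray, Partition

# Move the wavelet and traveltime tables to the GPU once, up-front, so that
# applying the operator does not trigger host-to-device copies every time
wav = cp.asarray(wav)
lsm.Demop.trav_srcs = cp.asarray(lsm.Demop.trav_srcs)
lsm.Demop.trav_recs = cp.asarray(lsm.Demop.trav_recs)

# Broadcasted, GPU-backed reflectivity; the NCCL communicator (keyword name
# assumed here) routes collective calls through NCCL instead of MPI
refl_dist = DistributedArray(global_shape=nx * nz,
                             partition=Partition.BROADCAST,
                             base_comm_nccl=nccl_comm,  # assumed keyword
                             engine="cupy")
refl_dist[:] = cp.asarray(refl.flatten())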

# Wavelet
nt = 651

###############################################################################
# We calculate the inverse using the :py:func:`pylops_mpi.optimization.basic.cgls`
- # solver. Here, we pass the `nccl_comm` to `x0` to use NCCL as a communicator.
+ # solver. Here, we pass :code:`nccl_comm` to :code:`x0` to use NCCL as a communicator.
# In this particular case, the local computation will be done on the GPU.
# Collective communication calls will be carried out through NCCL, GPU-to-GPU.

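A hedged sketch of this inversion step; :code:`VStack` (the MPIVStack of LSM operators) and :code:`d_dist` (the distributed data) are assumed to have been created earlier in the tutorial, and the :code:`base_comm_nccl` keyword is again an assumption:

import cupy as cp
from pylops_mpi import DistributedArray, Partition
from pylops_mpi.optimization.basic import cgls

# Starting guess: broadcasted, GPU-backed, and tied to the NCCL communicator
x0 = DistributedArray(global_shape=nx * nz,
                      partition=Partition.BROADCAST,
                      base_comm_nccl=nccl_comm,  # assumed keyword
                      engine="cupy")
x0[:] = cp.zeros(nx * nz)

# CGLS: local matrix-vector products run on the GPU, collective calls go
# through NCCL GPU-to-GPU
minv_dist = cgls(VStack, d_dist, x0=x0, niter=100)[0]
minv = minv_dist.asarray().reshape((nx, nz))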

###############################################################################
# Let's start by defining all the parameters required by the
- # :py:func:`pylops.waveeqprocessing.MPIMDC` operator.
+ # :py:class:`pylops_mpi.waveeqprocessing.MPIMDC` operator.
# Note that this section is exactly the same as the one in the MPI example as
# we will keep using MPI for transferring metadata (i.e., shapes, dims, etc.)

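For context, a minimal sketch of how the two communicators could be set up side by side; the :code:`initialize_nccl_comm` helper and its import path are assumptions to be checked against the pylops-mpi utilities in your version:

from mpi4py import MPI

# Import path assumed; verify against your pylops-mpi installation
from pylops_mpi.utils._nccl import initialize_nccl_comm

# MPI is kept for lightweight metadata exchange (shapes, dims, etc.)
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# NCCL carries the heavy GPU-to-GPU collective communications
nccl_comm = initialize_nccl_comm()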
# And now, we define the distributed operator MPIMDC and model as well as compute the data.
# Both the model and data have to live on the GPU. We also define the DistributedArray `m`
# with `nccl_comm` and engine="cupy" to use NCCL for communications (the data `d` will be set up in the same way).
- # Noted that fftengine must be set to "numpy" in MDCop operator when running with CuPy
+ # Note that fftengine must be set to "numpy" in the MDCop operator when running with CuPy

# Move operator kernel to GPU
G = cp.asarray(G)
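A sketch of this step; :code:`model_size`, :code:`mloc`, and :code:`MDCop` are hypothetical stand-ins for the global model size, the local model portion, and the MPIMDC operator defined in the MPI example, and :code:`base_comm_nccl` is again an assumed keyword:

import cupy as cp
from pylops_mpi import DistributedArray

# Operator kernel on the GPU
G = cp.asarray(G)

# GPU-backed distributed model tied to the NCCL communicator
m = DistributedArray(global_shape=model_size,  # hypothetical size from the MPI example
                     base_comm_nccl=nccl_comm,  # assumed keyword
                     engine="cupy")
m[:] = cp.asarray(mloc)  # local portion of the model, moved to the GPU

# Compute the data with the distributed operator; MDCop is assumed to have
# been created with fftengine="numpy", as required when the inputs are CuPy arrays
d = MDCop @ m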