
Commit 2d749a2

feat: ToCupy operator (#622)
* feat: ToCupy operator
* minor: fix typo
* test: added tests for ToCupy
* minor: small change to doc
* doc: added graphics of cpu-gpu scenarios
1 parent 317a804 commit 2d749a2

File tree

13 files changed (+369, −72 lines)

[3 binary image files added (46.2 KB, 337 KB, 262 KB): the CPU-GPU scenario diagrams referenced below from docs/source/_static]

docs/source/api/index.rst

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ Basic operators
     Real
     Imag
     Conj
+    ToCupy
+
 
 Smoothing and derivatives
 ~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/gpu.rst

Lines changed: 169 additions & 57 deletions
@@ -29,6 +29,172 @@ provide data vectors to the solvers, e.g., when using
 For JAX, apart from following the same procedure described for CuPy, the PyLops operator must
 also be wrapped into a :class:`pylops.JaxOperator`.
 
+See below for a comprehensive list of supported operators and additional functionalities for both the
+``cupy`` and ``jax`` backends.
+
+
+Examples
+--------
+
+Let's now briefly look at some use cases.
+
+End-to-end GPU-powered inverse problems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First we consider the most common scenario, where both the model and data
+vectors fit in GPU memory. We can therefore simply replace all of our
+``numpy`` arrays with ``cupy`` arrays and solve the inverse problem of
+interest end-to-end on the GPU.
+
+.. image:: _static/cupy_diagram.png
+   :width: 600
+   :align: center
+
+Let's first write a code snippet using ``numpy`` arrays, which PyLops
+will run on your CPU:
+
+.. code-block:: python
+
+   ny, nx = 400, 400
+   G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
+   x = np.ones(nx, dtype=np.float32)
+
+   # Create operator
+   Gop = MatrixMult(G, dtype='float32')
+
+   # Create data and invert
+   y = Gop @ x
+   xest = Gop / y
+
+Now we write a code snippet using ``cupy`` arrays, which PyLops will run on
+your GPU:
+
+.. code-block:: python
+
+   ny, nx = 400, 400
+   G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
+   x = cp.ones(nx, dtype=np.float32)
+
+   # Create operator
+   Gop = MatrixMult(G, dtype='float32')
+
+   # Create data and invert
+   y = Gop @ x
+   xest = Gop / y
+
+The code is almost unchanged apart from the fact that we now use ``cupy`` arrays;
+PyLops will figure this out.
+
+Similarly, we write a code snippet using ``jax`` arrays, which PyLops will run on
+your GPU/TPU:
+
+.. code-block:: python
+
+   ny, nx = 400, 400
+   G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
+   x = jnp.ones(nx, dtype=np.float32)
+
+   # Create operator
+   Gop = JaxOperator(MatrixMult(G, dtype='float32'))
+
+   # Create data and invert
+   y = Gop @ x
+   xest = Gop / y
+
+   # Adjoint via AD
+   xadj = Gop.rmatvecad(x, y)
+
+Again, the code is almost unchanged apart from the fact that we now use ``jax`` arrays.
+
+
+Mixed CPU-GPU powered inverse problems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let us now consider a more intricate scenario, where we have access to
+a GPU-powered operator but the model and/or data vectors are too large
+to fit in GPU memory (VRAM).
+
+For the sake of clarity, we consider a problem where
+the operator can be written as a :class:`pylops.BlockDiag` of
+PyLops operators. Note how, by simply sandwiching any of the GPU-powered
+operators between two :class:`pylops.ToCupy` operators, we are
+able to tell PyLops to transfer to the GPU only the part of the model vector
+required by a given operator and to transfer the output back to the CPU before
+forming the combined output vector (i.e., the output vector of the
+:class:`pylops.BlockDiag`).
+
+.. image:: _static/numpy_cupy_bd_diagram.png
+   :width: 1000
+   :align: center
+
+.. code-block:: python
+
+   dtype = np.float32
+   nops, n = 5, 4
+   Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) \
+         for i in range(nops)]
+   Ms = [M.T @ M for M in Ms]
+
+   # Create operator
+   Mops = []
+   for iop in range(nops):
+       Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
+       Top = ToCupy(Mop.dims, dtype=dtype)
+       Top1 = ToCupy(Mop.dimsd, dtype=dtype)
+       Mop = Top1.H @ Mop @ Top
+       Mops.append(Mop)
+   Mops = BlockDiag(Mops, forceflat=True)
+
+   # Create data and invert
+   x = np.ones(n * nops, dtype=dtype)
+   y = Mops @ x.ravel()
+   xest = Mops / y
+
+
+Finally, let us consider a problem where
+the operator can be written as a :class:`pylops.VStack` of
+PyLops operators and the model vector can be fully transferred to the GPU.
+We can again use the :class:`pylops.ToCupy` operator, however this
+time we will only use it to move the output of each operator to the CPU.
+Since we are now in a special scenario, where the input of the overall
+operator sits on the GPU and the output on the
+CPU, we need to inform the :class:`pylops.VStack` operator about this.
+This can be easily done using the additional ``inoutengine`` parameter. Let's
+see this with an example.
+
+.. image:: _static/numpy_cupy_vs_diagram.png
+   :width: 1000
+   :align: center
+
+.. code-block:: python
+
+   dtype = np.float32
+   nops, n, m = 3, 4, 5
+   Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]
+
+   # Create operator
+   Mops = []
+   for iop in range(nops):
+       Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
+       Top1 = ToCupy(Mop.dimsd, dtype=dtype)
+       Mop = Top1.H @ Mop
+       Mops.append(Mop)
+   Mops = VStack(Mops, inoutengine=("numpy", "cupy"))
+
+   # Create data and invert
+   x = cp.ones(m, dtype=dtype)
+   y = Mops @ x.ravel()
+   xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]
+
+These features are currently not available for ``jax`` arrays.
+
+
+.. note::
+
+   More examples for the CuPy and JAX backends can be found at `link1 <https://github.com/PyLops/pylops_notebooks/tree/master/developement-cupy>`_
+   and `link2 <https://github.com/PyLops/pylops_notebooks/tree/master/developement/Basic_JAX.ipynb>`_.
+
+
+Supported Operators
+-------------------
 
 In the following, we provide a list of methods in :class:`pylops.LinearOperator` with their current status (available on CPU,
 GPU with CuPy, and GPU with JAX):
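
[Editor's aside] On the "PyLops will figure this out" remark in the added text above: the backend dispatch works by inspecting the module of the input array, as done for instance via ``get_array_module`` in the ``blockdiag.py`` changes below. A minimal sketch of the mechanism (illustrative only, not part of this commit):

.. code-block:: python

   import numpy as np
   from pylops.utils.backend import get_array_module

   x = np.ones(4, dtype=np.float32)
   ncp = get_array_module(x)  # numpy module for a numpy input, cupy for a cupy input
   y = ncp.zeros_like(x)      # allocated on the same backend as x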
@@ -195,6 +361,7 @@ Smoothing and derivatives:
     - |:white_check_mark:|
     - |:white_check_mark:|
 
+
 Signal processing:
 
 .. list-table::
@@ -322,6 +489,7 @@ Signal processing:
     - |:white_check_mark:|
     - |:white_check_mark:|
 
+
 Wave-Equation processing
 
 .. list-table::
@@ -369,6 +537,7 @@ Wave-Equation processing
     - |:red_circle:|
     - |:red_circle:|
 
+
 Geophysical subsurface characterization:
 
 .. list-table::
@@ -407,60 +576,3 @@ Geophysical subsurface characterization:
 operator currently works only with ``explicit=True`` due to the same issue as
 in point 1 for the :class:`pylops.signalprocessing.Convolve1D` operator employed
 when ``explicit=False``.
-
-
-Example
--------
-
-Finally, let's briefly look at an example. First we write a code snippet using
-``numpy`` arrays which PyLops will run on your CPU:
-
-.. code-block:: python
-
-   ny, nx = 400, 400
-   G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
-   x = np.ones(nx, dtype=np.float32)
-
-   Gop = MatrixMult(G, dtype='float32')
-   y = Gop * x
-   xest = Gop / y
-
-Now we write a code snippet using ``cupy`` arrays which PyLops will run on
-your GPU:
-
-.. code-block:: python
-
-   ny, nx = 400, 400
-   G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
-   x = cp.ones(nx, dtype=np.float32)
-
-   Gop = MatrixMult(G, dtype='float32')
-   y = Gop * x
-   xest = Gop / y
-
-The code is almost unchanged apart from the fact that we now use ``cupy`` arrays,
-PyLops will figure this out.
-
-Similarly, we write a code snippet using ``jax`` arrays which PyLops will run on
-your GPU/TPU:
-
-.. code-block:: python
-
-   ny, nx = 400, 400
-   G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
-   x = jnp.ones(nx, dtype=np.float32)
-
-   Gop = JaxOperator(MatrixMult(G, dtype='float32'))
-   y = Gop * x
-   xest = Gop / y
-
-   # Adjoint via AD
-   xadj = Gop.rmatvecad(x, y)
-
-
-Again, the code is almost unchanged apart from the fact that we now use ``jax`` arrays,
-
-.. note::
-
-   More examples for the CuPy and JAX backends be found `here <https://github.com/PyLops/pylops_notebooks/tree/master/developement-cupy>`__
-   and `here <https://github.com/PyLops/pylops_notebooks/tree/master/developement/Basic_JAX.ipynb>`__.
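
[Editor's aside] For reference, a minimal sketch of what the new :class:`pylops.ToCupy` operator does on its own, as can be inferred from the sandwiching examples in ``gpu.rst`` above: the forward transfers a NumPy array to the GPU, while the adjoint transfers a CuPy array back to the CPU. The snippet is illustrative (it assumes ``ToCupy`` accepts an integer ``dims`` like other PyLops operators) and is not part of this commit:

.. code-block:: python

   import numpy as np
   from pylops.basicoperators import ToCupy

   n = 4
   Top = ToCupy(n, dtype="float32")

   x = np.ones(n, dtype=np.float32)
   xgpu = Top @ x       # NumPy -> CuPy: forward moves the array to the GPU
   xcpu = Top.H @ xgpu  # CuPy -> NumPy: adjoint moves it back to the CPU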

pylops/basicoperators/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -38,6 +38,7 @@
     Gradient                      Gradient.
     FirstDirectionalDerivative    First Directional derivative.
     SecondDirectionalDerivative   Second Directional derivative.
+    ToCupy                        Convert to CuPy.
 """
 
 from .functionoperator import *
@@ -72,6 +73,8 @@
 from .laplacian import *
 from .gradient import *
 from .directionalderivative import *
+from .tocupy import *
+
 
 __all__ = [
     "FunctionOperator",
@@ -107,4 +110,5 @@
     "Gradient",
     "FirstDirectionalDerivative",
     "SecondDirectionalDerivative",
+    "ToCupy",
 ]
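
[Editor's aside] With these exports in place, the new operator is importable from the subpackage (and, following the existing star-import pattern of this module, presumably also from the top-level ``pylops`` namespace):

.. code-block:: python

   from pylops.basicoperators import ToCupy  # export added by this commit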

pylops/basicoperators/blockdiag.py

Lines changed: 19 additions & 3 deletions
@@ -21,7 +21,7 @@
 
 from pylops import LinearOperator
 from pylops.basicoperators import MatrixMult
-from pylops.utils.backend import get_array_module, inplace_set
+from pylops.utils.backend import get_array_module, get_module, inplace_set
 from pylops.utils.typing import DTypeLike, NDArray
 
 
@@ -48,6 +48,12 @@ class BlockDiag(LinearOperator):
         .. versionadded:: 2.2.0
 
         Force an array to be flattened after matvec and rmatvec.
+    inoutengine : :obj:`tuple`, optional
+        .. versionadded:: 2.4.0
+
+        Type of output vectors of ``matvec`` and ``rmatvec``. If ``None``, this is
+        inferred directly from the input vectors. Note that this is ignored
+        if ``nproc>1``.
     dtype : :obj:`str`, optional
         Type of elements in input array.
 
@@ -113,6 +119,7 @@ def __init__(
         ops: Sequence[LinearOperator],
         nproc: int = 1,
         forceflat: bool = None,
+        inoutengine: Optional[tuple] = None,
         dtype: Optional[DTypeLike] = None,
     ) -> None:
         self.ops = ops
@@ -149,6 +156,7 @@ def __init__(
         if self.nproc > 1:
             self.pool = mp.Pool(processes=nproc)
 
+        self.inoutengine = inoutengine
         dtype = _get_dtype(ops) if dtype is None else np.dtype(dtype)
         clinear = all([getattr(oper, "clinear", True) for oper in self.ops])
         super().__init__(
@@ -172,7 +180,11 @@ def nproc(self, nprocnew: int) -> None:
         self._nproc = nprocnew
 
     def _matvec_serial(self, x: NDArray) -> NDArray:
-        ncp = get_array_module(x)
+        ncp = (
+            get_array_module(x)
+            if self.inoutengine is None
+            else get_module(self.inoutengine[0])
+        )
         y = ncp.zeros(self.nops, dtype=self.dtype)
         for iop, oper in enumerate(self.ops):
             y = inplace_set(
@@ -183,7 +195,11 @@ def _matvec_serial(self, x: NDArray) -> NDArray:
         return y
 
     def _rmatvec_serial(self, x: NDArray) -> NDArray:
-        ncp = get_array_module(x)
+        ncp = (
+            get_array_module(x)
+            if self.inoutengine is None
+            else get_module(self.inoutengine[1])
+        )
         y = ncp.zeros(self.mops, dtype=self.dtype)
         for iop, oper in enumerate(self.ops):
             y = inplace_set(
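
[Editor's aside] To see the new ``inoutengine`` parameter in context, here is a hedged sketch that mirrors the ``VStack`` example from ``gpu.rst``, applied to ``BlockDiag`` instead: the model vector sits on the GPU, each block's output is moved back to the CPU through the adjoint of ``ToCupy``, and the tuple declares the engines of the ``matvec`` and ``rmatvec`` outputs. This usage is extrapolated from the docstring above, not taken from the commit:

.. code-block:: python

   import numpy as np
   import cupy as cp
   from pylops.basicoperators import BlockDiag, MatrixMult, ToCupy

   dtype = np.float32
   nops, n = 3, 4
   Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) for i in range(nops)]

   # Each block runs on the GPU; its output is transferred back to the CPU
   Mops = []
   for iop in range(nops):
       Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
       Top1 = ToCupy(Mop.dimsd, dtype=dtype)
       Mops.append(Top1.H @ Mop)
   Mops = BlockDiag(Mops, inoutengine=("numpy", "cupy"))

   x = cp.ones(n * nops, dtype=dtype)  # model on the GPU
   y = Mops @ x                        # data on the CPU (numpy)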
