---
title: 'PyLops-MPI - MPI Powered PyLops with mpi4py'
tags:
  - Python
  - MPI
  - High Performance Computing
authors:
  - name: Rohan Babbar
    orcid: 0000-0002-7203-7641
    affiliation: 1
  - name: Matteo Ravasi
    orcid: 0000-0003-0020-2721
    affiliation: 2
  - name: Yuxi Hong
    orcid: 0000-0002-0741-6602
    affiliation: 3
affiliations:
  - name: Computer Science and Engineering, Cluster Innovation Center, University of Delhi, Delhi, India.
    index: 1
  - name: Earth Science and Engineering, Physical Sciences and Engineering (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
    index: 2
  - name: Postdoc Researcher (Computer Science), Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
    index: 3
date: 24 September 2024
bibliography: paper.bib
---

# Summary

Large-scale linear operations and inverse problems are fundamental to numerous algorithms in fields such as image processing, geophysics, signal processing, and remote sensing. This paper presents PyLops-MPI, an extension of PyLops designed for distributed and parallel processing of large-scale problems. PyLops-MPI facilitates forward and adjoint matrix-vector products, as well as inversion solvers, in a distributed framework. By using the Message Passing Interface (MPI), this framework effectively utilizes the computational power of multiple nodes or processors, enabling efficient solutions to large and complex inversion tasks in a parallelized manner.

# Statement of need

As scientific datasets grow and the demand for higher resolution increases, the need for distributed computing alongside matrix-free linear algebra becomes more critical. The size of models and datasets often exceeds the memory capacity of a single machine, making it difficult to perform computations efficiently and accurately. Moreover, many operators consist of multiple resource-intensive computational blocks that can be effectively parallelized, further emphasizing the necessity for a distributed approach.

When addressing distributed inverse problems, we identify three distinct use cases that highlight the need for a flexible, scalable framework:

- **Fully Distributed Models and Data**: Both the model and data are distributed across nodes, with minimal communication during the modeling process. Communication occurs mainly during the solver stage, when dot products or regularization terms, such as the Laplacian, are applied. This scenario is common in [Post-Stack seismic inversion](https://pylops.readthedocs.io/en/stable/tutorials/poststack.html#sphx-glr-tutorials-poststack-py), where each node handles a portion of the model and data, and communication only happens when adding spatial regularizers.

- **Distributed Data, Model Available on All Nodes**: In this case, data is distributed across nodes while the model is available on all nodes. Communication is required during the adjoint pass, when the models produced by each node need to be summed, and in the solver, when performing dot products on the data. This pattern is typical in fields like [CT/MRI imaging](https://pylops.readthedocs.io/en/stable/tutorials/ctscan.html#sphx-glr-tutorials-ctscan-py) and [seismic least-squares migration](https://pylops.readthedocs.io/en/stable/tutorials/lsm.html#sphx-glr-tutorials-lsm-py).

- **Model and Data Available on All Nodes or Master**: Here, communication is confined to the operator, with the master node distributing parts of the model or data to the workers. The workers then perform computations without requiring communication in the solver. An example of this is [MDC-based inversions](https://github.com/DIG-Kaust/TLR-MDC), which allow for the storage of out-of-memory kernels.

Recent updates to mpi4py (version 3.0 and above) [@Dalcin] have simplified its integration into Python applications, enabling more efficient data communication between nodes and processes. Several projects in the Python ecosystem, such as mpi4py-fft [@Mortensen2019], mcdc [@Morgan2024], and mpi4jax [@mpi4jax], build on MPI to extend their capabilities, improving the efficiency and scalability of distributed computing.

PyLops-MPI is built on top of PyLops [@Ravasi:2020] and utilizes mpi4py to provide an efficient framework for dealing with large-scale problems in a distributed and parallelized manner. PyLops-MPI offers an intuitive API that allows users to easily scatter and broadcast data and models across different nodes or processors, enabling matrix-vector and adjoint matrix-vector operations in a distributed manner. It provides a suite of MPI-powered linear operators and inversion solvers, along with the flexibility to create custom solvers tailored to specific needs.

What sets PyLops-MPI apart from other libraries is the ease with which MPI operators can be created, facilitating efficient integration between mpi4py and PyLops. This enables users to solve large-scale, complex inverse problems without the risk of data leaks or the need to manage MPI communication themselves.

# Software Framework

PyLops-MPI introduces MPI support to PyLops by providing an efficient API for solving linear problems through parallelization with the mpi4py library. It is designed to tackle large-scale linear inverse problems that are difficult to solve using a single process.

The main components of the library include:

## DistributedArray

The `pylops_mpi.DistributedArray` class serves as the fundamental array class used throughout the library. It enables the partitioning of large NumPy [@harris2020array] or CuPy [@cupy_learningsys2017] arrays into smaller local arrays, which can be distributed across different ranks. Additionally, it allows a NumPy or CuPy array to be broadcast to multiple processes.

The DistributedArray supports two types of partition through the **partition** attribute: `Partition.SCATTER` distributes the data across all ranks, allowing users to specify how much load each rank should handle, while `Partition.BROADCAST` creates a copy of the data and distributes it to all ranks, ensuring that the data is available on each rank.

Furthermore, various basic mathematical functions are implemented for operations on the DistributedArray:

- Add (+) / Subtract (-): adds or subtracts two DistributedArrays.
- Multiply (*): multiplies two DistributedArrays.
- Dot product (@): calculates the dot product by flattening the arrays, resulting in a scalar value.
- Conj: computes the conjugate of the DistributedArray.
- Norms: calculates the vector norm along any specified axis.
- Copy: creates a deep copy of the DistributedArray.

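The following is a minimal sketch of this workflow. The constructor keywords (`global_shape`, `partition`), the slice-assignment to the local chunk, and the `local_array`, `dot`, and `asarray` accessors reflect our reading of the PyLops-MPI API and should be checked against the installed release.

```python
# Run with: mpiexec -n 4 python distributedarray_demo.py
import numpy as np
from pylops_mpi import DistributedArray, Partition

n = 100

# Each rank owns a contiguous chunk of the global vector (Partition.SCATTER)
x = DistributedArray(global_shape=n, partition=Partition.SCATTER)
y = DistributedArray(global_shape=n, partition=Partition.SCATTER)
x[:] = np.ones(x.local_shape)       # fill the local portion on every rank
y[:] = 2 * np.ones(y.local_shape)

z = x + y                           # element-wise addition, no communication
d = x.dot(y)                        # global dot product (collective reduction)
xg = x.asarray()                    # gather the full global array on every rank
print(z.local_array[:3], d, xg.shape)
```
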
## MPILinearOperators

`pylops_mpi.MPILinearOperator` is the base class for all MPI linear operators, allowing users to create new operators that perform matrix-vector products for solving various inverse problems. To create a new MPILinearOperator, users must subclass the `pylops_mpi.MPILinearOperator` parent class and specify the **shape** and **dtype**. The **_matvec** method should implement the forward operator, and the **_rmatvec** method should implement the Hermitian adjoint.

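A sketch of this subclassing pattern is given below for a hypothetical element-wise scaling operator; the way **shape** refers to the global operator size and the way distributed input/output arrays are handled are assumptions to be verified against the library documentation.

```python
import numpy as np
import pylops_mpi


class MPIScale(pylops_mpi.MPILinearOperator):
    """Hypothetical operator scaling a scattered DistributedArray by `alpha`."""

    def __init__(self, n, alpha, dtype="float64"):
        self.alpha = alpha
        # shape is assumed to describe the global (rows, cols) of the operator
        super().__init__(shape=(n, n), dtype=np.dtype(dtype))

    def _matvec(self, x):
        # forward: y = alpha * x, computed independently on each rank
        y = pylops_mpi.DistributedArray(global_shape=self.shape[1],
                                        partition=x.partition,
                                        dtype=self.dtype)
        y[:] = self.alpha * x.local_array
        return y

    def _rmatvec(self, x):
        # adjoint of a real diagonal scaling is the same scaling
        return self._matvec(x)
```
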
## MPI Powered Solvers

PyLops-MPI offers a range of MPI-powered solvers that tackle linear problems using a standard least-squares cost function. These solvers leverage **DistributedArray** and **MPILinearOperators** to perform inversion calculations. Our solvers can be found within the submodule `pylops_mpi.optimization`.

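A minimal inversion sketch is shown below. The `MPIBlockDiag` operator, the module path `pylops_mpi.optimization.basic`, and the `cgls` signature are assumptions based on our reading of the library and its PyLops naming conventions, and should be verified against the installed version.

```python
# Run with: mpiexec -n 4 python solver_demo.py
import numpy as np
import pylops
import pylops_mpi
from pylops_mpi.optimization.basic import cgls

n_local = 25                                  # rows/columns handled by each rank

# Each rank contributes one diagonal block; together they form a distributed
# block-diagonal operator acting on a scattered DistributedArray
Dop = pylops.Diagonal(np.full(n_local, 2.0))
Op = pylops_mpi.MPIBlockDiag(ops=[Dop])

x_true = pylops_mpi.DistributedArray(global_shape=Op.shape[1],
                                     partition=pylops_mpi.Partition.SCATTER)
x_true[:] = 1.0
b = Op.matvec(x_true)                         # distributed forward modelling

x0 = pylops_mpi.DistributedArray(global_shape=Op.shape[1],
                                 partition=pylops_mpi.Partition.SCATTER)
x0[:] = 0.0
x_inv = cgls(Op, b, x0=x0, niter=10, show=False)[0]
print(x_inv.local_array[:3])
```
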
# Use Cases

# References
