---
title: 'PyLops-MPI - MPI Powered PyLops with mpi4py'
tags:
  - Python
  - MPI
  - High Performance Computing
authors:
  - name: Rohan Babbar
    orcid: 0000-0002-7203-7641
    affiliation: 1
  - name: Matteo Ravasi
    orcid: 0000-0003-0020-2721
    affiliation: 2
  - name: Yuxi Hong
    orcid: 0000-0002-0741-6602
    affiliation: 3
affiliations:
  - name: Computer Science and Engineering, Cluster Innovation Center, University of Delhi, Delhi, India.
    index: 1
  - name: Earth Science and Engineering, Physical Sciences and Engineering (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
    index: 2
  - name: Postdoc Researcher (Computer Science), Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
    index: 3
date: 24 September 2024
bibliography: paper.bib
---

# Summary

Large-scale linear operations and inverse problems are fundamental to numerous algorithms in fields such as image
processing, geophysics, signal processing, and remote sensing. This paper presents PyLops-MPI, an extension of PyLops
designed for distributed and parallel processing of large-scale problems. PyLops-MPI provides forward and adjoint
matrix-vector products, as well as inversion solvers, in a distributed framework. By using the Message Passing
Interface (MPI), the framework effectively exploits the computational power of multiple nodes or processors, enabling
efficient, parallelized solutions to large and complex inversion tasks.

# Statement of need

As scientific datasets grow and the demand for higher resolution increases, the need for distributed computing alongside
matrix-free linear algebra becomes ever more critical. The size of models and datasets often exceeds the memory capacity
of a single machine, making it difficult to perform computations efficiently and accurately. Moreover, many operators are
composed of multiple resource-intensive computational blocks that can be effectively parallelized, further emphasizing the
need for a distributed approach.

When addressing distributed inverse problems, we identify three distinct use cases that highlight the need for a
flexible, scalable framework:

- **Fully Distributed Models and Data**: Both the model and data are distributed across nodes, with minimal
  communication during the modeling process. Communication occurs mainly during the solver stage, when dot products
  or regularization terms, such as the Laplacian, are applied. This scenario is common in
  [Post-Stack seismic inversion](https://pylops.readthedocs.io/en/stable/tutorials/poststack.html#sphx-glr-tutorials-poststack-py),
  where each node handles a portion of the model and data, and communication only happens when adding spatial
  regularizers.

- **Distributed Data, Model Available on All Nodes**: In this case, the data is distributed across nodes while the model
  is available on all nodes. Communication is required during the adjoint pass, when the models produced by each node
  need to be summed, and in the solver, when performing dot products on the data. This pattern is typical in fields like
  [CT/MRI imaging](https://pylops.readthedocs.io/en/stable/tutorials/ctscan.html#sphx-glr-tutorials-ctscan-py)
  and [seismic least-squares migration](https://pylops.readthedocs.io/en/stable/tutorials/lsm.html#sphx-glr-tutorials-lsm-py).

- **Model and Data Available on All Nodes or on the Master Node**: Here, communication is confined to the operator, with
  the master node distributing parts of the model or data to the workers. The workers then perform computations without
  requiring communication in the solver. An example of this is
  [MDC-based inversions](https://github.com/DIG-Kaust/TLR-MDC), which allow for the storage of out-of-memory kernels.

Recent updates to mpi4py (version 3.0 and above) [@Dalcin] have simplified its integration, enabling more efficient data
communication between nodes and processes. Several projects in the Python ecosystem, such as mpi4py-fft [@Mortensen2019],
mcdc [@Morgan2024], and mpi4jax [@mpi4jax], build on MPI to improve the efficiency and scalability of distributed
computing in their respective domains.

PyLops-MPI is built on top of PyLops [@Ravasi:2020] and utilizes mpi4py to provide an efficient framework for dealing
with large-scale problems in a distributed and parallelized manner. PyLops-MPI offers an intuitive API that allows users
to easily scatter and broadcast data and models across different nodes or processors, enabling forward and adjoint
matrix-vector products in a distributed fashion. It provides a suite of MPI-powered linear operators and inversion
solvers, along with the flexibility to create custom solvers tailored to specific needs.

What sets PyLops-MPI apart from other libraries is the ease with which MPI operators can be created, facilitating
efficient integration between mpi4py and PyLops. This enables users to solve large-scale, complex inverse problems
without the risk of data leaks or the need to manage MPI requirements themselves.

# Software Framework

PyLops-MPI introduces MPI support to PyLops by providing an efficient API for solving linear problems through
parallelization with the mpi4py library. The library is designed to tackle large-scale linear inverse problems that
are difficult to solve using a single process.

The main components of the library include:

## DistributedArray

The `pylops_mpi.DistributedArray` class serves as the fundamental array class used throughout the library. It enables
the partitioning of large NumPy [@harris2020array] or CuPy [@cupy_learningsys2017] arrays into smaller local arrays,
which are distributed across different ranks. Additionally, it allows broadcasting a NumPy or CuPy array to multiple
processes.

`DistributedArray` supports two types of partition through its **partition** attribute: `Partition.SCATTER` distributes
the data across all ranks, allowing users to specify how much of the load each rank should handle, while
`Partition.BROADCAST` creates a copy of the data and distributes it to all ranks, ensuring that the entire array is
available on each rank.
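
As a minimal sketch of the two partition modes (assuming the constructor accepts `global_shape` and `partition`
arguments, and that the local portion of the array is exposed via `local_array`/`local_shape` and filled through item
assignment), a script launched with, for example, `mpiexec -n 4 python script.py` might look like:

```python
import numpy as np
from mpi4py import MPI
import pylops_mpi

rank = MPI.COMM_WORLD.Get_rank()

# SCATTER: each rank owns only a slice of the global array
x = pylops_mpi.DistributedArray(global_shape=1000,
                                partition=pylops_mpi.Partition.SCATTER)
x[:] = np.ones(x.local_shape)  # fill the portion owned by this rank

# BROADCAST: every rank holds a full copy of the same array
w = pylops_mpi.DistributedArray(global_shape=1000,
                                partition=pylops_mpi.Partition.BROADCAST)
w[:] = np.arange(1000)

print(rank, x.local_array.shape, w.local_array.shape)
```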

Furthermore, various basic mathematical operations are implemented for the `DistributedArray` (a usage sketch follows
the list):

- Add (`+`) / Subtract (`-`): Adds or subtracts two DistributedArrays.
- Multiply (`*`): Multiplies two DistributedArrays.
- Dot product (`@`): Calculates the dot product by flattening the arrays, resulting in a scalar value.
- Conj: Computes the conjugate of the DistributedArray.
- Norms: Calculates the vector norm along any specified axis.
- Copy: Creates a deep copy of the DistributedArray.
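
Building on the previous example, a brief sketch of how these operations might be used (the method names `conj`,
`norm`, and `copy`, as well as the default scattered partition, are assumptions about the API):

```python
import numpy as np
import pylops_mpi

# two scattered arrays with the same global shape
x = pylops_mpi.DistributedArray(global_shape=100)
y = pylops_mpi.DistributedArray(global_shape=100)
x[:] = np.ones(x.local_shape)
y[:] = 2 * np.ones(y.local_shape)

z = x + y         # element-wise sum, still distributed
p = x * y         # element-wise product, still distributed
d = x @ y         # dot product over the flattened global arrays (scalar)
xc = x.conj()     # conjugate of each element
n = x.norm()      # vector norm computed across all ranks
xcopy = x.copy()  # deep copy of the distributed array
```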

## MPILinearOperators

`pylops_mpi.MPILinearOperator` is the base class for all MPI linear operators, allowing users to create new operators
for matrix-vector products that can be used to solve various inverse problems. To create a new operator, users
subclass `pylops_mpi.MPILinearOperator` and specify the **shape** and **dtype** of the operator. The **_matvec** method
must be implemented for the forward operation and the **_rmatvec** method for the Hermitian adjoint.
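
The following is a minimal sketch of such a subclass (a hypothetical element-wise scaling operator; the keyword
arguments of `MPILinearOperator` and `DistributedArray`, and the attributes used inside `_matvec`/`_rmatvec`, are
assumptions about the API rather than a definitive implementation):

```python
import pylops_mpi
from pylops_mpi import DistributedArray


class MPIScale(pylops_mpi.MPILinearOperator):
    """Element-wise scaling, applied independently to the local portion on each rank."""

    def __init__(self, scale, n, dtype="float64"):
        self.scale = scale
        self.n = n
        super().__init__(shape=(n, n), dtype=dtype)

    def _matvec(self, x):
        # forward: y = scale * x, computed locally with no inter-rank communication
        y = DistributedArray(global_shape=self.n, partition=x.partition, dtype=self.dtype)
        y[:] = self.scale * x.local_array
        return y

    def _rmatvec(self, x):
        # adjoint of a real-valued scaling is the same scaling
        return self._matvec(x)
```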

## MPI Powered Solvers

PyLops-MPI offers a range of MPI-powered solvers that tackle linear problems using a standard least-squares cost
function. These solvers leverage **DistributedArray** and **MPILinearOperators** to perform the inversion and can be
found in the `pylops_mpi.optimization` submodule.
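
As an illustrative end-to-end sketch (assuming a CGLS solver exposed as `pylops_mpi.optimization.basic.cgls` with a
PyLops-like signature, an `MPIBlockDiag` operator that wraps one PyLops operator per rank, and an `asarray()` method
that gathers the global array; these names are assumptions not stated above):

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi.optimization.basic import cgls

# each rank owns one diagonal block of a larger block-diagonal operator
nloc = 100
nranks = MPI.COMM_WORLD.Get_size()
Op = pylops_mpi.MPIBlockDiag(ops=[pylops.Diagonal(2.0 * np.ones(nloc))])

# distributed right-hand side: each rank fills its local portion
y = pylops_mpi.DistributedArray(global_shape=nloc * nranks)
y[:] = np.ones(y.local_shape)

# zero initial guess and distributed CGLS inversion
x0 = pylops_mpi.DistributedArray(global_shape=nloc * nranks)
x0[:] = 0
xinv = cgls(Op, y, x0=x0, niter=10, show=False)[0]

print(xinv.asarray()[:5])  # gathered global solution, approximately 0.5 everywhere
```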

# Use Cases

# References