---
title: 'PyLops-MPI - MPI Powered PyLops with mpi4py'
tags:
  - Python
  - MPI
  - High Performance Computing
authors:
  - name: Rohan Babbar
    orcid: 0000-0002-7203-7641
    affiliation: 1
  - name: Matteo Ravasi
    orcid: 0000-0003-0020-2721
    affiliation: 2
  - name: Yuxi Hong
    orcid: 0000-0002-0741-6602
    affiliation: 3
affiliations:
  - name: Computer Science and Engineering, Cluster Innovation Center, University of Delhi, Delhi, India.
    index: 1
  - name: Earth Science and Engineering, Physical Sciences and Engineering (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
    index: 2
  - name: Postdoc Researcher (Computer Science), Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
    index: 3
date: 24 September 2024
bibliography: paper.bib
---

# Summary

Large-scale linear operations and inverse problems are fundamental to numerous algorithms in fields such as image
processing, geophysics, signal processing, and remote sensing. This paper presents PyLops-MPI, an extension of PyLops
designed for distributed and parallel processing of large-scale problems. PyLops-MPI provides forward and adjoint
matrix-vector products, as well as inversion solvers, in a distributed framework. By using the Message Passing
Interface (MPI), the framework effectively exploits the computational power of multiple nodes or processors, enabling
efficient, parallelized solutions to large and complex inversion tasks.

# Statement of need

As scientific datasets grow and the demand for higher resolution increases, the need for distributed computing alongside
matrix-free linear algebra becomes ever more critical. The size of models and datasets often exceeds the memory capacity
of a single machine, making it difficult to perform computations efficiently and accurately. Moreover, many operators are
composed of multiple resource-intensive computational blocks that can be effectively parallelized, further emphasizing the
need for a distributed approach.

When addressing distributed inverse problems, we identify three distinct use cases that highlight the need for a
flexible, scalable framework:

- **Fully Distributed Models and Data**: Both the model and data are distributed across nodes, with minimal
  communication during the modeling process. Communication occurs mainly during the solver stage, when dot products
  or regularization terms, such as the Laplacian, are applied. This scenario is common in
  [Post-Stack seismic inversion](https://pylops.readthedocs.io/en/stable/tutorials/poststack.html#sphx-glr-tutorials-poststack-py),
  where each node handles a portion of the model and data, and communication only happens when adding spatial
  regularizers.

- **Distributed Data, Model Available on All Nodes**: In this case, the data is distributed across nodes while the model
  is available on all nodes. Communication is required during the adjoint pass, when the models produced by each node
  need to be summed, and in the solver, when performing dot products on the data. This pattern is typical in fields like
  [CT/MRI imaging](https://pylops.readthedocs.io/en/stable/tutorials/ctscan.html#sphx-glr-tutorials-ctscan-py)
  and [seismic least-squares migration](https://pylops.readthedocs.io/en/stable/tutorials/lsm.html#sphx-glr-tutorials-lsm-py).

- **Model and Data Available on All Nodes or on the Master Node**: Here, communication is confined to the operator, with
  the master node distributing parts of the model or data to the workers. The workers then perform computations without
  requiring communication in the solver. An example of this is
  [MDC-based inversions](https://github.com/DIG-Kaust/TLR-MDC), which allow for the storage of out-of-memory kernels.

Recent updates to mpi4py (version 3.0 and above) [@Dalcin] have simplified its integration, enabling more efficient data
communication between nodes and processes. Several projects in the Python ecosystem, such as mpi4py-fft [@Mortensen2019],
mcdc [@Morgan2024], and mpi4jax [@mpi4jax], build on MPI to improve the efficiency and scalability of distributed
computing in their respective domains.

PyLops-MPI is built on top of PyLops [@Ravasi:2020] and utilizes mpi4py to provide an efficient framework for dealing
with large-scale problems in a distributed and parallelized manner. PyLops-MPI offers an intuitive API that allows users
to easily scatter and broadcast data and models across different nodes or processors, enabling forward and adjoint
matrix-vector products in a distributed fashion. It provides a suite of MPI-powered linear operators and inversion
solvers, along with the flexibility to create custom solvers tailored to specific needs.

What sets PyLops-MPI apart from other libraries is the ease with which MPI operators can be created, facilitating
efficient integration between mpi4py and PyLops. This enables users to solve large-scale, complex inverse problems
without the risk of data leaks or the need to manage MPI requirements themselves.

# Software Framework

PyLops-MPI introduces MPI support to PyLops by providing an efficient API for solving linear problems through
parallelization with the mpi4py library. The library is designed to tackle large-scale linear inverse problems that
are difficult to solve using a single process.

The main components of the library include:

## DistributedArray

The `pylops_mpi.DistributedArray` class serves as the fundamental array class used throughout the library. It enables
the partitioning of large NumPy [@harris2020array] or CuPy [@cupy_learningsys2017] arrays into smaller local arrays,
which are distributed across different ranks. Additionally, it allows broadcasting a NumPy or CuPy array to multiple
processes.

`DistributedArray` supports two types of partition through its **partition** attribute: `Partition.SCATTER` distributes
the data across all ranks, allowing users to specify how much of the load each rank should handle, while
`Partition.BROADCAST` creates a copy of the data and distributes it to all ranks, ensuring that the entire array is
available on each rank.
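
As a minimal sketch of the two partition modes (assuming the constructor accepts `global_shape` and `partition`
arguments, and that the local portion of the array is exposed via `local_array`/`local_shape` and filled through item
assignment), a script launched with, for example, `mpiexec -n 4 python script.py` might look like:

```python
import numpy as np
from mpi4py import MPI
import pylops_mpi

rank = MPI.COMM_WORLD.Get_rank()

# SCATTER: each rank owns only a slice of the global array
x = pylops_mpi.DistributedArray(global_shape=1000,
                                partition=pylops_mpi.Partition.SCATTER)
x[:] = np.ones(x.local_shape)  # fill the portion owned by this rank

# BROADCAST: every rank holds a full copy of the same array
w = pylops_mpi.DistributedArray(global_shape=1000,
                                partition=pylops_mpi.Partition.BROADCAST)
w[:] = np.arange(1000)

print(rank, x.local_array.shape, w.local_array.shape)
```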

Furthermore, various basic mathematical operations are implemented for the `DistributedArray` (a usage sketch follows
the list):

- Add (`+`) / Subtract (`-`): Adds or subtracts two DistributedArrays.
- Multiply (`*`): Multiplies two DistributedArrays.
- Dot product (`@`): Calculates the dot product by flattening the arrays, resulting in a scalar value.
- Conj: Computes the conjugate of the DistributedArray.
- Norms: Calculates the vector norm along any specified axis.
- Copy: Creates a deep copy of the DistributedArray.
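
Building on the previous example, a brief sketch of how these operations might be used (the method names `conj`,
`norm`, and `copy`, as well as the default scattered partition, are assumptions about the API):

```python
import numpy as np
import pylops_mpi

# two scattered arrays with the same global shape
x = pylops_mpi.DistributedArray(global_shape=100)
y = pylops_mpi.DistributedArray(global_shape=100)
x[:] = np.ones(x.local_shape)
y[:] = 2 * np.ones(y.local_shape)

z = x + y         # element-wise sum, still distributed
p = x * y         # element-wise product, still distributed
d = x @ y         # dot product over the flattened global arrays (scalar)
xc = x.conj()     # conjugate of each element
n = x.norm()      # vector norm computed across all ranks
xcopy = x.copy()  # deep copy of the distributed array
```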

## MPILinearOperators

`pylops_mpi.MPILinearOperator` is the base class for all MPI linear operators, allowing users to create new operators
for matrix-vector products that can be used to solve various inverse problems. To create a new operator, users
subclass `pylops_mpi.MPILinearOperator` and specify the **shape** and **dtype** of the operator. The **_matvec** method
must be implemented for the forward operation and the **_rmatvec** method for the Hermitian adjoint.
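
The following is a minimal sketch of such a subclass (a hypothetical element-wise scaling operator; the keyword
arguments of `MPILinearOperator` and `DistributedArray`, and the attributes used inside `_matvec`/`_rmatvec`, are
assumptions about the API rather than a definitive implementation):

```python
import pylops_mpi
from pylops_mpi import DistributedArray


class MPIScale(pylops_mpi.MPILinearOperator):
    """Element-wise scaling, applied independently to the local portion on each rank."""

    def __init__(self, scale, n, dtype="float64"):
        self.scale = scale
        self.n = n
        super().__init__(shape=(n, n), dtype=dtype)

    def _matvec(self, x):
        # forward: y = scale * x, computed locally with no inter-rank communication
        y = DistributedArray(global_shape=self.n, partition=x.partition, dtype=self.dtype)
        y[:] = self.scale * x.local_array
        return y

    def _rmatvec(self, x):
        # adjoint of a real-valued scaling is the same scaling
        return self._matvec(x)
```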

## MPI Powered Solvers

PyLops-MPI offers a range of MPI-powered solvers that tackle linear problems using a standard least-squares cost
function. These solvers leverage **DistributedArray** and **MPILinearOperators** to perform the inversion and can be
found in the `pylops_mpi.optimization` submodule.
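
As an illustrative end-to-end sketch (assuming a CGLS solver exposed as `pylops_mpi.optimization.basic.cgls` with a
PyLops-like signature, an `MPIBlockDiag` operator that wraps one PyLops operator per rank, and an `asarray()` method
that gathers the global array; these names are assumptions not stated above):

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi.optimization.basic import cgls

# each rank owns one diagonal block of a larger block-diagonal operator
nloc = 100
nranks = MPI.COMM_WORLD.Get_size()
Op = pylops_mpi.MPIBlockDiag(ops=[pylops.Diagonal(2.0 * np.ones(nloc))])

# distributed right-hand side: each rank fills its local portion
y = pylops_mpi.DistributedArray(global_shape=nloc * nranks)
y[:] = np.ones(y.local_shape)

# zero initial guess and distributed CGLS inversion
x0 = pylops_mpi.DistributedArray(global_shape=nloc * nranks)
x0[:] = 0
xinv = cgls(Op, y, x0=x0, niter=10, show=False)[0]

print(xinv.asarray()[:5])  # gathered global solution, approximately 0.5 everywhere
```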

# Use Cases

# References