GSoC 2023 Project Ideas
Thank you for your interest in applying for Google Summer of Code with PyLops!
PyLops is a Python library for large-scale, matrix-free optimization built on top of NumPy and SciPy (as well as other smaller libraries in the Python ecosystem: see this Overview for more details). Currently, a GPU backend, built on top of CuPy, is also offered that completely mirrors the original CPU backend: from a user perspective, the input arrays provided to operators and solvers drive the internal working of each operator/solver (i.e., if a NumPy array is passed the entire computation is run on CPU, if a CuPy array is passed the entire computation is run on GPU).
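The backend dispatch described above can be sketched as follows. This is a minimal illustration and not PyLops' actual implementation: `array_module` and `scale_and_sum` are hypothetical names, and the snippet falls back to NumPy when CuPy is not installed.

```python
import numpy as np

try:  # CuPy is optional; without it everything runs on the CPU
    import cupy as cp
except ImportError:
    cp = None


def array_module(x):
    """Return the array library (NumPy or CuPy) that owns ``x`` (hypothetical helper)."""
    if cp is not None:
        return cp.get_array_module(x)  # cupy for CuPy arrays, numpy otherwise
    return np


def scale_and_sum(x, alpha=2.0):
    """Toy computation that runs on whichever device holds ``x``."""
    xp = array_module(x)
    return xp.sum(alpha * xp.asarray(x))


x_cpu = np.arange(5.0)
result = scale_and_sum(x_cpu)  # computed with NumPy because x_cpu is a NumPy array
```

Passing a CuPy array to the same function would keep the whole computation on the GPU, which is the behaviour the paragraph above describes from the user's perspective.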
Contributing to PyLops will help everyone in the Python ecosystem to run large-scale optimization problems seamlessly as if they were running their dense matrix counterpart using purely NumPy and SciPy routines.
This year, we are planning to accept 1 student in total and below we provide 3 possible projects we would like them to work on.
If you are new to PyLops and would like to become familiar with our codebase, take a look at some of our tutorials:
- Transform 2022: YouTube video links.
- Transform 2021: YouTube video links.
- Swung Rendezvous 2021: YouTube video links.
- PyDataGlobal 2020: YouTube video links.
It is also a good idea to browse through our main documentation and its tutorials and gallery.
Notes
On the student registration form, please prefix your title with PyLops: and choose N/A for "NumFOCUS proposal tag".
The curvelet transform is a multidimensional wavelet-type transform with many useful properties, such as edge preservation, an exact inverse and, for many common types of signals, parsimony of coefficients and optimal representation. It is used in a variety of tasks such as denoising, adaptive filtering, preconditioning other transforms, sparsification and compression. With the rise of deep learning, the curvelet transform also has the potential to be a powerful feature extractor, as has already been shown for other, less powerful wavelet-type transforms. The Curvelops library developed by the PyLops team has some cool visual examples.
There are at least three types of digital implementations of the curvelet transform: (a) the FDCT (Fast Discrete Curvelet Transform) via wrapping [1], (b) the FDCT via unequally spaced fast Fourier transform (USFFT) [1], and (c) the Uniform Discrete Curvelet Transform (UDCT) [2]. Many flavors of the FDCT are available from CurveLab, but its licensing hinders open-source implementations, as they could be construed as derived code. For example, the Curvelops library requires the user to license the CurveLab software before being able to use its functions, as it depends on the CurveLab source code. So while the wrapper, which only provides an API, is MIT-licensed, it requires proprietary code to perform the transforms.
This project proposes the implementation of the UDCT which, in addition to not having the same licensing hurdles, also solves some technical issues identified in the FDCT. It is an excellent opportunity for those wanting to learn how to code a Python library from scratch, as well as learn more about signal and image processing.
[1] Candès, E., Demanet, L., Donoho, D., & Ying, L. (2006). Fast Discrete Curvelet Transforms. Multiscale Modeling & Simulation, 5(3), 861–899.
[2] Nguyen, T. T., & Chauris, H. (2010). Uniform Discrete Curvelet Transform. IEEE Transactions on Signal Processing, 58(7), 3618–3634.
- A Python 3 package containing:
  - A NumPy and/or PyTorch and/or CuPy-based Python implementation of the forward and inverse UDCT.
  - A robust test suite that ensures correctness of the forward and inverse UDCT.
- Examples of how to use the UDCT and integrate with libraries such as PyLops and PyTorch.
- Qualitative comparison between the resulting UDCT implementation and the FDCT (via Curvelops).
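As a rough idea of what the test suite could check, the sketch below runs a round-trip test and a dot test on a forward/adjoint pair. It assumes an orthonormal 2D FFT as a stand-in transform, since no UDCT implementation exists yet; `forward` and `adjoint` are hypothetical names. Note that for a redundant transform such as the UDCT the adjoint and the inverse need not coincide, so each would be tested separately.

```python
import numpy as np


def forward(x):
    """Stand-in forward transform: orthonormal 2D FFT (NOT a curvelet transform)."""
    return np.fft.fft2(x, norm="ortho")


def adjoint(y):
    """Adjoint of the stand-in; for an orthonormal FFT it equals the inverse."""
    return np.fft.ifft2(y, norm="ortho")


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
y = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))

# Round trip: the inverse applied to the forward should return the input
roundtrip_error = np.linalg.norm(adjoint(forward(x)) - x) / np.linalg.norm(x)

# Dot test: <A x, y> must equal <x, A^H y> for a correct forward/adjoint pair
lhs = np.vdot(y, forward(x))  # y^H (A x)
rhs = np.vdot(adjoint(y), x)  # (A^H y)^H x
dot_test_error = abs(lhs - rhs) / abs(lhs)
```

Both errors should sit at the level of floating-point round-off; a test would assert that they fall below a small tolerance.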
- Python 3
- NumPy and/or PyTorch and/or CuPy. The NumPy implementation can be developed first, and later ported to CuPy (Easy) and PyTorch (Medium).
- A Python testing suite such as pytest or unittest. Mentors can help those eager to learn.
- Some image and/or signal processing experience preferred, but not required!
- Some experience reading/running Julia code may be helpful as an MIT-licensed Julia implementation of the UDCT exists: Curvelet.jl.
- No experience with any wavelet or curvelet transform is required; mentors will provide full support for the technical aspects.
@cako
@mrava87
Medium (175 hours)
Medium
PyLops has been developed from the ground up to solve large-scale linear inverse problems with scalability in mind. By leveraging matrix-free operators and iterative solvers, problems with model size ranging from hundreds to millions of parameters can be solved with the same codebase and user experience.
Nevertheless, with the continuous growth of scientific datasets and the desire for higher resolution, two common scenarios arise that may require the use of distributed computing alongside matrix-free linear algebra:
- The size of the model and/or data we want to invert for exceeds the memory size of a single machine used to perform our computations;
- The operator we wish to invert for is composed of many (possibly very expensive) computational blocks that can be easily parallelized.
In this project we aim to extend PyLops to handle such scenarios with the help of MPI, and more specifically the MPI4Py library. When dealing with distributed inverse problems we identify three distinct use-cases:
- Both model and data are fully distributed across nodes. There is no communication, or only minimal communication, in the modelling operator. Communication happens in the solver when dot products of model and data vectors are applied, and in regularization operators (e.g., Laplacian). Example: post-stack seismic inversion, where each node has a portion of the model and data and the only communication happens if we want to add a spatial regularizer on the model.
- Data are distributed across nodes but the model is available at all nodes. No communication happens in the forward pass; communication happens in the adjoint pass, where the models produced by each node must be summed. Communication also happens in the solver when dot products of data are required. Example: CT/MRI imaging, or seismic least-squares migration.
- Both model and data are available in all nodes (or just in the master). Communication happens only within the operator, with the master sending some part of the model/data to the workers and the workers performing some computations. No communication is required in the solver. Example: MDC-based inversions (allows storing out-of-memory kernels).

Note that there is a rather mature codebase that could function as inspiration for this project. However, whilst this library is very specific to one problem, our project will aim to be more generic and fit a wider variety of problems.
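The communication pattern of the second use-case can be illustrated without MPI by simulating the ranks with a list of NumPy arrays. This is only a sketch under that assumption: in a real implementation each list element would live on a different process and the sums over the list would be MPI sum-reductions (e.g., via MPI4Py). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ranks, n_model, n_data_local = 4, 6, 5

# Each simulated rank owns one operator block and one slice of the data;
# the model vector is replicated on every rank.
blocks = [rng.standard_normal((n_data_local, n_model)) for _ in range(n_ranks)]
model = rng.standard_normal(n_model)

# Forward pass: purely local, no communication
data_local = [A @ model for A in blocks]

# Adjoint pass: each rank produces a full-size model, then a sum-reduction
# (the role an MPI all-reduce with SUM would play) combines them
model_partial = [A.T @ d for A, d in zip(blocks, data_local)]
model_adjoint = np.sum(model_partial, axis=0)

# Solver dot product on distributed data: local dot products, then a sum-reduction
dot_distributed = sum(float(d @ d) for d in data_local)

# Reference: the same quantities with the blocks stacked on a single machine
A_full = np.vstack(blocks)
data_full = A_full @ model
```

The distributed results match `A_full.T @ data_full` and `data_full @ data_full`, which is the kind of equivalence a test suite for the distributed operators could assert.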
- A Python 3 package (which will be later integrated in the PyLops codebase) containing:
  - A DistributedArray container for distributed NumPy/CuPy arrays and methods performing basic mathematical operations (e.g., sum, element-wise product, dot product) in a distributed fashion;
  - Three operators, MPIHStack, MPIVStack and MPIBlockDiag, wrapping basic PyLops operators and instructing PyLops about which type of 'MPI-reduction' to perform over multiple processes (e.g., MPIHStack applies sum-reduction in forward mode, MPIVStack applies sum-reduction in adjoint mode);
  - One (or more) of the PyLops solvers converted to work with DistributedArray arrays;
  - FirstDerivative and SecondDerivative operators that apply finite-difference stencils to distributed arrays (i.e., partially communicating edge values across processes). These methods are of vital importance as they lie at the basis of many commonly used regularizations (e.g., Tikhonov, TV).
- A test suite that ensures correctness of the methods implemented in the project. CI with GitHub Actions using https://github.com/mpi4py/setup-mpi.
- One or more examples showcasing the new distributed features on real-life problems from the PyLops library.
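The edge-value communication needed by a distributed finite-difference operator can be sketched as below. Ranks are simulated with array chunks, so the "exchange" is a plain slice where a real implementation would use MPI send/receive calls. The centered stencil with zeroed edges is an assumption made for this illustration, and `first_derivative` is a hypothetical helper, not the PyLops operator.

```python
import numpy as np


def first_derivative(x):
    """Centered first derivative with unit spacing; edges are set to zero."""
    y = np.zeros_like(x)
    y[1:-1] = 0.5 * (x[2:] - x[:-2])
    return y


n_ranks = 3
x = np.sin(np.linspace(0.0, 3.0, 30))
chunks = np.array_split(x, n_ranks)  # one contiguous chunk per simulated rank

pieces = []
for rank, chunk in enumerate(chunks):
    # "Receive" one ghost sample from each neighbour (an MPI send/recv in practice)
    left = chunks[rank - 1][-1:] if rank > 0 else np.empty(0)
    right = chunks[rank + 1][:1] if rank < n_ranks - 1 else np.empty(0)
    padded = np.concatenate([left, chunk, right])

    # Apply the stencil locally, then discard the ghost samples
    d = first_derivative(padded)
    start = left.size
    pieces.append(d[start:start + chunk.size])

y_distributed = np.concatenate(pieces)
y_reference = first_derivative(x)
```

Only one sample per neighbour is exchanged for this three-point stencil; wider stencils would need proportionally wider ghost regions.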
- Python 3
- Good knowledge of NumPy and basic knowledge of PyLops and its inner workings.
- Basic knowledge of MPI programming. Some experience with MPI4Py is preferred, but not required!
- A Python testing suite such as pytest or unittest. Mentors can help those eager to learn.
The participant must have access to a machine with an MPI installation available.
@mrava87
@hongyx11
@cako
Medium (175 hours)
Medium
Since 2021, PyLops has had a dual, fully interchangeable backend supporting operations with NumPy (CPU) and CuPy (GPU). This allows users to develop their code on laptops and easily switch to workstations or cloud instances equipped with a GPU with minimal modifications (i.e., simply turning all NumPy arrays into CuPy arrays).
In a similar fashion to what is described in Project 2, this project aims to extend PyLops' GPU capability to multi-GPU scenarios. More specifically, whilst one could simply use an MPI-based solution to distribute computations to multiple GPUs that are physically co-located or placed on multiple machines, in this project we aim to develop code targeting the scenario where data can be directly exchanged between physically co-located GPUs without incurring any extra cost arising from device-to-host and host-to-device data movement.
CuPy currently provides all the building blocks required for such an endeavor: more specifically, cupy.cuda.Device allows users to move data between host and devices and control where computations are performed (see also https://docs.cupy.dev/en/stable/user_guide/basic.html#data-transfer). Similarly, streams can be used to ensure maximum usage of all devices at any time.
Finally, whilst PyLops' users will benefit from the outcome of this project as a standalone feature, it may later also be integrated with the outcome of Project 2 to allow a double level of parallelism (i.e., multiple machines with multiple GPUs).
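A minimal sketch of how cupy.cuda.Device selects where data live and where computations run is given below. It assumes that splitting the array evenly across all visible GPUs is acceptable, and it falls back to NumPy when CuPy or a CUDA device is unavailable so that the snippet still runs on a CPU-only machine. The variable names are illustrative and do not reflect any existing PyLops API.

```python
import numpy as np

try:
    import cupy as cp
    n_devices = cp.cuda.runtime.getDeviceCount()
except Exception:  # CuPy missing, or no usable CUDA device
    cp, n_devices = None, 0

x = np.arange(1000, dtype=np.float64)

if n_devices > 0:
    partial_sums = []
    for dev, chunk in enumerate(np.array_split(x, n_devices)):
        with cp.cuda.Device(dev):          # make `dev` the current device
            chunk_gpu = cp.asarray(chunk)  # host-to-device copy onto `dev`
            partial_sums.append(float(cp.sum(chunk_gpu * chunk_gpu)))
    squared_norm = sum(partial_sums)
else:
    # CPU fallback so the sketch still runs without a GPU
    squared_norm = float(np.sum(x * x))
```

Here the partial results are gathered through the host for simplicity; the project itself targets direct device-to-device exchange, which this sketch does not attempt.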
- A set of new features added directly into the PyLops main library including:
  - A MultiGPUArray container for multi-GPU CuPy arrays and methods performing basic mathematical operations (e.g., sum, element-wise product, dot product) in a distributed fashion;
  - Similar outcomes to points 2, 3, and 4 in Project 2.
- A test suite that ensures correctness of the methods implemented in the project.
- One or more examples showcasing the new multi-GPU features on real-life problems from the PyLops library.
- Python 3
- Good knowledge of NumPy and CuPy. Basic knowledge of PyLops and its inner workings is also preferred but not required.
- Good understanding of GPU programming.
- A Python testing suite such as pytest or unittest. Mentors can help those eager to learn.
The participant must have access to a machine with at least 2 GPUs available.
@mrava87
@hongyx11
@cako
Medium (175 hours)
Medium