[](https://pylops.slack.com)
[](https://doi.org/10.21105/joss.07512)

# Distributed linear operators and solvers

pylops-mpi is a Python library built on top of [PyLops](https://pylops.readthedocs.io/en/stable/), designed to enable distributed and parallel processing of large-scale linear algebra operations and computations.

## Installation
To install pylops-mpi, you need to have the Message Passing Interface (MPI) and, optionally, the NVIDIA Collective Communications Library (NCCL) installed on your system.

1. **Download and install MPI**: Visit the official MPI website to download an appropriate MPI implementation for your system,
   and follow the installation instructions provided by the MPI vendor:
   - [Open MPI](https://www.open-mpi.org/software/ompi/v1.10/)
   - [MPICH](https://www.mpich.org/downloads/)
   - [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.10j8fx)

2. **Verify the MPI installation**: After installing MPI, verify the installation by opening a terminal or command prompt
   and running the following command:
   ```
   mpiexec --version
   ```

3. **Install pylops-mpi**: Once MPI is installed and verified, you can proceed to install `pylops-mpi` via `pip`:
   ```
   pip install pylops-mpi
   ```

4. (Optional) To enable the NCCL backend for multi-GPU systems, install `cupy` and `nccl` via `pip`:
   ```
   pip install cupy-cudaXx nvidia-nccl-cuX
   ```

   with `X=11,12`.

Alternatively, if the Conda package manager is used to set up the Python environment, steps 1 and 2 can be skipped by installing `mpi4py`, which comes with its own MPI distribution:

```
conda install -c conda-forge mpi4py X
```

with `X=mpich, openmpi, impi_rt, msmpi`. Similarly, step 4 can be accomplished using:

```
conda install -c conda-forge cupy nccl
```

See the [Installation](https://pylops.github.io/pylops-mpi/installation.html) page of the documentation for more information.

## Run Pylops-MPI
Once you have installed the prerequisites and pylops-mpi, you can run pylops-mpi using the `mpiexec` command.

Here is an example of how to run a Python script called `<script_name>.py` on `<NUM_PROCESSES>` processes:
```
mpiexec -n <NUM_PROCESSES> python <script_name>.py
```
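
Under `mpiexec`, every process runs the same script and branches on its MPI rank (the SPMD model). The sketch below simulates this idea with a plain Python loop over hypothetical ranks, with no real MPI involved; with `mpi4py`, each rank would instead obtain its id from `MPI.COMM_WORLD.Get_rank()` and the final `sum(partials)` would be an `Allreduce`:

```python
# SPMD sketch: each MPI rank runs the same logic on its own slice of a
# global problem; here the "ranks" are simulated by a sequential loop.

def local_range(rank, size, n_global):
    """Contiguous index range owned by `rank` out of `size` ranks
    (near-equal split; an illustrative assumption, not pylops-mpi's exact rule)."""
    base, rem = divmod(n_global, size)
    start = rank * base + min(rank, rem)
    stop = start + base + (1 if rank < rem else 0)
    return start, stop

def simulated_run(size, n_global=20):
    # each "rank" sums its local slice; MPI's Allreduce(SUM) would combine them
    partials = []
    for rank in range(size):
        start, stop = local_range(rank, size, n_global)
        partials.append(sum(range(start, stop)))
    return sum(partials)  # global result, identical for any process count
```

The global result does not depend on the number of processes, which is exactly the property a distributed solver relies on.
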

## Example: A distributed finite-difference operator
The following example is a modified version of the starting example in
[PyLops' README](https://github.com/PyLops/pylops/blob/dev/README.md),
adapted to handle a 2D array distributed across ranks over its first dimension
via the `DistributedArray` object:

```python
import numpy as np
import pylops_mpi
from pylops_mpi import DistributedArray, Partition

nx, ny = 11, 21
x = np.zeros((nx, ny), dtype=np.float64)
x[nx // 2, ny // 2] = 1.0

# Initialize DistributedArray with partition set to Scatter
x_dist = pylops_mpi.DistributedArray.to_dist(
    x=x.flatten(),
    partition=Partition.SCATTER)

# Distributed first-derivative operator
D_op = pylops_mpi.MPIFirstDerivative((nx, ny), dtype=np.float64)

# y = Dx
y_dist = D_op @ x_dist

# xadj = D^H y
xadj_dist = D_op.H @ y_dist

# xinv = D^-1 y
x0_dist = pylops_mpi.DistributedArray(D_op.shape[1], dtype=np.float64)
x0_dist[:] = 0
xinv_dist = pylops_mpi.cgls(D_op, y_dist, x0=x0_dist, niter=10)[0]
```
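
To see the kind of computation the distributed operator performs, here is a serial, pure-Python sketch of a first-difference stencil applied along the first axis (a simplified forward difference with unit sampling; `MPIFirstDerivative`'s actual stencil and boundary handling may differ):

```python
def first_derivative_axis0(x):
    """Forward difference along axis 0 of a 2D list-of-lists:
    y[i][j] = x[i+1][j] - x[i][j] (unit sampling; illustrative only).
    In the distributed setting, each rank holds a block of rows, so only the
    boundary row must be exchanged with the neighbouring rank (halo exchange)."""
    nx = len(x)
    return [[x[i + 1][j] - x[i][j] for j in range(len(x[0]))]
            for i in range(nx - 1)]
```

Because the stencil only couples adjacent rows, distributing the array over its first dimension keeps communication limited to one row per rank boundary.
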

Note that the `DistributedArray` class provides a `to_dist` class method that accepts a NumPy array as input and converts it into an instance of the `DistributedArray` class. This method is used to transform a regular NumPy array into a `DistributedArray` that is distributed and processed across multiple nodes or processes.

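Conceptually, `to_dist` with `Partition.SCATTER` splits a flat array into one contiguous chunk per rank. A minimal pure-Python sketch of that split (the near-equal partitioning rule is an illustrative assumption; see the pylops-mpi docs for the exact scheme):

```python
def scatter(x, size):
    """Split the global array x into `size` contiguous, near-equal chunks,
    one per rank, mimicking Partition.SCATTER conceptually."""
    base, rem = divmod(len(x), size)
    chunks, start = [], 0
    for rank in range(size):
        stop = start + base + (1 if rank < rem else 0)
        chunks.append(x[start:stop])
        start = stop
    return chunks
```

Concatenating the per-rank chunks recovers the global array, which is what gathering a scattered `DistributedArray` does.
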
Moreover, the `DistributedArray` class also provides fundamental mathematical operations, such as element-wise addition, subtraction, and multiplication, as well as a dot product and an equivalent of the [`np.linalg.norm`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) function, all operating in a distributed fashion and thus leveraging the efficiency of the MPI/NCCL protocols. This enables efficient computation and processing of large-scale distributed arrays.

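As an illustration of how such reductions work, here is a pure-Python sketch of a distributed 2-norm: each rank computes a partial sum of squares over its local chunk, and a single sum over the partials plays the role of an `Allreduce(SUM)` (the `chunks` argument stands in for the per-rank local arrays; no real MPI is involved):

```python
import math

def distributed_norm(chunks):
    """2-norm of a global vector stored as per-rank chunks.
    Each rank does cheap local work on its own chunk; sum(partials)
    simulates the single collective reduction MPI/NCCL would perform."""
    partials = [sum(v * v for v in chunk) for chunk in chunks]
    return math.sqrt(sum(partials))
```

Only one scalar per rank crosses the network, regardless of how large the local chunks are, which is why these distributed reductions scale well.
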
## Running Tests
The MPI test scripts are located in the `tests` folder.
Use the following command to run them:
```
mpiexec -n <NUM_PROCESSES> pytest tests/ --with-mpi
```
where the `--with-mpi` option tells pytest to enable the `pytest-mpi` plugin, allowing the tests to utilize the MPI functionality.

Similarly, to run the NCCL test scripts located in the `tests_nccl` folder, use:
```
mpiexec -n <NUM_PROCESSES> pytest tests_nccl/ --with-mpi
```

## Documentation
The official documentation of Pylops-MPI is available [here](https://pylops.github.io/pylops-mpi/).
Visit the official docs to learn more about pylops-mpi.

## Contributors
* Rohan Babbar, rohanbabbar04
* Yuxi Hong, hongyx11
* Matteo Ravasi, mrava87
* Tharit Tangkijwanichakul, tharittk