Commit e895a2b

Merge remote-tracking branch 'origin/main' into halo_op
2 parents dcf79de + fb4003b commit e895a2b

21 files changed: +920 -27 lines changed

.github/workflows/build.yml

Lines changed: 7 additions & 2 deletions
@@ -16,7 +16,7 @@ jobs:
         os: [ubuntu-latest, macos-latest]
         python-version: ['3.10', '3.11', '3.12', '3.13']
         mpi: ['mpich', 'openmpi', 'intelmpi']
-        rank: ['2', '3', '4']
+        rank: ['2', '4', '9']
         exclude:
           - os: macos-latest
             mpi: 'intelmpi'
@@ -43,4 +43,9 @@ jobs:
     - name: Install pylops-mpi
       run: pip install .
     - name: Testing using pytest-mpi
-      run: mpiexec -n ${{ matrix.rank }} pytest tests/ --with-mpi
+      run: |
+        if [ "${{ matrix.mpi }}" = "openmpi" ]; then
+          mpiexec --mca btl ^openib -n ${{ matrix.rank }} pytest tests/ --with-mpi
+        else
+          mpiexec -n ${{ matrix.rank }} pytest tests/ --with-mpi
+        fi

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -1,3 +1,19 @@
+
+# 0.3.0
+* Added `pylops_mpi.basicoperators.MPIMatrixMult` operator.
+* Added NCCL support to all operators in :mod:`pylops_mpi.basicoperators`,
+  and `pylops_mpi.signalprocessing`.
+* Added ``base_comm_nccl`` in constructor of `pylops_mpi.DistributedArray`,
+  to enable NCCL communication backend.
+* Added `pylops_mpi.utils.benchmark` subpackage providing methods
+  to decorate and mark functions / class methods to measure their execution
+  time.
+* Added `pylops_mpi.utils._nccl` subpackage implementing methods
+  for NCCL communication backend.
+* Added `pylops_mpi.utils.deps` subpackage to safely import ``nccl``
+* Fixed partition in the creation of the output distributed array in
+  `pylops_mpi.signalprocessing.MPIFredholm1`.
+
 # 0.2.0
 - Added support for using CuPy arrays with PyLops-MPI.
 - Introduced the `pylops_mpi.signalprocessing.MPIFredholm1` and `pylops_mpi.waveeqprocessing.MPIMDC` operators.

Makefile

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 PIP := $(shell command -v pip3 2> /dev/null || command which pip 2> /dev/null)
 PYTHON := $(shell command -v python3 2> /dev/null || command which python 2> /dev/null)
-NUM_PROCESSES = 3
+NUM_PROCESSES = 4

 .PHONY: install dev-install dev-install_nccl install_ \
 	conda install_conda_nccl dev-install_conda dev-install_conda_nccl \
@@ -53,10 +53,11 @@ tests:
 tests_nccl:
 	mpiexec -n $(NUM_PROCESSES) pytest tests_nccl/ --with-mpi

+# sphinx-build does not work well with NCCL
 doc:
 	cd docs && rm -rf source/api/generated && rm -rf source/gallery &&\
 	rm -rf source/tutorials && rm -rf build &&\
-	cd .. && sphinx-build -b html docs/source docs/build
+	cd .. && NCCL_PYLOPS_MPI=0 sphinx-build -b html docs/source docs/build

 doc_cupy:
 	cp tutorials_cupy/* tutorials/

docs/source/api/index.rst

Lines changed: 4 additions & 2 deletions
@@ -118,9 +118,11 @@ Utils
     local_split


-.. currentmodule:: pylops_mpi.utils.dottest
+.. currentmodule:: pylops_mpi.utils

 .. autosummary::
    :toctree: generated/

-    dottest
+    dottest
+    benchmark
+    mark

docs/source/benchmarking.rst

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+.. _benchmarkutility:
+
+Benchmarking
+============
+
+PyLops-MPI users can conveniently benchmark the performance of their code with a simple decorator.
+:py:func:`pylops_mpi.utils.benchmark` and :py:func:`pylops_mpi.utils.mark` support various
+function calling patterns that may arise when benchmarking distributed code.
+
+- :py:func:`pylops_mpi.utils.benchmark` is a **decorator** used to time the execution of entire functions.
+- :py:func:`pylops_mpi.utils.mark` is a **function** used inside decorated functions to insert fine-grained time measurements.
+
+.. note::
+   This benchmark utility is enabled by default, i.e., if the user decorates a function with :py:func:`@benchmark`, the function will go through
+   the time measurements, adding overhead. Users can turn off the benchmark while leaving the decorator in place with
+
+   .. code-block:: bash
+
+      >> export BENCH_PYLOPS_MPI=0
+
+The usage can be as simple as:
+
+.. code-block:: python
+
+   @benchmark
+   def function_to_time():
+       # Your computation
+
+The result is printed to standard output.
+For fine-grained time measurements, :py:func:`pylops_mpi.utils.mark` can be inserted in the code region of benchmarked functions:
+
+.. code-block:: python
+
+   @benchmark
+   def function_to_time():
+       # Your computation that you may want to exclude from the benchmark
+       mark("Begin Region")
+       # Your computation
+       mark("Finish Region")
+
+You can also nest benchmarked functions to track execution times across layers of function calls, with the output being correctly formatted.
+Additionally, the result can be exported to a text file. For complete, runnable examples, see :ref:`sphx_glr_tutorials_benchmarking.py`.

docs/source/changelog.rst

Lines changed: 22 additions & 0 deletions
@@ -3,6 +3,27 @@
 Changelog
 =========

+
+Version 0.3.0
+-------------
+
+*Released on: 05/08/2025*
+
+* Added :class:`pylops_mpi.basicoperators.MPIMatrixMult` operator.
+* Added NCCL support to all operators in :mod:`pylops_mpi.basicoperators`,
+  and :mod:`pylops_mpi.signalprocessing`.
+* Added ``base_comm_nccl`` in constructor of :class:`pylops_mpi.DistributedArray`,
+  to enable NCCL communication backend.
+* Added :class:`pylops_mpi.utils.benchmark` subpackage providing methods
+  to decorate and mark functions / class methods to measure their execution
+  time.
+* Added :class:`pylops_mpi.utils._nccl` subpackage implementing methods
+  for NCCL communication backend.
+* Added :class:`pylops_mpi.utils.deps` subpackage to safely import ``nccl``
+* Fixed partition in the creation of the output distributed array in
+  :class:`pylops_mpi.signalprocessing.MPIFredholm1`.
+
+
 Version 0.2.0
 -------------

@@ -14,6 +35,7 @@ Version 0.2.0
 * Added a dottest function to perform dot tests on PyLops-MPI operators.
 * Created a tutorial for Multi-Dimensional Deconvolution (MDD).

+
 Version 0.1.0
 -------------

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ class and implementing the ``_matvec`` and ``_rmatvec``.
    self
    installation.rst
    gpu.rst
+   benchmarking.rst

 .. toctree::
    :maxdepth: 2

pylops_mpi/DistributedArray.py

Lines changed: 1 addition & 1 deletion
@@ -452,7 +452,7 @@ def _check_local_shapes(self, local_shapes):
         elif self.partition is Partition.SCATTER:
             local_shape = local_shapes[self.rank]
             # Check if local shape sum up to global shape and other dimensions align with global shape
-            if self._allreduce(local_shape[self.axis]) != self.global_shape[self.axis] or \
+            if self.base_comm.allreduce(local_shape[self.axis]) != self.global_shape[self.axis] or \
                     not np.array_equal(np.delete(local_shape, self.axis), np.delete(self.global_shape, self.axis)):
                 raise ValueError(f"Local shapes don't align with the global shape;"
                                  f"{local_shapes} != {self.global_shape}")
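
The condition in this hunk checks two things: the local sizes along the distribution axis must add up to the global size, and every other dimension must match the global shape. A serial sketch of that check follows. The helper `local_shapes_consistent` is hypothetical and not part of `pylops_mpi`. It assumes all local shapes are available in one list, so a plain `sum` stands in for the MPI `allreduce` that combines one value per rank in the actual code.

```python
def local_shapes_consistent(local_shapes, global_shape, axis):
    """Serial stand-in for the SCATTER shape check (hypothetical helper)."""

    def drop_axis(shape):
        # Same role as np.delete(shape, axis) in the diff above.
        return tuple(s for i, s in enumerate(shape) if i != axis)

    # Sizes along the distribution axis must add up to the global size.
    axis_total = sum(shape[axis] for shape in local_shapes)
    if axis_total != global_shape[axis]:
        return False

    # All remaining dimensions must agree with the global shape.
    other_global = drop_axis(global_shape)
    return all(drop_axis(shape) == other_global for shape in local_shapes)
```

For example, local shapes `(3, 4)` and `(2, 4)` are consistent with a global shape `(5, 4)` distributed along axis 0, whereas `(3, 4)` and `(3, 4)` are not.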

pylops_mpi/basicoperators/Laplacian.py

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 from typing import Tuple
 import numpy as np
-from numpy.core.multiarray import normalize_axis_index
 from mpi4py import MPI

 from pylops.utils.typing import DTypeLike, InputDimsLike
 from pylops.basicoperators import SecondDerivative
+from pylops.utils.backend import get_normalize_axis_index

 from pylops_mpi import DistributedArray, MPILinearOperator, Partition
 from pylops_mpi.DistributedArray import local_split
@@ -75,7 +75,7 @@ def __init__(self, dims: InputDimsLike,
                  base_comm: MPI.Comm = MPI.COMM_WORLD,
                  dtype: DTypeLike = np.float64):
         self.dims = dims
-        axes = tuple(normalize_axis_index(ax, len(dims)) for ax in axes)
+        axes = tuple(get_normalize_axis_index()(ax, len(dims)) for ax in axes)
         if not (len(axes) == len(weights) == len(sampling)):
             raise ValueError("axes, weights, and sampling have different size")
         self.axes = axes
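
The changed line maps every entry of `axes` to a non-negative index before the length check. What that normalization amounts to can be sketched in plain Python. `normalize_axis` below is a hypothetical stand-in, not the callable returned by `get_normalize_axis_index()`, and it raises a plain `ValueError` where NumPy raises its own `AxisError`.

```python
def normalize_axis(axis, ndim):
    """Map a possibly negative axis to its index in range(ndim)."""
    if not -ndim <= axis < ndim:
        raise ValueError(
            f"axis {axis} is out of bounds for array of dimension {ndim}"
        )
    return axis % ndim


# Mirrors the tuple comprehension in the diff for a 3-dimensional operator.
dims = (10, 20, 30)
axes = tuple(normalize_axis(ax, len(dims)) for ax in (-2, -1))
```

With `dims` of length 3, the axes `(-2, -1)` become `(1, 2)`, so later code can index `dims` without handling negative values.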

pylops_mpi/utils/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # isort: skip_file

+from .benchmark import *
 from .dottest import *
 from .deps import *
