Commit b6df40a

Merge pull request #187 from mrava87/v0.4.0
build: prepare for v0.4.0
2 parents ab15936 + 1f5ac7c

File tree

4 files changed: +44 −10 lines changed

CHANGELOG.md

Lines changed: 15 additions & 0 deletions

@@ -1,3 +1,18 @@
+# 0.4.0
+* Added `pylops_mpi.Distributed.DistributedMixIn` class with
+  communicator-agnostic calls to communication methods.
+* Added `pylops_mpi.utils._mpi` with implementations of MPI
+  communication methods.
+* Added `kind="summa"` implementation in the
+  `pylops_mpi.basicoperators.MPIMatrixMult` operator.
+* Added `kind` parameter to all operators in `pylops_mpi.basicoperators.MPILaplacian`.
+* Added `cp.cuda.Device().synchronize()` before any MPI call when using
+  CUDA-Aware MPI.
+* Modified `pylops_mpi.utils._nccl.initialize_nccl_comm` to
+  handle nodes with more GPUs than ranks.
+* Fixed bug in `pylops_mpi.DistributedArray.__neg__` by
+  explicitly passing `base_comm_nccl` during internal creation
+  of the distributed array.
 # 0.3.0
 * Added `pylops_mpi.basicoperators.MPIMatrixMult` operator.

docs/source/changelog.rst

Lines changed: 21 additions & 0 deletions

@@ -4,6 +4,27 @@ Changelog
 =========


+Version 0.4.0
+-------------
+
+*Released on: 08/03/2026*
+
+* Added :class:`pylops_mpi.Distributed.DistributedMixIn` class with
+  communicator-agnostic calls to communication methods.
+* Added :mod:`pylops_mpi.utils._mpi` with implementations of MPI
+  communication methods.
+* Added ``kind="summa"`` implementation in the
+  :class:`pylops_mpi.basicoperators.MPIMatrixMult` operator.
+* Added ``kind`` parameter to all operators in :class:`pylops_mpi.basicoperators.MPILaplacian`.
+* Added ``cp.cuda.Device().synchronize()`` before any MPI call when using
+  CUDA-Aware MPI.
+* Modified :func:`pylops_mpi.utils._nccl.initialize_nccl_comm` to
+  handle nodes with more GPUs than ranks.
+* Fixed bug in :func:`pylops_mpi.DistributedArray.__neg__` by
+  explicitly passing ``base_comm_nccl`` during internal creation
+  of the distributed array.
+
 Version 0.3.0
 -------------

docs/source/gpu.rst

Lines changed: 7 additions & 9 deletions

@@ -24,11 +24,9 @@ cupy arrays.

 .. note::

-   By default when using ``cupy`` arrays, PyLops-MPI will try to use methods in MPI4Py that communicate memory buffers.
-   However, this requires a CUDA-Aware MPI installation. If your MPI installation is not CUDA-Aware, set the
-   environment variable ``PYLOPS_MPI_CUDA_AWARE=0`` to force PyLops-MPI to use methods in MPI4Py that communicate
-   general Python objects (this will incur a loss of performance!).
+   By default when using ``cupy`` arrays, PyLops-MPI will use methods in MPI4Py that communicate general
+   Python objects (this will incur a loss of performance!). If you have a CUDA-Aware MPI installation, set
+   ``PYLOPS_MPI_CUDA_AWARE=1`` for PyLops-MPI to use methods in MPI4Py that communicate memory buffers.

 Moreover, PyLops-MPI also supports Nvidia's Collective Communication Library (NCCL) for highly-optimized
 collective operations, such as AllReduce, AllGather, etc. This allows PyLops-MPI users to leverage the

@@ -53,11 +51,11 @@ In summary:
   - Default
   - Cannot be disabled
 * - CuPy + MPI
-  - ``PYLOPS_MPI_CUDA_AWARE=0``
-  - ``PYLOPS_MPI_CUDA_AWARE=1`` (default)
+  - ``PYLOPS_MPI_CUDA_AWARE=0`` (default)
+  - ``PYLOPS_MPI_CUDA_AWARE=1``
 * - CuPy + CUDA-Aware MPI
-  - ``PYLOPS_MPI_CUDA_AWARE=1`` (default)
-  - ``PYLOPS_MPI_CUDA_AWARE=0``
+  - ``PYLOPS_MPI_CUDA_AWARE=1``
+  - ``PYLOPS_MPI_CUDA_AWARE=0`` (default)
 * - CuPy + NCCL
   - ``NCCL_PYLOPS_MPI=1`` (default)
   - ``NCCL_PYLOPS_MPI=0``
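The documentation change above makes buffer-based (CUDA-Aware) communication opt-in rather than opt-out. A minimal sketch of how a user would opt in under the new default; since the flag is read at import time, it must be set before importing `pylops_mpi` (the import itself is commented out here, as it requires an MPI environment):

```python
import os

# Per the updated gpu.rst note: buffered (CUDA-aware) MPI4Py communication
# is now opt-in. Only enable it if your MPI build is actually CUDA-aware,
# otherwise communication of cupy arrays will fail.
os.environ["PYLOPS_MPI_CUDA_AWARE"] = "1"

# import pylops_mpi  # would now pick up the flag and use buffer-based methods
print(os.environ["PYLOPS_MPI_CUDA_AWARE"])
```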

pylops_mpi/utils/deps.py

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ def nccl_import(message: Optional[str] = None) -> str:


 cuda_aware_mpi_enabled: bool = (
-    True if int(os.getenv("PYLOPS_MPI_CUDA_AWARE", 1)) == 1 else False
+    False if int(os.getenv("PYLOPS_MPI_CUDA_AWARE", 0)) == 0 else True
 )

 nccl_enabled: bool = (
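The one-line change above flips the flag's default from enabled to disabled. A minimal sketch of the new semantics, written as a pure function for testability (the helper name and dict-based environment argument are illustrative, not part of the library):

```python
def cuda_aware_mpi_enabled_flag(env: dict) -> bool:
    # Mirrors the new expression in pylops_mpi/utils/deps.py:
    # an unset variable now defaults to 0, i.e. CUDA-aware MPI is opt-in.
    return int(env.get("PYLOPS_MPI_CUDA_AWARE", 0)) != 0

# Unset or "0" -> disabled (safe default when MPI is not CUDA-aware)
print(cuda_aware_mpi_enabled_flag({}))                              # False
print(cuda_aware_mpi_enabled_flag({"PYLOPS_MPI_CUDA_AWARE": "1"}))  # True
```

Before this commit the same expression defaulted to `1`, so users on plain MPI builds had to explicitly export `PYLOPS_MPI_CUDA_AWARE=0`.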
