Skip to content

Conversation

@mrava87
Copy link
Contributor

@mrava87 mrava87 commented Jun 16, 2025

This PR adds a new optional input to the DistributedArray.asarray method called masked, which can be used with distributed arrays that have a mask to create a global view of the distributed array that contains only elements from one color of the mask (so that repeated portions of the array are not returned).

@mrava87
Copy link
Contributor Author

mrava87 commented Jun 16, 2025

@astroC86, this is relevant to your PR as we can now compare the full output of your MPIMatrixMult with that of the serial MatrixMult.

@tharittk I have made some changes to the MPI tests, it would be good to do the same for the NCCL ones and check that the code I added works also with that backend (it should but I have not tested it yet)

Copy link
Collaborator

@rohanbabbar04 rohanbabbar04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!!

@tharittk
Copy link
Collaborator

I have tested it by copy-paste your update. This uncovers the bug in nccl_asarray() function when working with subcomm. I am not sure exactly where yet but it looks like current asarray() will call nccl_asarray() with self.local_shapes (which call all GPUs in the world communicator) and may mess with the subcomm.

The MPI does not have this problem because it sends the object and no buffer / array unrolling needed.

@mrava87
Copy link
Contributor Author

mrava87 commented Jun 17, 2025

I have tested it by copy-paste your update. This uncovers the bug in nccl_asarray() function when working with subcomm. I am not sure exactly where yet but it looks like current asarray() will call nccl_asarray() with self.local_shapes (which call all GPUs in the world communicator) and may mess with the subcomm.

The MPI does not have this problem because it sends the object and no buffer / array unrolling needed.

@tharittk thanks for checking! I am not sure I understand what the problem is with NCCL but I agree it may be due to self.local_shapes that is not in the subcomm but in the total comm... I suggest that I merge this as is and make a Github Issue pointing to the problem with NCCL so you can handle it in a separate PR adding also tests and @astroC86 can use these additions in his PR (which targets only MPI so far)

@mrava87 mrava87 merged commit 45bee36 into PyLops:main Jun 17, 2025
61 checks passed
@mrava87 mrava87 deleted the feat-asarraymasked branch June 17, 2025 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants