-
Notifications
You must be signed in to change notification settings - Fork 6
Complex-number Support for NCCL & Fredholm NCCL #148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tharittk very good to see support for complex numbers in the NCCL engine 🚀
I just left a few comments here and there and some about your TODO comments.
Only a few things left to do once you address them:
- try to fix the flake8 issue.
- update also https://github.com/PyLops/pylops-mpi/blob/main/docs/source/gpu.rst
- one more thing that we could add to this PR is the
mddtutorial as we have now all component for it to run with NCCL (I think...). Or we could leave it for an independent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pylops_mpi/DistributedArray.py
Outdated
| if deps.nccl_enabled and getattr(self, "base_comm_nccl"): | ||
| return nccl_allgather(self.sub_comm, send_buf, recv_buf) | ||
| if hasattr(send_buf, "shape"): | ||
| send_shapes = self.base_comm.allgather(send_buf.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you sure is not self.sub_comm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, you are right. Strange enough, no test case catches that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I would think here
| xmaskedloc = arr.asarray(masked=True) |
main branch) looks correct but your previous commit should have called this pylops-mpi/pylops_mpi/DistributedArray.py
Line 378 in 1fdf083
| return nccl_asarray(self.sub_comm if masked else self.base_comm_nccl, |
.asarray call... anyways, now it is correct 😄
|
@tharittk great work! I am going to merge this. In a separate PR, we may want to consider adding tests targeting the some of the other methods in the |
I introduce the complex number support for NCCL communication here.
Changes made:
_nccl.py- Previously, the number of elements sent is equivalent to the number of elements in thesend_buf. To support the complex number, we can simply send 2x of that size (_nccl_buf_size()). The receiving data layout does not need changes as it will be according to therecv_bufi.e., it will already be complex ifsend_bufis complex.test files - I enabled all the tests. Some were muted earlier due to lack of complex-number support. Now the NCCL and MPI both have 306 tests.
Fredholm1.py- This likely needs some discussion (please see TODO: comment). Particularly, the private function_allgatheris called because Ops do not takebase_comm_ncclas an argument. Also, Fredholm operator needs some array unrolling in_rmatvecTesting: