Experimental rocSHMEM support #356
base: dev
"Clean up and finalize the NVSHMEM communication backend and free associated resources",
py::call_guard<py::gil_scoped_release>());
#else
// rocshmem functions
I don't think we need these function names at the Python level. We can just use the nvshmem wrappers there.
case WaitKind::ROCSHMEM_WAIT:
// rocshmem__ulonglong_wait_until_on_stream(sig_addr,
// ROCSHMEM_CMP_EQ,
// wait_value,
Let's add a warning here saying that rocshmem_wait is not supported on ROCm.
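A minimal sketch of what such a warning could look like, assuming the wait dispatch is a `switch` over `WaitKind`. Only `WaitKind::ROCSHMEM_WAIT` comes from the diff; the `enqueue_wait` function, the `STREAM_MEM_WAIT` value, the return convention, and the warning text are hypothetical and only illustrate the idea:

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the WaitKind enum in the diff. Only
// ROCSHMEM_WAIT appears in the PR; STREAM_MEM_WAIT is an assumed name.
enum class WaitKind { STREAM_MEM_WAIT, ROCSHMEM_WAIT };

// Hypothetical dispatch helper: returns true if a wait was enqueued,
// false if the requested kind is unsupported on this platform.
bool enqueue_wait(WaitKind kind, std::uint64_t* sig_addr,
                  std::uint64_t wait_value, std::ostream& log = std::cerr) {
  switch (kind) {
    case WaitKind::ROCSHMEM_WAIT:
      // The on-stream wait call is commented out in the diff, so warn
      // instead of silently doing nothing.
      log << "Warning: ROCSHMEM_WAIT is not supported on ROCm; "
             "no wait was enqueued.\n";
      return false;
    case WaitKind::STREAM_MEM_WAIT:
      // Placeholder: a real backend would enqueue a stream wait here.
      (void)sig_addr;
      (void)wait_value;
      return true;
  }
  return false;  // Unreachable for valid enum values.
}
```

Returning a status alongside the warning lets the caller decide whether to fall back to another wait kind or abort, rather than proceeding as if synchronization had happened.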
@alextmagro Could you please address the comments?
c958b2b to 53bc808
I have made the requested changes and rebased. There were a few issues with submodules that were added accidentally, but things should be clean now. I will run level 3.
wenchenvincent left a comment
LGTM. Was the CI failure due to a networking issue?
53bc808 to 588aebd
e98d964 to e82a4b1
Full functionality requires rocSHMEM support within PyTorch. It is possible to bootstrap with PMI code changes, but this is not safe or recommended.
rocSHMEM also lacks stream-specific functionality and signaling. Given that PyTorch communications tend to be stream-focused, this is a feature we need from the rocSHMEM team.