Skip to content

Conversation

@alextmagro
Copy link
Contributor

Requires support for rocSHMEM within pytorch for full functionality -- Able to bootstrap with PMI codechanges but not safe or recommended.

rocSHMEM also lacks stream specific functionality, and signaling. Given that Pytorch communications tend to be stream focused, this is a feature we need from rocSHMEM team.

"Clean up and finalize the NVSHMEM communication backend and free associated resources",
py::call_guard<py::gil_scoped_release>());
#else
// rocshmem functions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need these function names at python level. We can just use the nvshmem wrappers at python level.

case WaitKind::ROCSHMEM_WAIT:
// rocshmem__ulonglong_wait_until_on_stream(sig_addr,
// ROCSHMEM_CMP_EQ,
// wait_value,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a warning here saying that rocshmem_wait is not supported on ROCm.

@wenchenvincent
Copy link
Collaborator

@alextmagro Could you please address the comments?
Also, please rebase/merge latest dev to incorporate fixes for sgpu tests.

@alextmagro alextmagro force-pushed the rocshmem_enablement branch 4 times, most recently from c958b2b to 53bc808 Compare November 11, 2025 19:32
@alextmagro
Copy link
Contributor Author

@alextmagro Could you please address the comments? Also, please rebase/merge latest dev to incorporate fixes for sgpu tests.

I have made the changes requested and rebased. A few issues with submodules that were added accidentally but things should be clean now -- will run level 3.

Copy link
Collaborator

@wenchenvincent wenchenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Was the CI failure due to networking issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants