P2P tensors must be dense #2161
base: main
Pull Request Overview
This PR adds validation to ensure that P2P (point-to-point) tensors are dense. P2P operations require non-overlapping and dense tensors, but transposed (non-contiguous) tensors are still allowed, because a transpose only permutes the strides and leaves the data in one gap-free block of memory.
- Adds dense tensor validation for P2P operations in the XCCL backend
- Introduces a new test case to verify that non-dense tensors are properly rejected
- Updates error handling to provide specific error messages for P2P tensor requirements
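To make the dense vs. contiguous distinction concrete, here is a minimal pure-Python sketch that computes both properties from a tensor's sizes and strides (in elements). It is an illustration only, not the code from this PR: the helper names is_contiguous, is_non_overlapping_and_dense and check_p2p_tensor, and the error text, are hypothetical. The density check orders dimensions by stride and requires each stride to equal the product of the sizes of the faster-varying dimensions, so any permutation of a packed layout (such as a transpose) passes while a column slice does not.

```python
from typing import Sequence


def is_contiguous(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    """Row-major (C-order) contiguity; size-1 dimensions are ignored."""
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


def is_non_overlapping_and_dense(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    """True if the elements fill one gap-free block in *some* dimension order."""
    # Visit dimensions from smallest to largest stride; dimensions of size
    # 0 or 1 do not affect the layout, so they are sorted to the end.
    order = sorted(range(len(sizes)), key=lambda d: (sizes[d] < 2, strides[d]))
    expected = 1
    for d in order:
        if sizes[d] < 2:
            break  # only trivial dimensions remain
        if strides[d] != expected:
            return False  # stride too large leaves a gap, too small overlaps
        expected *= sizes[d]
    return True


def check_p2p_tensor(sizes: Sequence[int], strides: Sequence[int]) -> None:
    """Hypothetical P2P guard: accept any dense layout, reject the rest."""
    if not is_non_overlapping_and_dense(sizes, strides):
        raise ValueError("P2P tensor must be non-overlapping and dense")


# (sizes, strides) of the tensors discussed in this PR.
full = ((64, 64), (64, 1))        # torch.empty((64, 64))
transposed = ((64, 64), (1, 64))  # full.t()
col_slice = ((64, 16), (64, 1))   # full[:, 16:32]

for name, (sizes, strides) in [("full", full), ("transposed", transposed), ("col_slice", col_slice)]:
    print(name, is_contiguous(sizes, strides), is_non_overlapping_and_dense(sizes, strides))
# full True True
# transposed False True
# col_slice False False
```

With real tensors the strides come from Tensor.stride(); the column slice also carries a storage offset of 16, which does not change whether it is dense.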
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/xccl/ProcessGroupXCCL.cpp | Adds dense tensor validation for P2P operations with specific error handling |
| test/xpu/distributed/test_c10d_xccl.py | Adds test coverage for non-dense tensor rejection in send/recv operations |
pg = self._create_process_group_xccl()
device = self.rank_to_GPU[self.rank][0]
full = torch.empty((64, 64), device=device).fill_(self.rank)
# Take a slice in col dimension, making it non-dense
Copilot (AI), Oct 13, 2025:
The comment mentions 'col dimension' but could be more precise: the slice [:, 16:32] creates a non-contiguous view by selecting columns 16-31, which results in a non-dense tensor due to the stride pattern in memory.
Suggested change:
- # Take a slice in col dimension, making it non-dense
+ # Take a slice along columns 16 to 31 (inclusive), resulting in a non-contiguous (non-dense) tensor due to the stride pattern in memory
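The "stride pattern in memory" can be checked by hand. Note that non-contiguous and non-dense are not the same thing: a transposed tensor is non-contiguous yet still dense. The sketch below, using a hypothetical helper element_offsets, lists the storage offsets touched by full[:, 16:32] on a row-major 64 x 64 tensor (strides (64, 1), storage offset 16). The offsets come in runs of 16 separated by 48 unused elements, so the view does not occupy one gap-free block, which is what makes it non-dense. The transposed tensor, by contrast, touches every slot exactly once.

```python
from itertools import product


def element_offsets(sizes, strides, storage_offset=0):
    """Sorted storage offsets (in elements) of every element of a strided view."""
    return sorted(
        storage_offset + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    )


# full = torch.empty((64, 64)) is row-major: strides (64, 1).
# full[:, 16:32] keeps the row stride, narrows the columns and starts at offset 16.
view = element_offsets((64, 16), (64, 1), storage_offset=16)

span = view[-1] - view[0] + 1  # memory range covered by the view
print(len(view), span)         # 1024 4048

# Distance between consecutive touched offsets: 1 inside a row, 49 between rows.
gaps = sorted({b - a for a, b in zip(view, view[1:])})
print(gaps)                    # [1, 49]

# full.t() reads the same 4096 slots in a different order, with no gaps.
transposed = element_offsets((64, 64), (1, 64))
print(transposed == list(range(64 * 64)))  # True
```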
LGTM
Refer to pytorch/pytorch@11a231ef528.