Chao1Han
Contributor

Copilot AI review requested due to automatic review settings (October 13, 2025 05:03)

Copilot AI left a comment

Pull Request Overview

This PR adds validation to ensure P2P (point-to-point) tensors are dense, addressing a requirement that P2P operations must work with non-overlapping and dense tensors while still allowing transposed (non-contiguous) tensors.

  • Adds dense tensor validation for P2P operations in the XCCL backend
  • Introduces a new test case to verify that non-dense tensors are properly rejected
  • Updates error handling to provide specific error messages for P2P tensor requirements
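
A minimal sketch of what "non-overlapping and dense" means, written in plain Python over a tensor's sizes and strides instead of a real tensor. The function name `is_non_overlapping_and_dense` is modeled on the property PyTorch tracks internally, but this is an illustrative reimplementation under that assumption, not the check added to ProcessGroupXCCL.cpp. The idea: visit dimensions from smallest to largest stride and require each stride to equal the product of the sizes already visited, so the elements tile one gap-free block of memory in some dimension order. A transposed matrix passes this test even though it is not contiguous; a column slice fails it.

```python
from typing import Sequence


def is_non_overlapping_and_dense(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    """Illustrative sketch (not PyTorch's or ProcessGroupXCCL's code):
    True if a strided view covers one gap-free block of memory, possibly
    with its dimensions permuted (e.g. a transposed matrix)."""
    if len(sizes) != len(strides):
        raise ValueError("sizes and strides must have the same length")
    # Dimensions of size 0 or 1 never change which memory is touched.
    dims = [d for d in range(len(sizes)) if sizes[d] >= 2]
    # Walk from the innermost (smallest stride) to the outermost dimension.
    dims.sort(key=lambda d: strides[d])
    expected_stride = 1
    for d in dims:
        if strides[d] != expected_stride:
            return False  # stride too large leaves gaps; too small overlaps
        expected_stride *= sizes[d]
    return True


# A freshly allocated row-major 64x64 tensor has strides (64, 1).
print(is_non_overlapping_and_dense((64, 64), (64, 1)))  # True: contiguous
print(is_non_overlapping_and_dense((64, 64), (1, 64)))  # True: transposed, still dense
print(is_non_overlapping_and_dense((64, 16), (64, 1)))  # False: a [:, 16:32] slice has gaps
```

This is why a dense-only check can still accept transposed tensors: density depends on the set of strides, not on their order.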

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/xccl/ProcessGroupXCCL.cpp | Adds dense tensor validation for P2P operations with specific error handling
test/xpu/distributed/test_c10d_xccl.py | Adds test coverage for non-dense tensor rejection in send/recv operations

pg = self._create_process_group_xccl()
device = self.rank_to_GPU[self.rank][0]
full = torch.empty((64, 64), device=device).fill_(self.rank)
# Take a slice in col dimension, making it non-dense

Copilot AI Oct 13, 2025

The comment mentions 'col dimension' but should be more precise. The slice [:, 16:32] selects columns 16 through 31, producing a view whose rows hold 16 elements each but still start 64 elements apart in memory. Those gaps between rows make the view non-dense, not merely non-contiguous.

Suggested change
# Take a slice in col dimension, making it non-dense
# Take columns 16 to 31 (inclusive); each row keeps only 16 of its 64 elements, leaving gaps in memory, so the view is non-dense (not merely non-contiguous)
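
To make "the stride pattern in memory" concrete, the arithmetic below works through the test's 64x64 tensor by hand, without importing torch. A freshly allocated row-major 64x64 tensor has strides (64, 1); full[:, 16:32] keeps those strides with only 16 columns, and full.t() swaps them to (1, 64). The helpers `storage_span` and `describe` are hypothetical, and comparing numel with span only detects density for views already known not to overlap and with non-negative strides.

```python
from math import prod
from typing import Sequence


def storage_span(sizes: Sequence[int], strides: Sequence[int]) -> int:
    """Hypothetical helper: storage elements from the first to the last
    element of a view (inclusive), assuming non-negative strides."""
    if any(size == 0 for size in sizes):
        return 0
    return 1 + sum((size - 1) * stride for size, stride in zip(sizes, strides))


def describe(name: str, sizes: Sequence[int], strides: Sequence[int]) -> None:
    numel = prod(sizes)
    span = storage_span(sizes, strides)
    # For a non-overlapping view, "dense" means its elements fill the whole
    # span with no gaps, i.e. numel == span.
    print(f"{name}: numel={numel}, span={span}, dense={numel == span}")


describe("full", (64, 64), (64, 1))            # full: numel=4096, span=4096, dense=True
describe("full[:, 16:32]", (64, 16), (64, 1))  # full[:, 16:32]: numel=1024, span=4048, dense=False
describe("full.t()", (64, 64), (1, 64))        # full.t(): numel=4096, span=4096, dense=True
```

The slice touches a 4048-element stretch of storage but owns only 1024 of those elements, which is the gap pattern a dense-only P2P check must reject.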

dvrogozh left a comment

LGTM

Labels: None yet
Projects: None yet
2 participants