
Conversation

yucai-intel
Contributor

No description provided.

@CuiYifeng CuiYifeng requested a review from Copilot October 16, 2025 08:45

@Copilot (Copilot AI) left a comment


Pull Request Overview

Split IndexToOffset into compile-time (static Dims) and runtime (-1) variants to reduce nvcc compilation time and refactor all kernel call sites to use the new template form. Key changes remove the previous contiguous fast path flag and introduce dimension template parameters across many XPU SYCL kernels.

  • Introduced IndexToOffset<T, IndexType, Dims> and dynamic specialization with Dims = -1; removed strict/non-strict contiguous branching.
  • Updated all kernel usages to pass an explicit Dims (positive, -1, or new sentinel -2) and added indexing_kind template parameters in RNN kernels.
  • Replaced standard SYCL subgroup size attribute with Intel-specific attribute in one kernel.
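The bullets above describe the new dispatch shape. A minimal host-side sketch of how a call site might select a compile-time `Dims` instantiation and fall back to the runtime variant (simplified stand-ins for `TensorInfo` and `IndexToOffset`, not the real `src/comm/TensorInfo.h`; the `offset_of` dispatcher is hypothetical):

```cpp
#include <cassert>

// Simplified stand-in for the PR's TensorInfo (fixed max rank of 8).
template <typename T, typename IndexType>
struct TensorInfo {
  IndexType sizes[8];
  IndexType strides[8];
  int dims;
};

// Primary template: Dims is a compile-time constant, so the loop bound
// is known to the device compiler and can be fully unrolled.
template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  static IndexType get(IndexType linearId,
                       const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = Dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};

// Dims == -1: rank is only known at runtime; loop over info.dims instead.
template <typename T, typename IndexType>
struct IndexToOffset<T, IndexType, -1> {
  static IndexType get(IndexType linearId,
                       const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    for (int i = info.dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};

// Hypothetical dispatcher: dedicated instantiations for common low ranks,
// runtime variant for everything else.
template <typename T, typename IndexType>
IndexType offset_of(IndexType linearId, const TensorInfo<T, IndexType>& info) {
  switch (info.dims) {
    case 1:  return IndexToOffset<T, IndexType, 1>::get(linearId, info);
    case 2:  return IndexToOffset<T, IndexType, 2>::get(linearId, info);
    default: return IndexToOffset<T, IndexType, -1>::get(linearId, info);
  }
}
```

The compile-time and runtime variants must agree on any tensor, which is easy to spot-check on a small strided (transposed) case.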

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/comm/TensorInfo.h — Replaced the old IndexToOffset implementation with compile-time and runtime (-1) variants; removed the contiguous fast path.
  • src/ATen/native/xpu/sycl/WeightNormKernels.cpp — Updated all IndexToOffset calls to the new API (runtime -1).
  • src/ATen/native/xpu/sycl/TensorModeKernel.cpp — Switched the subgroup size attribute to the Intel-specific spelling and updated IndexToOffset usage.
  • src/ATen/native/xpu/sycl/TensorApplyUtils.h — Updated ApplyOp2 to use the runtime (-1) IndexToOffset.
  • src/ATen/native/xpu/sycl/SummaryOpsKernels.cpp — Added ADims/BDims template parameters and updated IndexToOffset calls.
  • src/ATen/native/xpu/sycl/Sorting.cpp — Passes a compile-time Dim to IndexToOffset.
  • src/ATen/native/xpu/sycl/ScanUtils.h — Migrated to runtime (-1) IndexToOffset calls.
  • src/ATen/native/xpu/sycl/RNNKernels.cpp — Added an indexing_kind template parameter and adjusted macros to the new IndexToOffset signature.
  • src/ATen/native/xpu/sycl/Indexing.h — Updated offset calculations to the new runtime form.
  • src/ATen/native/xpu/sycl/Indexing.cpp — Added DstDim/SrcDim/IdxDim template parameters; macros now emit various Dims (including -2) for IndexToOffset.
  • src/ATen/native/xpu/sycl/Dropout.cpp — Added ADims/BDims template-based offset computation.


Comment on lines +155 to 172
```diff
 template <typename T, typename IndexType, int Dims>
 struct IndexToOffset {
-  static constexpr bool STRICT_CONTIGUOUS = true;
-  static constexpr bool NON_STRICT_CONTIGUOUS = false;
-  static inline IndexType get(
+  static IndexType get(
       IndexType linearId,
-      const TensorInfo<T, IndexType>& info,
-      bool strict_contiguous = true) {
+      const TensorInfo<T, IndexType>& info) {
     IndexType offset = 0;

-    if (info.isContiguousCheckStrict(strict_contiguous)) {
-      return linearId;
-    }
-
-    for (int dim = info.dims - 1; dim > 0; --dim) {
-      IndexType curDimIndex = linearId % info.sizes[dim];
-      IndexType curDimOffset = curDimIndex * info.strides[dim];
+    // Uses static dims
+    for (int i = Dims - 1; i > 0; --i) {
+      IndexType curDimIndex = linearId % info.sizes[i];
+      IndexType curDimOffset = curDimIndex * info.strides[i];
       offset += curDimOffset;
-      linearId /= info.sizes[dim];
+      linearId /= info.sizes[i];
     }

     return offset + linearId * info.strides[0];
   }
 };
```

Copilot AI Oct 16, 2025


No safeguard enforces that runtime info.dims equals the compile-time Dims; if a caller instantiates this with Dims > info.dims, the loop will read past the valid sizes/strides arrays causing undefined behavior. Add a static_assert for Dims > 0 and a runtime (debug) assert (e.g., TORCH_INTERNAL_ASSERT(info.dims == Dims)) to prevent misuse.
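A hedged sketch of the guard this comment asks for, with `TensorInfo` simplified and `TORCH_INTERNAL_ASSERT` stood in by a plain `assert` for illustration:

```cpp
#include <cassert>

// Simplified stand-in for the PR's TensorInfo.
template <typename T, typename IndexType>
struct TensorInfo {
  IndexType sizes[8];
  IndexType strides[8];
  int dims;
};

template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  // Reject non-positive Dims at compile time; runtime rank belongs to
  // the Dims == -1 specialization.
  static_assert(Dims > 0, "use the Dims == -1 specialization for runtime rank");

  static IndexType get(IndexType linearId,
                       const TensorInfo<T, IndexType>& info) {
    // Debug-only misuse check: the instantiated rank must match the
    // tensor's actual rank, or the loop below reads past sizes/strides.
    assert(info.dims == Dims);
    IndexType offset = 0;
    for (int i = Dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

Note that a `static_assert(Dims > 0)` as written would also reject the -2 sentinel used elsewhere in this PR, so the sentinel would need its own documented specialization.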


Comment on lines 175 to 191
```diff
 template <typename T, typename IndexType>
-struct IndexToOffset<T, IndexType, true> {
-  static constexpr bool STRICT_CONTIGUOUS = true;
-  static constexpr bool NON_STRICT_CONTIGUOUS = false;
+struct IndexToOffset<T, IndexType, -1> {
   static inline IndexType get(
       IndexType linearId,
-      const TensorInfo<T, IndexType>& info,
-      bool strict_contiguous = true) {
-    return linearId;
+      const TensorInfo<T, IndexType>& info) {
+    IndexType offset = 0;
+
+    for (int i = info.dims - 1; i > 0; --i) {
+      IndexType curDimIndex = linearId % info.sizes[i];
+      IndexType curDimOffset = curDimIndex * info.strides[i];
+      offset += curDimOffset;
+      linearId /= info.sizes[i];
+    }
+
+    return offset + linearId * info.strides[0];
   }
 };
```

Copilot AI Oct 16, 2025


Previous implementation returned linearId immediately for (non-)strict contiguous tensors; removing that fast path forces modulus/divide per dimension for contiguous cases, increasing kernel instruction count. Consider an early check (e.g., if (info.isContiguous()) return linearId;) or a dims==1 shortcut to preserve performance.
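A sketch of the suggested fast path on the runtime variant, written as a free function for brevity; `isContiguous()` here is a stand-in for whatever predicate `TensorInfo` actually exposes, and the `is_contiguous` flag is assumed to be precomputed on the host:

```cpp
#include <cassert>

template <typename T, typename IndexType>
struct TensorInfo {
  IndexType sizes[8];
  IndexType strides[8];
  int dims;
  bool is_contiguous;  // assumed precomputed on the host
  bool isContiguous() const { return is_contiguous; }
};

template <typename T, typename IndexType>
IndexType dynamic_index_to_offset(IndexType linearId,
                                  const TensorInfo<T, IndexType>& info) {
  if (info.isContiguous()) {
    return linearId;  // fast path: no per-dimension div/mod
  }
  IndexType offset = 0;
  for (int i = info.dims - 1; i > 0; --i) {
    offset += (linearId % info.sizes[i]) * info.strides[i];
    linearId /= info.sizes[i];
  }
  return offset + linearId * info.strides[0];
}
```

For a contiguous tensor the linear id is the offset, so the branch trades one predictable comparison for `dims - 1` divisions and modulos per element.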


```diff
 template <typename T, unsigned int Power2Size>
 struct ComputeModeKernelFunctor : public __SYCL_KER_CONFIG_CONVENTION__ {
-  [[sycl::reqd_sub_group_size(32)]] void operator()(
+  [[intel::reqd_sub_group_size(32)]] void operator()(
```

Copilot AI Oct 16, 2025


Replacing the standard [[sycl::reqd_sub_group_size(32)]] with vendor-specific [[intel::reqd_sub_group_size(32)]] reduces portability across SYCL implementations. Prefer the standard attribute or guard the Intel extension with preprocessor checks.

Suggested change

```diff
-  [[intel::reqd_sub_group_size(32)]] void operator()(
+  [[sycl::reqd_sub_group_size(32)]] void operator()(
```
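If the vendor spelling is wanted when building with Intel's toolchain, one hedged way to keep the standard attribute as a fallback is a preprocessor guard. This assumes `__INTEL_LLVM_COMPILER` is the right detection macro for the toolchain in use, and the `XPU_REQD_SUB_GROUP_SIZE` macro name is hypothetical:

```cpp
// Sketch of a portability guard: use the Intel attribute only under
// Intel's LLVM-based SYCL compiler, otherwise the SYCL 2020 spelling.
#if defined(__INTEL_LLVM_COMPILER)
#define XPU_REQD_SUB_GROUP_SIZE(n) [[intel::reqd_sub_group_size(n)]]
#else
#define XPU_REQD_SUB_GROUP_SIZE(n) [[sycl::reqd_sub_group_size(n)]]
#endif

// Usage inside the kernel functor:
//   XPU_REQD_SUB_GROUP_SIZE(32) void operator()(sycl::nd_item<3> item) const;
```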


Comment on lines +1539 to +1541
```cpp
if (selfInfo.dims == 1 && sourceInfo.dims == 1 && indContig) {
  auto caller = SMALL_INDEX(
      scalar_t, index_t, unsigned int, 1, 1, -2, func_t);
```

Copilot AI Oct 16, 2025


The sentinel value -2 for IdxDim is undocumented and differs from the established -1 dynamic case; it implicitly relies on the primary template's loop skipping logic and produces offset = linearId * stride[0], which would be incorrect if indices_ is not 1D with stride[0]==1. Replace -2 with an explicit dimension (e.g., 1) or unify on -1 with a clear fast path, and document the intent.
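To see what -2 actually computes: with the primary template's loop `for (int i = Dims - 1; i > 0; --i)`, any `Dims <= 1` makes the loop dead code, so `get()` reduces to `linearId * strides[0]` — the same result an explicit `Dims = 1` instantiation yields. A small demonstration with simplified stand-ins (not the real header):

```cpp
#include <cassert>

template <typename T, typename IndexType>
struct TensorInfo {
  IndexType sizes[8];
  IndexType strides[8];
  int dims;
};

template <typename T, typename IndexType, int Dims>
struct IndexToOffset {
  static IndexType get(IndexType linearId,
                       const TensorInfo<T, IndexType>& info) {
    IndexType offset = 0;
    // Dims == -2 (or 1) makes this loop dead code: i starts non-positive.
    for (int i = Dims - 1; i > 0; --i) {
      offset += (linearId % info.sizes[i]) * info.strides[i];
      linearId /= info.sizes[i];
    }
    return offset + linearId * info.strides[0];
  }
};
```

Since the two instantiations are observably identical, spelling the intent as `Dims = 1` (or documenting -2 as a "1-D, possibly strided" sentinel) costs nothing and removes the ambiguity.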

