[mlir][AMDGPU] Allow non-contiguous destination memrefs for gather_to_lds #152559
Conversation
…_lds

The requirement that the LDS operand is contiguous is overly restrictive: it is perfectly valid for a subview to depend on subgroup IDs while still being contiguous per subgroup. We could keep verifying this based on the number of copied elements, but this change instead opts to clarify the semantics in the op definition.
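As a rough sketch of the motivating case (not taken from this PR; the function name, shapes, and SSA names below are made up), a destination produced by `memref.subview` whose offset depends on the subgroup ID can end up with a layout that is not statically contiguous over its full rank, which the old verifier rejected even though each subgroup still writes a contiguous run:

```mlir
// Hypothetical example: %sgid, %rows, %cols, and the shapes are illustrative only.
func.func @gather_to_strided_subview(
    %src : memref<64xf16>,
    %lds : memref<4x64xf16, #gpu.address_space<workgroup>>,
    %sgid : index, %rows : index, %cols : index, %idx : index) {
  // Subgroup-dependent subview: the offset and sizes are dynamic, so the result
  // type can no longer be proven contiguous over its full rank.
  %view = memref.subview %lds[%sgid, 0] [%rows, %cols] [1, 1]
      : memref<4x64xf16, #gpu.address_space<workgroup>>
      to memref<?x?xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
  %c0 = arith.constant 0 : index
  // Each subgroup still writes a contiguous run starting at %view[%c0, %idx],
  // which the relaxed verifier now accepts.
  amdgpu.gather_to_lds %src[%idx], %view[%c0, %idx]
      : vector<2xf16>, memref<64xf16>,
        memref<?x?xf16, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
  func.return
}
```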
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-backend-amdgpu

Author: Quinn Dawkins (qedawkins)

Full diff: https://github.com/llvm/llvm-project/pull/152559.diff — 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index 92aacdaef4136..2c646934c11c2 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -907,7 +907,8 @@ def AMDGPU_GatherToLDSOp :
The elements gathered by the subgroup will be written contiguously in order of lane ID
starting at `$dst[$dstIndices]`. Byte-sized (ex. i8) or short-sized (ex. i16)
types will be zero-padded/extended to 32 bits before being written. 96-bit types
- (ex. vector<3xf32>) will be zero-padded to 128 bits before being written.
+ (ex. vector<3xf32>) will be zero-padded to 128 bits before being written. Only the
+ offsets held by lane 0 are used.
* `$transferType`: type of the data to be transferred by each thread. This is used to determine
the size of the data to be transferred and the number of threads in the subgroup.
The transfer type must be a scalar type or a vector type with a single element type.
diff --git a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
index 9a0a230e8abca..d1ed7a00c91c6 100644
--- a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
+++ b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
@@ -518,9 +518,6 @@ LogicalResult GatherToLDSOp::verify() {
MemRefType srcType = cast<MemRefType>(getSrc().getType());
MemRefType dstType = cast<MemRefType>(getDst().getType());
- if (!dstType.areTrailingDimsContiguous(dstType.getRank()))
- return emitOpError("destination types must be contiguous");
-
auto elemType = srcType.getElementType();
// Check $src and $dst element types are the same.
if (elemType != dstType.getElementType())
diff --git a/mlir/test/Dialect/AMDGPU/ops.mlir b/mlir/test/Dialect/AMDGPU/ops.mlir
index fe78b5365745a..87e11c028c62a 100644
--- a/mlir/test/Dialect/AMDGPU/ops.mlir
+++ b/mlir/test/Dialect/AMDGPU/ops.mlir
@@ -539,13 +539,15 @@ func.func @transpose_load(%idx1 : index, %idx2 : index, %mem : memref<128x32xf16
}
// CHECK-LABEL: func @gather_to_lds
-func.func @gather_to_lds(%idx1 : index, %idx2 : index, %mem1 : memref<32xf16>, %mem2 : memref<32x32xf16>, %smem1 : memref<32xf16, #gpu.address_space<workgroup>>, %smem2 : memref<32x32xf16, #gpu.address_space<workgroup>>) {
+func.func @gather_to_lds(%idx1 : index, %idx2 : index, %mem1 : memref<32xf16>, %mem2 : memref<32x32xf16>, %smem1 : memref<32xf16, #gpu.address_space<workgroup>>, %smem2 : memref<32x32xf16, #gpu.address_space<workgroup>>, %smem3 : memref<?x?xf16, strided<[?, 1]>, #gpu.address_space<workgroup>>) {
// CHECK: amdgpu.gather_to_lds %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}}[%{{.*}}, %{{.*}}]
// CHECK: amdgpu.gather_to_lds %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}}[%{{.*}}]
// CHECK: amdgpu.gather_to_lds %{{.*}}[%{{.*}}], %{{.*}}[%{{.*}}, %{{.*}}]
+ // CHECK: amdgpu.gather_to_lds %{{.*}}[%{{.*}}], %{{.*}}[%{{.*}}, %{{.*}}]
amdgpu.gather_to_lds %mem2[%idx1, %idx2], %smem2[%idx1, %idx2] : vector<2xf16>, memref<32x32xf16>, memref<32x32xf16, #gpu.address_space<workgroup>>
amdgpu.gather_to_lds %mem2[%idx1, %idx2], %smem1[%idx1] : vector<2xf16>, memref<32x32xf16>, memref<32xf16, #gpu.address_space<workgroup>>
amdgpu.gather_to_lds %mem1[%idx1], %smem2[%idx1, %idx2] : vector<2xf16>, memref<32xf16>, memref<32x32xf16, #gpu.address_space<workgroup>>
+ amdgpu.gather_to_lds %mem1[%idx1], %smem3[%idx1, %idx2] : vector<2xf16>, memref<32xf16>, memref<?x?xf16, strided<[?, 1]>, #gpu.address_space<workgroup>>
func.return
}
Approved