
Conversation

@rampitec
Collaborator Author
rampitec commented Jul 21, 2025

Scaling is done on the operation size; to merge instructions we would need to
generate code to scale the offset and reset the auto-scale bit. It is unclear
whether that would be beneficial, so just disable such merges for now.
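
For illustration, a minimal sketch of the address arithmetic this describes (a
hypothetical helper, not LLVM code, assuming the auto-scale bit multiplies the
register offset by the access size as the message above states):

#include <cstdint>

// With the auto-scale bit set, the hardware scales the register offset by
// the size of the access before adding the immediate offset.
uint64_t effectiveAddr(uint64_t Base, uint64_t VOffset, int64_t ImmOffset,
                       unsigned AccessSize, bool ScaleOffset) {
  return Base + (ScaleOffset ? VOffset * AccessSize : VOffset) + ImmOffset;
}

// Two scaled dword loads access Base + V*4 + 0 and Base + V*4 + 4. A merged
// dwordx2 load would compute Base + V*8 + Imm instead, so the combiner would
// also have to rewrite the offset and clear the bit to preserve the addresses.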

@rampitec rampitec requested review from changpeng and shiltian July 21, 2025 20:24
@rampitec rampitec marked this pull request as ready for review July 21, 2025 20:24
@llvmbot
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

Scaling is done on the operation size; to merge instructions we would need to
generate code to scale the offset and reset the auto-scale bit. It is unclear
whether that would be beneficial, so just disable such merges for now.


Full diff: https://github.com/llvm/llvm-project/pull/149895.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (+4-1)
  • (added) llvm/test/CodeGen/AMDGPU/load-store-opt-scale-offset.mir (+104)
diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
index 5097ac03954d5..f63cd267e736d 100644
--- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -60,6 +60,7 @@
 #include "SILoadStoreOptimizer.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
+#include "SIDefines.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
@@ -1078,7 +1079,9 @@ bool SILoadStoreOptimizer::offsetsCanBeCombined(CombineInfo &CI,
     if (EltOffset0 + CI.Width != EltOffset1 &&
             EltOffset1 + Paired.Width != EltOffset0)
       return false;
-    if (CI.CPol != Paired.CPol)
+    // Instructions with the scale_offset modifier cannot be combined unless
+    // we also generate code to scale the offset and reset that bit.
+    if (CI.CPol != Paired.CPol || (CI.CPol & AMDGPU::CPol::SCAL))
       return false;
     if (CI.InstClass == S_LOAD_IMM || CI.InstClass == S_BUFFER_LOAD_IMM ||
         CI.InstClass == S_BUFFER_LOAD_SGPR_IMM) {
diff --git a/llvm/test/CodeGen/AMDGPU/load-store-opt-scale-offset.mir b/llvm/test/CodeGen/AMDGPU/load-store-opt-scale-offset.mir
new file mode 100644
index 0000000000000..76e2092c8b57a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/load-store-opt-scale-offset.mir
@@ -0,0 +1,104 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -run-pass=si-load-store-opt -o - %s | FileCheck -check-prefix=GCN %s
+
+---
+name:            merge_global_load_dword_2_no_scale_offset
+body:             |
+  bb.0.entry:
+
+    ; GCN-LABEL: name: merge_global_load_dword_2_no_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sreg_64_xexec_xnull = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORDX2_SADDR:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2_SADDR [[DEF]], [[DEF1]], 0, 1, implicit $exec :: (load (s64) from `ptr addrspace(1) undef` + 4, align 4, addrspace 1)
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[GLOBAL_LOAD_DWORDX2_SADDR]].sub0
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY killed [[GLOBAL_LOAD_DWORDX2_SADDR]].sub1
+    ; GCN-NEXT: S_NOP 0, implicit [[DEF1]], implicit [[COPY]]
+    %0:sreg_64_xexec_xnull = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    %2:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 0, 1, implicit $exec :: (load (s32) from `float addrspace(1)* undef` + 4, basealign 4, addrspace 1)
+    %3:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 4, 1, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 8, basealign 4, addrspace 1)
+    S_NOP 0, implicit %1, implicit %2
+...
+
+---
+name:            no_merge_global_load_dword_2_same_scale_offset
+body:             |
+  bb.0.entry:
+
+    ; GCN-LABEL: name: no_merge_global_load_dword_2_same_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sreg_64_xexec_xnull = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORD_SADDR:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR [[DEF]], [[DEF1]], 0, 2049, implicit $exec :: (load (s32) from `ptr addrspace(1) undef` + 4, addrspace 1)
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORD_SADDR1:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR [[DEF]], [[DEF1]], 4, 2049, implicit $exec :: (load (s32) from `ptr addrspace(1) undef` + 8, addrspace 1)
+    ; GCN-NEXT: S_NOP 0, implicit [[DEF1]], implicit [[GLOBAL_LOAD_DWORD_SADDR]]
+    %0:sreg_64_xexec_xnull = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    %2:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 0, 2049, implicit $exec :: (load (s32) from `float addrspace(1)* undef` + 4, basealign 4, addrspace 1)
+    %3:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 4, 2049, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 8, basealign 4, addrspace 1)
+    S_NOP 0, implicit %1, implicit %2
+...
+
+---
+name:            no_merge_global_load_dword_2_different_scale_offset
+body:             |
+  bb.0.entry:
+
+    ; GCN-LABEL: name: no_merge_global_load_dword_2_different_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sreg_64_xexec_xnull = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORD_SADDR:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR [[DEF]], [[DEF1]], 0, 0, implicit $exec :: (load (s32) from `ptr addrspace(1) undef` + 4, addrspace 1)
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORD_SADDR1:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR [[DEF]], [[DEF1]], 4, 2048, implicit $exec :: (load (s32) from `ptr addrspace(1) undef` + 8, addrspace 1)
+    ; GCN-NEXT: S_NOP 0, implicit [[DEF1]], implicit [[GLOBAL_LOAD_DWORD_SADDR]]
+    %0:sreg_64_xexec_xnull = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    %2:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 0, 0, implicit $exec :: (load (s32) from `float addrspace(1)* undef` + 4, basealign 4, addrspace 1)
+    %3:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %0, %1, 4, 2048, implicit $exec :: (load (s32) from `i32 addrspace(1)* undef` + 8, basealign 4, addrspace 1)
+    S_NOP 0, implicit %1, implicit %2
+...
+
+# NB: We do not currently support merging SGPR offset and SGPR+Imm offset forms
+# of S_LOAD, but the check stays the same: these cannot be merged with different
+# scale offsets.
+#
+# We also do not currently merge flat scratch instructions, although there is
+# a common check in the merge logic that CPol shall not be set for a merge to
+# happen.
+
+---
+name: merge_s_load_x1_x1_imm_no_scale_offset
+body: |
+  bb.0:
+    ; GCN-LABEL: name: merge_s_load_x1_x1_imm_no_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sgpr_64 = IMPLICIT_DEF
+    ; GCN-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[DEF]], 0, 0 :: (dereferenceable invariant load (s64), align 4)
+    ; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32_xm0_xexec = COPY [[S_LOAD_DWORDX2_IMM]].sub0
+    ; GCN-NEXT: [[COPY1:%[0-9]+]]:sreg_32_xm0_xexec = COPY killed [[S_LOAD_DWORDX2_IMM]].sub1
+    %0:sgpr_64 = IMPLICIT_DEF
+    %1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 0, 0 :: (dereferenceable invariant load (s32))
+    %2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 4, 0 :: (dereferenceable invariant load (s32))
+...
+
+---
+name: no_merge_s_load_x1_x1_imm_same_scale_offset
+body: |
+  bb.0:
+    ; GCN-LABEL: name: no_merge_s_load_x1_x1_imm_same_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sgpr_64 = IMPLICIT_DEF
+    ; GCN-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[DEF]], 0, 2048 :: (dereferenceable invariant load (s32))
+    ; GCN-NEXT: [[S_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[DEF]], 4, 2048 :: (dereferenceable invariant load (s32))
+    %0:sgpr_64 = IMPLICIT_DEF
+    %1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 0, 2048 :: (dereferenceable invariant load (s32))
+    %2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 4, 2048 :: (dereferenceable invariant load (s32))
+...
+
+---
+name: no_merge_s_load_x1_x1_imm_different_scale_offset
+body: |
+  bb.0:
+    ; GCN-LABEL: name: no_merge_s_load_x1_x1_imm_different_scale_offset
+    ; GCN: [[DEF:%[0-9]+]]:sgpr_64 = IMPLICIT_DEF
+    ; GCN-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[DEF]], 0, 0 :: (dereferenceable invariant load (s32))
+    ; GCN-NEXT: [[S_LOAD_DWORD_IMM1:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[DEF]], 4, 2048 :: (dereferenceable invariant load (s32))
+    %0:sgpr_64 = IMPLICIT_DEF
+    %1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 0, 0 :: (dereferenceable invariant load (s32))
+    %2:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sgpr_64, 4, 2048 :: (dereferenceable invariant load (s32))
+...

@github-actions

github-actions bot commented Jul 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor


clang-format is not happy with this.

Collaborator Author


Fixed. I do not follow why, though. I thought we always do it before deeper directories.

@rampitec rampitec force-pushed the users/rampitec/07-21-_amdgpu_prohibit_load_store_merge_if_scale_offset_is_set_on_gfx1250 branch from 65e43a7 to afc9b43 Compare July 21, 2025 20:28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'd make several groups here and then leave blank lines to prevent clang-format from complaining about it, but it's completely up to you.

// This is always the first one
#include "SILoadStoreOptimizer.h"

#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "SIDefines.h"

 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"

// then llvm headers
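
(For context, a likely explanation for the complaint, assuming LLVM's default
IncludeBlocks: Preserve setting: clang-format sorts each blank-line-separated
include block independently in case-sensitive ASCII order, so within a single
block "MCTargetDesc/AMDGPUMCTargetDesc.h" must come before "SIDefines.h".
Splitting the includes into groups keeps each group's order stable on its own.
A sorted single block would look like this:)

// Case-sensitive ASCII order within one contiguous block ('M' < 'S'):
#include "SILoadStoreOptimizer.h" // main header is pinned first by convention
#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIDefines.h"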

rampitec added 3 commits July 21, 2025 14:24
SS forms of SCRATCH_LOAD_DWORD do not support SCALE_OFFSET,
so if this bit is used, SCRATCH_LOAD_DWORD_SADDR cannot be formed.
This generally should not happen because an FI is not supposed to
be scaled, but add the check as a precaution.
Scaling is done on the operation size; to merge instructions we would need to
generate code to scale the offset and reset the auto-scale bit. It is unclear
whether that would be beneficial, so just disable such merges for now.
@rampitec rampitec force-pushed the users/rampitec/07-21-_amdgpu_prevent_folding_of_fi_with_scale_offset_on_gfx1250 branch from 1e697f4 to 97a8444 Compare July 21, 2025 21:24
@rampitec rampitec force-pushed the users/rampitec/07-21-_amdgpu_prohibit_load_store_merge_if_scale_offset_is_set_on_gfx1250 branch from afc9b43 to a431d67 Compare July 21, 2025 21:24
Base automatically changed from users/rampitec/07-21-_amdgpu_prevent_folding_of_fi_with_scale_offset_on_gfx1250 to main July 21, 2025 22:05
@rampitec rampitec merged commit 97a66a8 into main Jul 21, 2025
12 of 15 checks passed
@rampitec rampitec deleted the users/rampitec/07-21-_amdgpu_prohibit_load_store_merge_if_scale_offset_is_set_on_gfx1250 branch July 21, 2025 22:41
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…lvm#149895)

Scaling is done on the operation size; to merge instructions we would need to
generate code to scale the offset and reset the auto-scale bit. It is unclear
whether that would be beneficial, so just disable such merges for now.
