Skip to content

Conversation

@LewisCrawford
Copy link
Contributor

Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.

@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Oct 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-selectiondag

Author: Lewis Crawford (LewisCrawford)

Changes

Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.


Full diff: https://github.com/llvm/llvm-project/pull/112182.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-1)
  • (added) llvm/test/CodeGen/AArch64/aarch64-scalarize-vec-load-ext.ll (+29)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 810ca458bc8787..2d9025da5e2e85 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -22566,7 +22566,7 @@ SDValue DAGCombiner::scalarizeExtractedVectorLoad(SDNode *EVE, EVT InVecVT,
     return SDValue();
 
   ISD::LoadExtType ExtTy =
-      ResultVT.bitsGT(VecEltVT) ? ISD::NON_EXTLOAD : ISD::EXTLOAD;
+      ResultVT.bitsGT(VecEltVT) ? ISD::EXTLOAD : ISD::NON_EXTLOAD;
   if (!TLI.isOperationLegalOrCustom(ISD::LOAD, VecEltVT) ||
       !TLI.shouldReduceLoadWidth(OriginalLoad, ExtTy, VecEltVT))
     return SDValue();
diff --git a/llvm/test/CodeGen/AArch64/aarch64-scalarize-vec-load-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-scalarize-vec-load-ext.ll
new file mode 100644
index 00000000000000..f34cad0b097f55
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-scalarize-vec-load-ext.ll
@@ -0,0 +1,29 @@
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s
+
+; FIXME: Currently, we avoid narrowing this v4i32 load, in the
+; hopes of being able to fold the shift, despite it requiring stack
+; storage + loads. Ideally, we should narrow here and load the i32
+; directly from the variable offset e.g:
+;
+; add     x8, x0, x1, lsl #4
+; and     x9, x2, #0x3
+; ldr     w0, [x8, x9, lsl #2]
+;
+; The AArch64TargetLowering::shouldReduceLoadWidth heuristic should
+; probably be updated to choose load-narrowing instead of folding the
+; lsl in larger vector cases.
+;
+; CHECK-LABEL: narrow_load_v4_i32_single_ele_variable_idx:
+; CHECK: sub  sp, sp, #16
+; CHECK: ldr  q[[REG0:[0-9]+]], [x0, x1, lsl #4]
+; CHECK: bfi  x[[REG1:[0-9]+]], x2, #2, #2
+; CHECK: str  q[[REG0]], [sp]
+; CHECK: ldr  w0, [x[[REG1]]]
+; CHECK: add  sp, sp, #16
+define i32 @narrow_load_v4_i32_single_ele_variable_idx(ptr %ptr, i64 %off, i32 %ele) {
+entry:
+  %idx = getelementptr inbounds <4 x i32>, ptr %ptr, i64 %off
+  %x = load <4 x i32>, ptr %idx, align 8
+  %res = extractelement <4 x i32> %x, i32 %ele
+  ret i32 %res
+}

@@ -0,0 +1,29 @@
; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge this test with whatever test was added in the original combine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And bonus points for using update_llc_test_checks.

@LewisCrawford LewisCrawford force-pushed the dagcombiner_extty_check branch from 9907b97 to 451d367 Compare October 14, 2024 16:31
Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.
@LewisCrawford LewisCrawford force-pushed the dagcombiner_extty_check branch from 451d367 to ce5fcad Compare October 16, 2024 10:55
@LewisCrawford LewisCrawford merged commit f5f0076 into llvm:main Oct 16, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 16, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 11 "Add check check-libc-amdgcn-amd-amdhsa".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7147

Here is the relevant piece of the build log for the reference
Step 11 (Add check check-libc-amdgcn-amd-amdhsa) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
...
[2355/2641] Linking CXX executable libc/test/src/stdlib/libc.test.src.stdlib.strtof_test.__hermetic__.__build__
[2356/2641] Linking CXX executable libc/test/src/string/libc.test.src.string.memcmp_test.__hermetic__.__build__
[2357/2641] Linking CXX executable libc/test/src/string/libc.test.src.string.strsep_test.__hermetic__.__build__
[2358/2641] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoimax_test.__hermetic__.__build__
[2359/2641] Linking CXX executable libc/test/src/inttypes/libc.test.src.inttypes.strtoumax_test.__hermetic__.__build__
[2360/2641] Linking CXX executable libc/test/src/string/libc.test.src.string.memmove_test.__hermetic__.__build__
[2361/2641] Linking CXX executable libc/test/src/string/libc.test.src.string.strncat_test.__hermetic__.__build__
[2362/2641] Linking CXX executable libc/test/src/time/libc.test.src.time.clock_gettime_test.__hermetic__.__build__
[2363/2641] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.asprintf_test.__hermetic__.__build__
[2364/2641] Linking CXX executable libc/test/src/stdio/libc.test.src.stdio.vasprintf_test.__hermetic__.__build__
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-libc-amdgcn-amd-amdhsa'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1466.963739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants