[DAGISel][ARM] Fix vector truncate combine for big-endian #118101

ostannard · 2024-11-29T15:39:09Z

This DAG combine was incorrect for big-endian targets because it assumed that, when a bitcast changes the lane width, the least-significant bits of each wider lane end up in the lower-numbered lane of the narrower type. That is only true for little-endian.
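
To make the lane-layout assumption concrete, here is a minimal sketch (not LLVM code). It models a bitcast of one 32-bit lane into two 16-bit lanes as a store followed by a reload, which is how the LLVM language reference describes bitcast. The name bitcastLane and the 32/16-bit widths are assumptions chosen purely for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model, not an LLVM API: split one 32-bit lane into two 16-bit
// lanes the way a store-then-reload would on the given endianness.
std::array<std::uint16_t, 2> bitcastLane(std::uint32_t Wide,
                                         bool IsLittleEndian) {
  const std::uint16_t Lo = static_cast<std::uint16_t>(Wide & 0xFFFFu);
  const std::uint16_t Hi = static_cast<std::uint16_t>(Wide >> 16);
  // Little-endian stores the least-significant half at the lowest address,
  // so it becomes narrow lane 0; big-endian stores the most-significant
  // half first, so the least-significant half becomes narrow lane 1.
  if (IsLittleEndian)
    return {Lo, Hi};
  return {Hi, Lo};
}
```

Truncating the 32-bit lane to 16 bits keeps the low half, which under this model is narrow lane 0 on little-endian and narrow lane 1 on big-endian, so the combine has to pick the build-vector operand accordingly.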

llvmbot · 2024-11-29T15:39:43Z

@llvm/pr-subscribers-backend-arm

Author: Oliver Stannard (ostannard)

Changes

This DAG combine was incorrect for big-endian targets, because it assumes that when a bitcast changes the lane width, the least-significant bits of the wider lanes are in the lower-numbered lanes of the smaller type, which is only true for little-endian.

Full diff: https://github.com/llvm/llvm-project/pull/118101.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+4-1)
(added) llvm/test/CodeGen/ARM/big-endian-vector-trunc.ll (+31)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 521829675ae7c3..90aa3009fb5ef0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15495,12 +15495,15 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
       unsigned BuildVecNumElts =  BuildVect.getNumOperands();
       unsigned TruncVecNumElts = VT.getVectorNumElements();
       unsigned TruncEltOffset = BuildVecNumElts / TruncVecNumElts;
+      unsigned FirstElt =
+          DAG.getDataLayout().isBigEndian() ? (TruncEltOffset - 1) : 0;
 
       assert((BuildVecNumElts % TruncVecNumElts) == 0 &&
              "Invalid number of elements");
 
       SmallVector<SDValue, 8> Opnds;
-      for (unsigned i = 0, e = BuildVecNumElts; i != e; i += TruncEltOffset)
+      for (unsigned i = FirstElt, e = BuildVecNumElts; i < e;
+           i += TruncEltOffset)
         Opnds.push_back(BuildVect.getOperand(i));
 
       return DAG.getBuildVector(VT, DL, Opnds);
diff --git a/llvm/test/CodeGen/ARM/big-endian-vector-trunc.ll b/llvm/test/CodeGen/ARM/big-endian-vector-trunc.ll
new file mode 100644
index 00000000000000..cdc09754d2654c
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/big-endian-vector-trunc.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=armebv7-unknown-none-eabihf -mattr=+neon < %s | FileCheck %s
+
+define i32 @test(i64 %arg1) "target-features"="+neon" {
+; CHECK-LABEL: test:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT:    subs r1, r1, #1
+; CHECK-NEXT:    mov r2, #0
+; CHECK-NEXT:    sbcs r0, r0, #0
+; CHECK-NEXT:    vldr s0, .LCPI0_0
+; CHECK-NEXT:    movwhs r2, #1
+; CHECK-NEXT:    cmp r2, #0
+; CHECK-NEXT:    mvnne r2, #0
+; CHECK-NEXT:    vmov s1, r2
+; CHECK-NEXT:    vmovn.i32 d16, q0
+; CHECK-NEXT:    vmovn.i16 d16, q8
+; CHECK-NEXT:    vmov.u8 r0, d16[0]
+; CHECK-NEXT:    and r0, r0, #1
+; CHECK-NEXT:    bx lr
+; CHECK-NEXT:    .p2align 2
+; CHECK-NEXT:  @ %bb.1:
+; CHECK-NEXT:  .LCPI0_0:
+; CHECK-NEXT:    .long 0xffffffff @ float NaN
+entry:
+  %insert_zero = insertelement <8 x i64> poison, i64 %arg1, i64 0
+  %splat_zero = shufflevector <8 x i64> %insert_zero, <8 x i64> poison, <8 x i32> zeroinitializer
+  %cmp_vec = icmp ule <8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %splat_zero
+  %first_cmp = extractelement <8 x i1> %cmp_vec, i32 0
+  %ext = zext i1 %first_cmp to i32
+  ret i32 %ext
+}

llvmbot · 2024-11-29T15:39:44Z

@llvm/pr-subscribers-llvm-selectiondag

arsenm · 2024-11-29T15:41:03Z

llvm/test/CodeGen/ARM/big-endian-vector-trunc.ll

+; RUN: llc -mtriple=armebv7-unknown-none-eabihf -mattr=+neon < %s | FileCheck %s
+
+define i32 @test(i64 %arg1) "target-features"="+neon" {


The -mattr flag does nothing when the function already carries "target-features"; remove one or the other.

RKSimon

LGTM with one minor

RKSimon · 2024-12-03T17:44:07Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

      unsigned TruncVecNumElts = VT.getVectorNumElements();
      unsigned TruncEltOffset = BuildVecNumElts / TruncVecNumElts;
+      unsigned FirstElt =
+          DAG.getDataLayout().isBigEndian() ? (TruncEltOffset - 1) : 0;


visitTRUNCATE already has isLE - replace this with:

unsigned FirstElt = isLE ? 0 : (TruncEltOffset - 1);
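
The index selection this suggestion leads to can be sketched standalone as follows. This is an illustrative model that uses std::vector in place of SDValue operands; truncSourceIndices is a hypothetical helper, not an LLVM function, and it only mirrors the loop shown in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of LLVM: returns the build-vector operand
// indices that hold the least-significant part of each wider lane.
std::vector<unsigned> truncSourceIndices(unsigned BuildVecNumElts,
                                         unsigned TruncVecNumElts,
                                         bool IsLittleEndian) {
  assert(TruncVecNumElts != 0 && (BuildVecNumElts % TruncVecNumElts) == 0 &&
         "Invalid number of elements");
  const unsigned TruncEltOffset = BuildVecNumElts / TruncVecNumElts;
  // On big-endian the low part of each wide lane is the last narrow operand
  // in its group; on little-endian it is the first.
  const unsigned FirstElt = IsLittleEndian ? 0 : (TruncEltOffset - 1);

  std::vector<unsigned> Indices;
  // `i < e` rather than `i != e`: with a non-zero start, i steps past e
  // without ever being equal to it.
  for (unsigned i = FirstElt, e = BuildVecNumElts; i < e; i += TruncEltOffset)
    Indices.push_back(i);
  return Indices;
}
```

With 8 build-vector operands and 4 truncated lanes, this selects operands 0, 2, 4, 6 on little-endian and 1, 3, 5, 7 on big-endian.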

RKSimon

LGTM - cheers

ostannard added 2 commits November 29, 2024 14:43

Add test showing bug

7fb3d04

ostannard requested review from bjope and davemgreen November 29, 2024 15:39

llvmbot added the backend:ARM and llvm:SelectionDAG labels Nov 29, 2024

arsenm reviewed Nov 29, 2024

Remove unneeded target-features attribute

804db3d

RKSimon reviewed Nov 29, 2024

ostannard added 2 commits December 2, 2024 10:11

Merge branch 'main' into neon-big-endian

97ad28d

Also test little-endian

fc2f59a

arsenm approved these changes Dec 3, 2024

RKSimon approved these changes Dec 3, 2024

Use existing isLE variable

3e7564b

RKSimon approved these changes Dec 4, 2024

ostannard merged commit 99b862e into llvm:main Dec 4, 2024
5 of 7 checks passed
