[AArch64][GlobalISel] Use TargetConstant for shift immediates #161527
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-globalisel

Author: David Green (davemgreen)

Changes

We represent a G_VLSHR as:

This means that certain patterns, unlike SDAG, will not match on the constant. If we use the second form, the basic patterns that recognize any constant (using ImmLeaf) do not match. If we use the first form, patterns with specific constants do not match. This change makes GIM_CheckLiteralInt also match on G_CONSTANT, allowing instructions with register constants to match. I don't have a strong preference if this should work some other way.

(CMLT is used because it can have a higher throughput than SSHR. The other changes are to generate fewer instructions.)

Patch is 72.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161527.diff

15 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h b/llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h
index 591cf9c97ae49..4559920bf247f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h
@@ -901,6 +901,19 @@ bool GIMatchTableExecutor::executeMatchTable(
       if (MO.isCImm() && MO.getCImm()->equalsInt(Value))
         break;
+      if (MO.isReg()) {
+        LLT Ty = MRI.getType(MO.getReg());
+        if (Ty.getScalarSizeInBits() > 64) {
+          if (handleReject() == RejectAndGiveUp)
+            return false;
+          break;
+        }
+
+        Value = SignExtend64(Value, Ty.getScalarSizeInBits());
+        if (isOperandImmEqual(MO, Value, MRI, /*Splat=*/true))
+          break;
+      }
+
       if (handleReject() == RejectAndGiveUp)
         return false;
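
The hunk above sign-extends the literal to the operand's scalar width before comparing it with a constant held in a register, and it asks for a splat match. The following is a minimal standalone sketch of why that sign extension matters. It assumes lanes are available as raw 64-bit values; `signExtend64` and `isSplatEqual` are hypothetical stand-ins written for illustration, not the LLVM helpers the patch calls.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: interpret the low `bits` bits of `x` as a signed
// value. Requires 0 < bits <= 64.
constexpr int64_t signExtend64(uint64_t x, unsigned bits) {
  const uint64_t sign = uint64_t{1} << (bits - 1);
  const uint64_t mask = bits == 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1;
  x &= mask;
  // Flipping and then subtracting the sign bit propagates it upwards.
  return static_cast<int64_t>((x ^ sign) - sign);
}

// Hypothetical stand-in for a splat comparison: true when every lane, read as
// a signed `bits`-wide value, equals the literal sign-extended to that width.
// Widths above 64 bits are rejected, mirroring the early reject in the hunk.
bool isSplatEqual(const std::vector<uint64_t> &lanes, unsigned bits,
                  int64_t literal) {
  if (lanes.empty() || bits == 0 || bits > 64)
    return false;
  const int64_t want = signExtend64(static_cast<uint64_t>(literal), bits);
  for (uint64_t lane : lanes)
    if (signExtend64(lane, bits) != want)
      return false;
  return true;
}
```

With 8-bit lanes, a literal written as 255 and a lane holding 0xFF both become -1 after sign extension, so they compare equal. Without that step the same bit pattern could compare unequal depending on how each side was widened.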
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
index 7872c027aff2b..461a7ef67e9e0 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
@@ -177,7 +177,7 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) {
; GISEL-NEXT: neg v2.16b, v3.16b
; GISEL-NEXT: shl v3.16b, v4.16b, #7
; GISEL-NEXT: ushl v1.16b, v1.16b, v2.16b
-; GISEL-NEXT: sshr v2.16b, v3.16b, #7
+; GISEL-NEXT: cmlt v2.16b, v3.16b, #0
; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b
; GISEL-NEXT: ret
%div = udiv <16 x i8> %x, <i8 -64, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
@@ -229,7 +229,7 @@ define <8 x i16> @pr38477(<8 x i16> %a0) {
; GISEL-NEXT: add v1.8h, v2.8h, v1.8h
; GISEL-NEXT: neg v2.8h, v4.8h
; GISEL-NEXT: ushl v1.8h, v1.8h, v2.8h
-; GISEL-NEXT: sshr v2.8h, v3.8h, #15
+; GISEL-NEXT: cmlt v2.8h, v3.8h, #0
; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b
; GISEL-NEXT: ret
%1 = udiv <8 x i16> %a0, <i16 1, i16 119, i16 73, i16 -111, i16 -3, i16 118, i16 32, i16 31>
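
The checks above swap `sshr` by the lane width minus one for `cmlt` against zero. The sketch below models one 8-bit lane to show why the two forms produce the same mask. It is a scalar illustration with hypothetical function names, and it says nothing about throughput, which the description attributes to the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Model of an arithmetic shift right on one 8-bit lane, for 1 <= n <= 7:
// shift logically, then fill the vacated top bits with the sign bit.
constexpr uint8_t sshr(uint8_t x, unsigned n) {
  const uint8_t logical = static_cast<uint8_t>(x >> n);
  const uint8_t fill =
      (x & 0x80u) ? static_cast<uint8_t>(0xFFu << (8 - n)) : uint8_t{0};
  return static_cast<uint8_t>(logical | fill);
}

// Model of a compare-less-than-zero on one 8-bit lane: all ones when the
// sign bit is set, all zeros otherwise.
constexpr uint8_t cmltZero(uint8_t x) {
  return (x & 0x80u) ? uint8_t{0xFF} : uint8_t{0x00};
}

// Exhaustively compares the two forms over every 8-bit lane value.
bool signMaskFormsAgree() {
  for (unsigned v = 0; v < 256; ++v)
    if (sshr(static_cast<uint8_t>(v), 7) != cmltZero(static_cast<uint8_t>(v)))
      return false;
  return true;
}
```

The same argument applies to 16-bit lanes with a shift of 15, which is the `sshr ..., #15` to `cmlt ..., #0` change in `pr38477`.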
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index cdde11042462b..63c08ddb04f7e 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -902,7 +902,7 @@ define void @sink_v8z16_0(ptr %p, ptr %d, i64 %n, <16 x i8> %a) {
; CHECK-GI-NEXT: subs x2, x2, #8
; CHECK-GI-NEXT: add x8, x8, #8
; CHECK-GI-NEXT: umull v1.8h, v1.8b, v0.8b
-; CHECK-GI-NEXT: sshr v1.8h, v1.8h, #15
+; CHECK-GI-NEXT: cmlt v1.8h, v1.8h, #0
; CHECK-GI-NEXT: xtn v1.8b, v1.8h
; CHECK-GI-NEXT: str d1, [x0], #32
; CHECK-GI-NEXT: b.ne .LBB8_1
@@ -967,8 +967,8 @@ define void @sink_v16s16_8(ptr %p, ptr %d, i64 %n, <16 x i8> %a) {
; CHECK-GI-NEXT: mov d2, v1.d[1]
; CHECK-GI-NEXT: smull v1.8h, v1.8b, v0.8b
; CHECK-GI-NEXT: smull v2.8h, v2.8b, v0.8b
-; CHECK-GI-NEXT: sshr v1.8h, v1.8h, #15
-; CHECK-GI-NEXT: sshr v2.8h, v2.8h, #15
+; CHECK-GI-NEXT: cmlt v1.8h, v1.8h, #0
+; CHECK-GI-NEXT: cmlt v2.8h, v2.8h, #0
; CHECK-GI-NEXT: uzp1 v1.16b, v1.16b, v2.16b
; CHECK-GI-NEXT: str q1, [x0], #32
; CHECK-GI-NEXT: b.ne .LBB9_1
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll b/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
index 9bafc5b8aea62..2a8b3ce2ae10b 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
@@ -999,16 +999,10 @@ entry:
}
define <8 x i8> @test_vaddhn_s16(<8 x i16> %a, <8 x i16> %b) {
-; CHECK-SD-LABEL: test_vaddhn_s16:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.8b, v0.8h, v1.8h
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_s16:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.8h, v0.8h, v1.8h
-; CHECK-GI-NEXT: shrn v0.8b, v0.8h, #8
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_s16:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.8b, v0.8h, v1.8h
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <8 x i16> %a, %b
%vaddhn1.i = lshr <8 x i16> %vaddhn.i, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
@@ -1017,16 +1011,10 @@ entry:
}
define <4 x i16> @test_vaddhn_s32(<4 x i32> %a, <4 x i32> %b) {
-; CHECK-SD-LABEL: test_vaddhn_s32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.4h, v0.4s, v1.4s
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_s32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.4s, v0.4s, v1.4s
-; CHECK-GI-NEXT: shrn v0.4h, v0.4s, #16
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_s32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.4h, v0.4s, v1.4s
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <4 x i32> %a, %b
%vaddhn1.i = lshr <4 x i32> %vaddhn.i, <i32 16, i32 16, i32 16, i32 16>
@@ -1035,16 +1023,10 @@ entry:
}
define <2 x i32> @test_vaddhn_s64(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-SD-LABEL: test_vaddhn_s64:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.2s, v0.2d, v1.2d
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_s64:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.2d, v0.2d, v1.2d
-; CHECK-GI-NEXT: shrn v0.2s, v0.2d, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_s64:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.2s, v0.2d, v1.2d
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <2 x i64> %a, %b
%vaddhn1.i = lshr <2 x i64> %vaddhn.i, <i64 32, i64 32>
@@ -1053,16 +1035,10 @@ entry:
}
define <8 x i8> @test_vaddhn_u16(<8 x i16> %a, <8 x i16> %b) {
-; CHECK-SD-LABEL: test_vaddhn_u16:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.8b, v0.8h, v1.8h
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_u16:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.8h, v0.8h, v1.8h
-; CHECK-GI-NEXT: shrn v0.8b, v0.8h, #8
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_u16:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.8b, v0.8h, v1.8h
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <8 x i16> %a, %b
%vaddhn1.i = lshr <8 x i16> %vaddhn.i, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
@@ -1071,16 +1047,10 @@ entry:
}
define <4 x i16> @test_vaddhn_u32(<4 x i32> %a, <4 x i32> %b) {
-; CHECK-SD-LABEL: test_vaddhn_u32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.4h, v0.4s, v1.4s
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_u32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.4s, v0.4s, v1.4s
-; CHECK-GI-NEXT: shrn v0.4h, v0.4s, #16
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_u32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.4h, v0.4s, v1.4s
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <4 x i32> %a, %b
%vaddhn1.i = lshr <4 x i32> %vaddhn.i, <i32 16, i32 16, i32 16, i32 16>
@@ -1089,16 +1059,10 @@ entry:
}
define <2 x i32> @test_vaddhn_u64(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-SD-LABEL: test_vaddhn_u64:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: addhn v0.2s, v0.2d, v1.2d
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vaddhn_u64:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v0.2d, v0.2d, v1.2d
-; CHECK-GI-NEXT: shrn v0.2s, v0.2d, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vaddhn_u64:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: addhn v0.2s, v0.2d, v1.2d
+; CHECK-NEXT: ret
entry:
%vaddhn.i = add <2 x i64> %a, %b
%vaddhn1.i = lshr <2 x i64> %vaddhn.i, <i64 32, i64 32>
@@ -1115,9 +1079,8 @@ define <16 x i8> @test_vaddhn_high_s16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_s16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: addhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1141,9 +1104,8 @@ define <8 x i16> @test_vaddhn_high_s32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_s32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: addhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1167,9 +1129,8 @@ define <4 x i32> @test_vaddhn_high_s64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_s64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: addhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1193,9 +1154,8 @@ define <16 x i8> @test_vaddhn_high_u16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_u16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: addhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1219,9 +1179,8 @@ define <8 x i16> @test_vaddhn_high_u32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_u32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: addhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1245,9 +1204,8 @@ define <4 x i32> @test_vaddhn_high_u64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vaddhn_high_u64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: add v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: addhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1461,16 +1419,10 @@ entry:
}
define <8 x i8> @test_vsubhn_s16(<8 x i16> %a, <8 x i16> %b) {
-; CHECK-SD-LABEL: test_vsubhn_s16:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.8b, v0.8h, v1.8h
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_s16:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.8h, v0.8h, v1.8h
-; CHECK-GI-NEXT: shrn v0.8b, v0.8h, #8
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_s16:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.8b, v0.8h, v1.8h
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <8 x i16> %a, %b
%vsubhn1.i = lshr <8 x i16> %vsubhn.i, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
@@ -1479,16 +1431,10 @@ entry:
}
define <4 x i16> @test_vsubhn_s32(<4 x i32> %a, <4 x i32> %b) {
-; CHECK-SD-LABEL: test_vsubhn_s32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.4h, v0.4s, v1.4s
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_s32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.4s, v0.4s, v1.4s
-; CHECK-GI-NEXT: shrn v0.4h, v0.4s, #16
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_s32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.4h, v0.4s, v1.4s
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <4 x i32> %a, %b
%vsubhn1.i = lshr <4 x i32> %vsubhn.i, <i32 16, i32 16, i32 16, i32 16>
@@ -1497,16 +1443,10 @@ entry:
}
define <2 x i32> @test_vsubhn_s64(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-SD-LABEL: test_vsubhn_s64:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.2s, v0.2d, v1.2d
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_s64:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.2d, v0.2d, v1.2d
-; CHECK-GI-NEXT: shrn v0.2s, v0.2d, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_s64:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.2s, v0.2d, v1.2d
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <2 x i64> %a, %b
%vsubhn1.i = lshr <2 x i64> %vsubhn.i, <i64 32, i64 32>
@@ -1515,16 +1455,10 @@ entry:
}
define <8 x i8> @test_vsubhn_u16(<8 x i16> %a, <8 x i16> %b) {
-; CHECK-SD-LABEL: test_vsubhn_u16:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.8b, v0.8h, v1.8h
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_u16:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.8h, v0.8h, v1.8h
-; CHECK-GI-NEXT: shrn v0.8b, v0.8h, #8
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_u16:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.8b, v0.8h, v1.8h
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <8 x i16> %a, %b
%vsubhn1.i = lshr <8 x i16> %vsubhn.i, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
@@ -1533,16 +1467,10 @@ entry:
}
define <4 x i16> @test_vsubhn_u32(<4 x i32> %a, <4 x i32> %b) {
-; CHECK-SD-LABEL: test_vsubhn_u32:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.4h, v0.4s, v1.4s
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_u32:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.4s, v0.4s, v1.4s
-; CHECK-GI-NEXT: shrn v0.4h, v0.4s, #16
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_u32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.4h, v0.4s, v1.4s
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <4 x i32> %a, %b
%vsubhn1.i = lshr <4 x i32> %vsubhn.i, <i32 16, i32 16, i32 16, i32 16>
@@ -1551,16 +1479,10 @@ entry:
}
define <2 x i32> @test_vsubhn_u64(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-SD-LABEL: test_vsubhn_u64:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.2s, v0.2d, v1.2d
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_u64:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.2d, v0.2d, v1.2d
-; CHECK-GI-NEXT: shrn v0.2s, v0.2d, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_u64:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.2s, v0.2d, v1.2d
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <2 x i64> %a, %b
%vsubhn1.i = lshr <2 x i64> %vsubhn.i, <i64 32, i64 32>
@@ -1577,9 +1499,8 @@ define <16 x i8> @test_vsubhn_high_s16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: subhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1603,9 +1524,8 @@ define <8 x i16> @test_vsubhn_high_s32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: subhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1629,9 +1549,8 @@ define <4 x i32> @test_vsubhn_high_s64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: subhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1655,9 +1574,8 @@ define <16 x i8> @test_vsubhn_high_u16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: subhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1681,9 +1599,8 @@ define <8 x i16> @test_vsubhn_high_u32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: subhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1707,9 +1624,8 @@ define <4 x i32> @test_vsubhn_high_u64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: subhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
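
In the `arm64-neon-3vdiff.ll` changes above, GlobalISel output moves from `add`/`sub` followed by `shrn` to a single `addhn`/`subhn`. The sketch below models one 16-bit lane to show that the two-step form and the high-half form compute the same byte. The function names are hypothetical, and the final truncation is assumed from the narrower return types of the test functions because it is not visible in the hunk context.

```cpp
#include <cassert>
#include <cstdint>

// Two-step form: add, logical shift right by 8, truncate to 8 bits.
constexpr uint8_t addShiftTrunc(uint16_t a, uint16_t b) {
  const uint16_t sum = static_cast<uint16_t>(a + b);  // wraps modulo 2^16
  return static_cast<uint8_t>(sum >> 8);
}

// High-half form: select bits [15:8] of the sum directly.
constexpr uint8_t addHighNarrow(uint16_t a, uint16_t b) {
  return static_cast<uint8_t>(((static_cast<uint32_t>(a) + b) & 0xFF00u) >> 8);
}

// The same pair for subtraction.
constexpr uint8_t subShiftTrunc(uint16_t a, uint16_t b) {
  const uint16_t diff = static_cast<uint16_t>(a - b);  // wraps modulo 2^16
  return static_cast<uint8_t>(diff >> 8);
}

constexpr uint8_t subHighNarrow(uint16_t a, uint16_t b) {
  return static_cast<uint8_t>(((static_cast<uint32_t>(a) - b) & 0xFF00u) >> 8);
}

// Compares both pairs over a strided sample of 16-bit inputs.
bool narrowFormsAgree() {
  for (unsigned a = 0; a <= 0xFFFF; a += 257)
    for (unsigned b = 0; b <= 0xFFFF; b += 255) {
      const uint16_t x = static_cast<uint16_t>(a);
      const uint16_t y = static_cast<uint16_t>(b);
      if (addShiftTrunc(x, y) != addHighNarrow(x, y))
        return false;
      if (subShiftTrunc(x, y) != subHighNarrow(x, y))
        return false;
    }
  return true;
}
```

The shift amount must equal half the source lane width (8, 16, or 32 in the tests) for the fold to be valid, which is why the selector needs to see the specific constant.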
diff --git a/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll b/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
index 84879d15de238..03e6ca1a8e146 100644
--- a/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
@@ -524,8 +524,8 @@ define <32 x i8> @sext_v32i1(<32 x i1> %arg) {
; CHECK-GI-NEXT: mov.b v1[15], w9
; CHECK-GI-NEXT: shl.16b v0, v0, #7
; CHECK-GI-NEXT: shl.16b v1, v1, #7
-; CHECK-GI-NEXT: sshr.16b v0, v0, #7
-; CHECK-GI-NEXT: sshr.16b v1, v1, #7
+; CHECK-GI-NEXT: cmlt.16b v0, v0, #0
+; CHECK-GI-NEXT: cmlt.16b v1, v1, #0
; CHECK-GI-NEXT: ret
%res = sext <32 x i1> %arg to <32 x i8>
ret <32 x i8> %res
@@ -934,10 +934,10 @@ define <64 x i8> @sext_v64i1(<64 x i1> %arg) {
; CHECK-GI-NEXT: shl.16b v1, v1, #7
; CHECK-GI-NEXT: shl.16b v2, v2, #7
; CHECK-GI-NEXT: shl.16b v3, v3, #7
-; CHECK-GI-NEXT: sshr.16b v0, v0, #7
-; CHECK-GI-NEXT: sshr.16b v1, v1, #7
-; CHECK-GI-NEXT: sshr.16b v2, v2, #7
-; CHECK-GI-NEXT: sshr.16b v3, v3, #7
+; CHECK-GI-NEXT: cmlt.16b v0, v0, #0
+; CHECK-GI-NEXT: cmlt.16b v1, v1, #0
+; CHECK-GI-NEXT: cmlt.16b v2, v2, #0
+; CHECK-GI-NEXT: cmlt.16b v3, v3, #0
; CHECK-GI-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-GI-NEXT: ret
%res = sext <64 x i1> %arg to <64 x i8>
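
The `sext_v32i1` and `sext_v64i1` checks above keep the `shl #7` and replace only the second step. A small lane model, with a hypothetical function name, of how the `shl` plus compare sequence sign-extends the low bit of an 8-bit lane:

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend bit 0 of an 8-bit lane in two steps, as in the checked output:
// shift the bit into the sign position, then compare the lane against zero.
constexpr uint8_t sextBit0(uint8_t x) {
  const uint8_t shifted = static_cast<uint8_t>(x << 7);  // shl #7
  return (shifted & 0x80u) ? uint8_t{0xFF} : uint8_t{0x00};  // cmlt #0
}
```

Only bit 0 of the input affects the result, so any higher bits left in the lane after the `i1` values are inserted are discarded by the shift.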
diff --git a/llvm/test/CodeGen/AArch64/arm64-vabs.ll b/llvm/test/CodeGen/AArch64/arm64-vabs.ll
index c408d7fe42000..a3f4722e14406 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vabs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vabs.ll
@@ -1914,21 +1914,13 @@ define <2 x i128> @uabd_i64(<2 x i64> %a, <2 x i64> %b) {
}
define <8 x i16> @pr88784(<8 x i8> %l0, <8 x i8> %l1, <8 x i16> %l2) {
-; CHECK-SD-LABEL: pr88784:
-; CHECK-SD: // %bb.0:
-; CHECK-SD-NEXT: usubl.8h v0, v0, v1
-; CHECK-SD-NEXT: cmlt.8h v1, v2, #0
-; CHECK-SD-NEXT: ssra.8h v0, v2, #15
-; CHECK-SD-NEXT: eor.16b v0, v1, v0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: pr88784:
-; CHECK-GI: // %bb.0:
-; CHECK-GI-NEXT: usubl.8h v0, v0, v1
-; CHECK-GI-NEXT: sshr.8h v1, v2, #15
-; CHECK-GI-NEXT: ssra.8h v0, v2, #15
-; CHECK-GI-NEXT: eor.16b v0, v1, v0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: pr88784:
+; CHECK: // %bb.0:
+; CHECK-NEXT: usubl.8h v0, v0, v1
+; CHECK-NEXT: cmlt.8h v1, v2, #0
+; CHECK-NEXT: ssra.8h v0, v2, #15
+; CHECK-NEXT: eor.16b v0, v1, v0
+; CHECK-NEXT: ret
%l4 = zext <8 x i8> %l0 to <8 x i16>
%l5 = ashr <8 x i16> %l2, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%l6 = zext <8 x i8> %l1 to <8 x i16>
@@ -1947,7 +1939,7 @@ define <8 x i16> @pr88784_fixed(<8 x i8> %l0, <8 x i8> %l1, <8 x i16> %l2) {
; CHECK-GI-LABEL: pr88784_fixed:
; CHECK-GI: // %bb.0:
; CHECK-GI-NEXT: usubl.8h v0, v0, v1
-; CHECK-GI-NEXT: ssh...
[truncated]
@llvm/pr-subscribers-backend-aarch64 (same description and patch as above)
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_u32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.4h, v0.4s, v1.4s
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <4 x i32> %a, %b
%vsubhn1.i = lshr <4 x i32> %vsubhn.i, <i32 16, i32 16, i32 16, i32 16>
@@ -1551,16 +1479,10 @@ entry:
}
define <2 x i32> @test_vsubhn_u64(<2 x i64> %a, <2 x i64> %b) {
-; CHECK-SD-LABEL: test_vsubhn_u64:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: subhn v0.2s, v0.2d, v1.2d
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_vsubhn_u64:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v0.2d, v0.2d, v1.2d
-; CHECK-GI-NEXT: shrn v0.2s, v0.2d, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_vsubhn_u64:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: subhn v0.2s, v0.2d, v1.2d
+; CHECK-NEXT: ret
entry:
%vsubhn.i = sub <2 x i64> %a, %b
%vsubhn1.i = lshr <2 x i64> %vsubhn.i, <i64 32, i64 32>
@@ -1577,9 +1499,8 @@ define <16 x i8> @test_vsubhn_high_s16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: subhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1603,9 +1524,8 @@ define <8 x i16> @test_vsubhn_high_s32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: subhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1629,9 +1549,8 @@ define <4 x i32> @test_vsubhn_high_s64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_s64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: subhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1655,9 +1574,8 @@ define <16 x i8> @test_vsubhn_high_u16(<8 x i8> %r, <8 x i16> %a, <8 x i16> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u16:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.8h, v1.8h, v2.8h
+; CHECK-GI-NEXT: subhn v1.8b, v1.8h, v2.8h
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.8b, v1.8h, #8
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1681,9 +1599,8 @@ define <8 x i16> @test_vsubhn_high_u32(<4 x i16> %r, <4 x i32> %a, <4 x i32> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u32:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.4s, v1.4s, v2.4s
+; CHECK-GI-NEXT: subhn v1.4h, v1.4s, v2.4s
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.4h, v1.4s, #16
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
@@ -1707,9 +1624,8 @@ define <4 x i32> @test_vsubhn_high_u64(<2 x i32> %r, <2 x i64> %a, <2 x i64> %b)
;
; CHECK-GI-LABEL: test_vsubhn_high_u64:
; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: sub v1.2d, v1.2d, v2.2d
+; CHECK-GI-NEXT: subhn v1.2s, v1.2d, v2.2d
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-GI-NEXT: shrn v1.2s, v1.2d, #32
; CHECK-GI-NEXT: fmov x8, d1
; CHECK-GI-NEXT: mov v0.d[1], x8
; CHECK-GI-NEXT: ret
diff --git a/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll b/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
index 84879d15de238..03e6ca1a8e146 100644
--- a/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
@@ -524,8 +524,8 @@ define <32 x i8> @sext_v32i1(<32 x i1> %arg) {
; CHECK-GI-NEXT: mov.b v1[15], w9
; CHECK-GI-NEXT: shl.16b v0, v0, #7
; CHECK-GI-NEXT: shl.16b v1, v1, #7
-; CHECK-GI-NEXT: sshr.16b v0, v0, #7
-; CHECK-GI-NEXT: sshr.16b v1, v1, #7
+; CHECK-GI-NEXT: cmlt.16b v0, v0, #0
+; CHECK-GI-NEXT: cmlt.16b v1, v1, #0
; CHECK-GI-NEXT: ret
%res = sext <32 x i1> %arg to <32 x i8>
ret <32 x i8> %res
@@ -934,10 +934,10 @@ define <64 x i8> @sext_v64i1(<64 x i1> %arg) {
; CHECK-GI-NEXT: shl.16b v1, v1, #7
; CHECK-GI-NEXT: shl.16b v2, v2, #7
; CHECK-GI-NEXT: shl.16b v3, v3, #7
-; CHECK-GI-NEXT: sshr.16b v0, v0, #7
-; CHECK-GI-NEXT: sshr.16b v1, v1, #7
-; CHECK-GI-NEXT: sshr.16b v2, v2, #7
-; CHECK-GI-NEXT: sshr.16b v3, v3, #7
+; CHECK-GI-NEXT: cmlt.16b v0, v0, #0
+; CHECK-GI-NEXT: cmlt.16b v1, v1, #0
+; CHECK-GI-NEXT: cmlt.16b v2, v2, #0
+; CHECK-GI-NEXT: cmlt.16b v3, v3, #0
; CHECK-GI-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-GI-NEXT: ret
%res = sext <64 x i1> %arg to <64 x i8>
diff --git a/llvm/test/CodeGen/AArch64/arm64-vabs.ll b/llvm/test/CodeGen/AArch64/arm64-vabs.ll
index c408d7fe42000..a3f4722e14406 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vabs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vabs.ll
@@ -1914,21 +1914,13 @@ define <2 x i128> @uabd_i64(<2 x i64> %a, <2 x i64> %b) {
}
define <8 x i16> @pr88784(<8 x i8> %l0, <8 x i8> %l1, <8 x i16> %l2) {
-; CHECK-SD-LABEL: pr88784:
-; CHECK-SD: // %bb.0:
-; CHECK-SD-NEXT: usubl.8h v0, v0, v1
-; CHECK-SD-NEXT: cmlt.8h v1, v2, #0
-; CHECK-SD-NEXT: ssra.8h v0, v2, #15
-; CHECK-SD-NEXT: eor.16b v0, v1, v0
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: pr88784:
-; CHECK-GI: // %bb.0:
-; CHECK-GI-NEXT: usubl.8h v0, v0, v1
-; CHECK-GI-NEXT: sshr.8h v1, v2, #15
-; CHECK-GI-NEXT: ssra.8h v0, v2, #15
-; CHECK-GI-NEXT: eor.16b v0, v1, v0
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: pr88784:
+; CHECK: // %bb.0:
+; CHECK-NEXT: usubl.8h v0, v0, v1
+; CHECK-NEXT: cmlt.8h v1, v2, #0
+; CHECK-NEXT: ssra.8h v0, v2, #15
+; CHECK-NEXT: eor.16b v0, v1, v0
+; CHECK-NEXT: ret
%l4 = zext <8 x i8> %l0 to <8 x i16>
%l5 = ashr <8 x i16> %l2, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%l6 = zext <8 x i8> %l1 to <8 x i16>
@@ -1947,7 +1939,7 @@ define <8 x i16> @pr88784_fixed(<8 x i8> %l0, <8 x i8> %l1, <8 x i16> %l2) {
; CHECK-GI-LABEL: pr88784_fixed:
; CHECK-GI: // %bb.0:
; CHECK-GI-NEXT: usubl.8h v0, v0, v1
-; CHECK-GI-NEXT: ssh...
[truncated]
I don't think this should be necessary. Directly inline constants have a separate matcher, and specifically correspond to TargetConstant, not Constant.
This sounds more like a bug somewhere. DAG tablegen is unfortunately too permissive with using the wrong imm/timm matcher. We should make it stricter to help with this.
We represent a G_VLSHR as:
  %18:gpr(s32) = G_CONSTANT i32 16
  %11:fpr(<4 x s32>) = G_VLSHR %1:fpr, %18:gpr(s32)
not as an immediate operand:
  %11:fpr(<4 x s32>) = G_VLSHR %1:fpr, 16
This means that certain patterns, unlike SDAG, will not match on the constant. If we use the second form, the basic patterns recognizing any constant (using ImmLeaf) do not match. If we use the first form, patterns with specific constants do not match. This patch makes GIM_CheckLiteralInt also match on G_CONSTANT, allowing patterns with specific constants to match. I don't have a strong preference if this should work some other way.
(CMLT is used because it can have a higher throughput than SSHR. The other changes are to generate fewer instructions.)
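To make the mismatch between the two operand forms concrete, here is a minimal toy model in C++. It is only a sketch: `Imm`, `Reg`, `Operand`, `ConstantDefs` and the three `matches...` functions are hypothetical names invented for illustration and are not LLVM classes or the real GIM_CheckLiteralInt implementation. It assumes an operand is either an inline immediate or a register, and that a side table records which registers are defined by a constant.

```cpp
#include <cstdint>
#include <map>
#include <variant>

// Toy stand-ins for machine operands (hypothetical, not LLVM types).
struct Imm { int64_t value; };  // inline immediate, as in `G_VLSHR %1, 16`
struct Reg { unsigned id; };    // virtual register, as in `%18`
using Operand = std::variant<Imm, Reg>;

// Maps a register id to the value of the constant that defines it, if any.
using ConstantDefs = std::map<unsigned, int64_t>;

// Strict literal check: only an inline immediate equal to `want` matches.
bool matchesLiteralStrict(const Operand &op, int64_t want) {
  const Imm *imm = std::get_if<Imm>(&op);
  return imm != nullptr && imm->value == want;
}

// Relaxed literal check: additionally looks through a register that is
// defined by a constant, which is the idea described in the comment above.
bool matchesLiteralRelaxed(const Operand &op, int64_t want,
                           const ConstantDefs &defs) {
  if (matchesLiteralStrict(op, want))
    return true;
  if (const Reg *reg = std::get_if<Reg>(&op)) {
    auto it = defs.find(reg->id);
    return it != defs.end() && it->second == want;
  }
  return false;
}

// "Any constant" check in the spirit of the ImmLeaf behaviour described
// above: it accepts a register defined by a constant, not an inline immediate.
bool matchesAnyRegisterConstant(const Operand &op, const ConstantDefs &defs) {
  const Reg *reg = std::get_if<Reg>(&op);
  return reg != nullptr && defs.count(reg->id) != 0;
}
```

In this model, with `defs` containing `{18, 16}`, the register form `Reg{18}` fails the strict literal check for 16 but passes the relaxed one, while the immediate form `Imm{16}` passes both literal checks and fails the "any constant" check. That mirrors the trade-off described above, under the stated assumptions.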
Thanks, this was helpful. I'm not sure I see the reason for the split between Constant and TargetConstant; it feels a bit nitpicky for something that doesn't buy us a lot. It is something that we can make work though, so I will update the patch.
It's very important. Constant is materialized in a register, and TargetConstant is not.
: DefaultAttrsIntrinsic<[llvm_anyint_ty],
[LLVMExtendedType<0>, llvm_i32_ty],
-[IntrNoMem]>;
+[IntrNoMem, ImmArg<ArgIndex<1>>]>;
If the instruction can work with a materialized constant value, this is regressing it?
They all require immediates for these intrinsics.
This changes the intrinsic definitions for shifts to use ImmArg, which in turn changes how the shifts are represented in SDAG to use TargetConstant (and fixes up a number of ISel lowering places too). The vecshift immediates are changed from ImmLeaf to TImmLeaf so that they keep matching the TargetConstant. On the GISel side, the constant shift amounts are represented as immediate operands, not separate constants. The end result is that a few more patterns manage to match in GISel.