[LowerTypeTests] Generate fshr for rotate pattern #141735
Conversation
The canonical representation for a rotate right is fshr with two equal arguments, so generate that instead of an lshr/shl/or sequence.
@llvm/pr-subscribers-lto @llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes: The canonical representation for a rotate right is fshr with two equal arguments, so generate that instead of an lshr/shl/or sequence.

Patch is 21.07 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/141735.diff

8 Files Affected:
diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index 63f8a6e1b6d44..de9efc5b25b4b 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -779,15 +779,9 @@ Value *LowerTypeTestsModule::lowerTypeTestCall(Metadata *TypeId, CallInst *CI,
// result, causing the comparison to fail if they are nonzero. The rotate
// also conveniently gives us a bit offset to use during the load from
// the bitset.
- Value *OffsetSHR =
- B.CreateLShr(PtrOffset, B.CreateZExt(TIL.AlignLog2, IntPtrTy));
- Value *OffsetSHL = B.CreateShl(
- PtrOffset, B.CreateZExt(
- ConstantExpr::getSub(
- ConstantInt::get(Int8Ty, DL.getPointerSizeInBits(0)),
- TIL.AlignLog2),
- IntPtrTy));
- Value *BitOffset = B.CreateOr(OffsetSHR, OffsetSHL);
+ Value *BitOffset = B.CreateIntrinsic(
+ IntPtrTy, Intrinsic::fshr,
+ {PtrOffset, PtrOffset, B.CreateZExt(TIL.AlignLog2, IntPtrTy)});
Value *OffsetInRange = B.CreateICmpULE(BitOffset, TIL.SizeM1);
diff --git a/llvm/test/ThinLTO/X86/cfi-devirt.ll b/llvm/test/ThinLTO/X86/cfi-devirt.ll
index 70b0ba2f8faa5..0a9935b036b5a 100644
--- a/llvm/test/ThinLTO/X86/cfi-devirt.ll
+++ b/llvm/test/ThinLTO/X86/cfi-devirt.ll
@@ -91,7 +91,7 @@ cont2:
; CHECK-IR: br i1 {{.*}}, label %trap, label %cont2
; We still have to call it as virtual.
- ; CHECK-IR: %call3 = tail call i32 %7
+ ; CHECK-IR: %call3 = tail call i32 %4
%call3 = tail call i32 %5(ptr nonnull %obj, i32 %call)
ret i32 %call3
}
diff --git a/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll b/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
index 5ac6d00d9afd1..c932236dffacb 100644
--- a/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
+++ b/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
@@ -42,11 +42,9 @@ define i1 @foo(ptr %p) {
; AARCH64-SAME: (ptr [[P:%.*]]) {
; AARCH64-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; AARCH64-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @.cfi.jumptable to i64)
-; AARCH64-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 3
-; AARCH64-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 61
-; AARCH64-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
-; AARCH64-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 1
-; AARCH64-NEXT: ret i1 [[TMP6]]
+; AARCH64-NEXT: [[TMP3:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 3)
+; AARCH64-NEXT: [[TMP4:%.*]] = icmp ule i64 [[TMP3]], 1
+; AARCH64-NEXT: ret i1 [[TMP4]]
;
;
; AARCH64: Function Attrs: naked noinline
diff --git a/llvm/test/Transforms/LowerTypeTests/function-thumb-bti.ll b/llvm/test/Transforms/LowerTypeTests/function-thumb-bti.ll
index 7f55a15466399..bd94db9e11328 100644
--- a/llvm/test/Transforms/LowerTypeTests/function-thumb-bti.ll
+++ b/llvm/test/Transforms/LowerTypeTests/function-thumb-bti.ll
@@ -33,8 +33,8 @@ define i1 @foo(ptr %p) {
; branch instruction, 4 bytes each. For non-BTI, we shift right by 2,
; because it's just the branch.
-; BTI: lshr i64 {{.*}}, 3
-; NOBTI: lshr i64 {{.*}}, 2
+; BTI: @llvm.fshr.i64({{.*}}, i64 3)
+; NOBTI: @llvm.fshr.i64({{.*}}, i64 2)
; CHECK: define private void @.cfi.jumptable() [[ATTRS:#[0-9]+]]
diff --git a/llvm/test/Transforms/LowerTypeTests/import.ll b/llvm/test/Transforms/LowerTypeTests/import.ll
index c6566b84a4361..3c97f087d5c93 100644
--- a/llvm/test/Transforms/LowerTypeTests/import.ll
+++ b/llvm/test/Transforms/LowerTypeTests/import.ll
@@ -38,10 +38,7 @@ define i1 @allones7(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones7_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_allones7_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_allones7_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_allones7_size_m1 to i64)
; X86-NEXT: ret i1 [[TMP8]]
;
@@ -49,9 +46,7 @@ define i1 @allones7(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones7_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 1
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 63
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 1)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 42
; ARM-NEXT: ret i1 [[TMP6]]
;
@@ -65,10 +60,7 @@ define i1 @allones32(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones32_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_allones32_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_allones32_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_allones32_size_m1 to i64)
; X86-NEXT: ret i1 [[TMP8]]
;
@@ -76,9 +68,7 @@ define i1 @allones32(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones32_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 2
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 62
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 2)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 12345
; ARM-NEXT: ret i1 [[TMP6]]
;
@@ -92,19 +82,16 @@ define i1 @bytearray7(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray7_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_bytearray7_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_bytearray7_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_bytearray7_size_m1 to i64)
; X86-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP14:%.*]]
-; X86: 9:
+; X86: 6:
; X86-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr @__typeid_bytearray7_byte_array, i64 [[TMP7]]
; X86-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
; X86-NEXT: [[TMP12:%.*]] = and i8 [[TMP11]], ptrtoint (ptr @__typeid_bytearray7_bit_mask to i8)
; X86-NEXT: [[TMP13:%.*]] = icmp ne i8 [[TMP12]], 0
; X86-NEXT: br label [[TMP14]]
-; X86: 14:
+; X86: 11:
; X86-NEXT: [[TMP15:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP13]], [[TMP9]] ]
; X86-NEXT: ret i1 [[TMP15]]
;
@@ -112,18 +99,16 @@ define i1 @bytearray7(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray7_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 3
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 61
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 3)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 43
; ARM-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP12:%.*]]
-; ARM: 7:
+; ARM: 5:
; ARM-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr @__typeid_bytearray7_byte_array, i64 [[TMP5]]
; ARM-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
; ARM-NEXT: [[TMP10:%.*]] = and i8 [[TMP9]], ptrtoint (ptr inttoptr (i64 64 to ptr) to i8)
; ARM-NEXT: [[TMP11:%.*]] = icmp ne i8 [[TMP10]], 0
; ARM-NEXT: br label [[TMP12]]
-; ARM: 12:
+; ARM: 10:
; ARM-NEXT: [[TMP13:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP11]], [[TMP7]] ]
; ARM-NEXT: ret i1 [[TMP13]]
;
@@ -137,19 +122,16 @@ define i1 @bytearray32(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray32_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_bytearray32_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_bytearray32_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_bytearray32_size_m1 to i64)
; X86-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP14:%.*]]
-; X86: 9:
+; X86: 6:
; X86-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr @__typeid_bytearray32_byte_array, i64 [[TMP7]]
; X86-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
; X86-NEXT: [[TMP12:%.*]] = and i8 [[TMP11]], ptrtoint (ptr @__typeid_bytearray32_bit_mask to i8)
; X86-NEXT: [[TMP13:%.*]] = icmp ne i8 [[TMP12]], 0
; X86-NEXT: br label [[TMP14]]
-; X86: 14:
+; X86: 11:
; X86-NEXT: [[TMP15:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP13]], [[TMP9]] ]
; X86-NEXT: ret i1 [[TMP15]]
;
@@ -157,18 +139,16 @@ define i1 @bytearray32(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray32_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 4
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 60
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 4)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 12346
; ARM-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP12:%.*]]
-; ARM: 7:
+; ARM: 5:
; ARM-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr @__typeid_bytearray32_byte_array, i64 [[TMP5]]
; ARM-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
; ARM-NEXT: [[TMP10:%.*]] = and i8 [[TMP9]], ptrtoint (ptr inttoptr (i64 128 to ptr) to i8)
; ARM-NEXT: [[TMP11:%.*]] = icmp ne i8 [[TMP10]], 0
; ARM-NEXT: br label [[TMP12]]
-; ARM: 12:
+; ARM: 10:
; ARM-NEXT: [[TMP13:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP11]], [[TMP7]] ]
; ARM-NEXT: ret i1 [[TMP13]]
;
@@ -182,20 +162,17 @@ define i1 @inline5(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_inline5_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_inline5_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_inline5_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_inline5_size_m1 to i64)
; X86-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP15:%.*]]
-; X86: 9:
+; X86: 6:
; X86-NEXT: [[TMP10:%.*]] = trunc i64 [[TMP7]] to i32
; X86-NEXT: [[TMP11:%.*]] = and i32 [[TMP10]], 31
; X86-NEXT: [[TMP12:%.*]] = shl i32 1, [[TMP11]]
; X86-NEXT: [[TMP13:%.*]] = and i32 ptrtoint (ptr @__typeid_inline5_inline_bits to i32), [[TMP12]]
; X86-NEXT: [[TMP14:%.*]] = icmp ne i32 [[TMP13]], 0
; X86-NEXT: br label [[TMP15]]
-; X86: 15:
+; X86: 12:
; X86-NEXT: [[TMP16:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP14]], [[TMP9]] ]
; X86-NEXT: ret i1 [[TMP16]]
;
@@ -203,19 +180,17 @@ define i1 @inline5(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_inline5_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 5
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 59
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 5)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 31
; ARM-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP13:%.*]]
-; ARM: 7:
+; ARM: 5:
; ARM-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP5]] to i32
; ARM-NEXT: [[TMP9:%.*]] = and i32 [[TMP8]], 31
; ARM-NEXT: [[TMP10:%.*]] = shl i32 1, [[TMP9]]
; ARM-NEXT: [[TMP11:%.*]] = and i32 123, [[TMP10]]
; ARM-NEXT: [[TMP12:%.*]] = icmp ne i32 [[TMP11]], 0
; ARM-NEXT: br label [[TMP13]]
-; ARM: 13:
+; ARM: 11:
; ARM-NEXT: [[TMP14:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP12]], [[TMP7]] ]
; ARM-NEXT: ret i1 [[TMP14]]
;
@@ -229,19 +204,16 @@ define i1 @inline6(ptr %p) {
; X86-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; X86-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_inline6_global_addr to i64)
; X86-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_inline6_align to i8) to i64
-; X86-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; X86-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_inline6_align to i8)) to i64
-; X86-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; X86-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; X86-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; X86-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_inline6_size_m1 to i64)
; X86-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP14:%.*]]
-; X86: 9:
+; X86: 6:
; X86-NEXT: [[TMP10:%.*]] = and i64 [[TMP7]], 63
; X86-NEXT: [[TMP11:%.*]] = shl i64 1, [[TMP10]]
; X86-NEXT: [[TMP12:%.*]] = and i64 ptrtoint (ptr @__typeid_inline6_inline_bits to i64), [[TMP11]]
; X86-NEXT: [[TMP13:%.*]] = icmp ne i64 [[TMP12]], 0
; X86-NEXT: br label [[TMP14]]
-; X86: 14:
+; X86: 11:
; X86-NEXT: [[TMP15:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP13]], [[TMP9]] ]
; X86-NEXT: ret i1 [[TMP15]]
;
@@ -249,18 +221,16 @@ define i1 @inline6(ptr %p) {
; ARM-SAME: ptr [[P:%.*]]) {
; ARM-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; ARM-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_inline6_global_addr to i64)
-; ARM-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP2]], 6
-; ARM-NEXT: [[TMP4:%.*]] = shl i64 [[TMP2]], 58
-; ARM-NEXT: [[TMP5:%.*]] = or i64 [[TMP3]], [[TMP4]]
+; ARM-NEXT: [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 6)
; ARM-NEXT: [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 63
; ARM-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP12:%.*]]
-; ARM: 7:
+; ARM: 5:
; ARM-NEXT: [[TMP8:%.*]] = and i64 [[TMP5]], 63
; ARM-NEXT: [[TMP9:%.*]] = shl i64 1, [[TMP8]]
; ARM-NEXT: [[TMP10:%.*]] = and i64 1000000000000, [[TMP9]]
; ARM-NEXT: [[TMP11:%.*]] = icmp ne i64 [[TMP10]], 0
; ARM-NEXT: br label [[TMP12]]
-; ARM: 12:
+; ARM: 10:
; ARM-NEXT: [[TMP13:%.*]] = phi i1 [ false, [[TMP0:%.*]] ], [ [[TMP11]], [[TMP7]] ]
; ARM-NEXT: ret i1 [[TMP13]]
;
diff --git a/llvm/test/Transforms/LowerTypeTests/simple.ll b/llvm/test/Transforms/LowerTypeTests/simple.ll
index c7ba3777b25d8..94617bab798a0 100644
--- a/llvm/test/Transforms/LowerTypeTests/simple.ll
+++ b/llvm/test/Transforms/LowerTypeTests/simple.ll
@@ -50,9 +50,7 @@ define i1 @foo(ptr %p) {
; CHECK: [[R1:%[^ ]*]] = ptrtoint ptr %p to i32
; CHECK: [[R2:%[^ ]*]] = sub i32 [[R1]], ptrtoint (ptr [[G]] to i32)
- ; CHECK: [[R3:%[^ ]*]] = lshr i32 [[R2]], 2
- ; CHECK: [[R4:%[^ ]*]] = shl i32 [[R2]], 30
- ; CHECK: [[R5:%[^ ]*]] = or i32 [[R3]], [[R4]]
+ ; CHECK: [[R5:%[^ ]*]] = call i32 @llvm.fshr.i32(i32 [[R2]], i32 [[R2]], i32 2)
; CHECK: [[R6:%[^ ]*]] = icmp ule i32 [[R5]], 67
; CHECK: br i1 [[R6]]
@@ -75,9 +73,7 @@ define i1 @foo(ptr %p) {
define i1 @bar(ptr %p) {
; CHECK: [[S1:%[^ ]*]] = ptrtoint ptr %p to i32
; CHECK: [[S2:%[^ ]*]] = sub i32 [[S1]], ptrtoint (ptr getelementptr (i8, ptr [[G]], i32 4) to i32)
- ; CHECK: [[S3:%[^ ]*]] = lshr i32 [[S2]], 8
- ; CHECK: [[S4:%[^ ]*]] = shl i32 [[S2]], 24
- ; CHECK: [[S5:%[^ ]*]] = or i32 [[S3]], [[S4]]
+ ; CHECK: [[S5:%[^ ]*]] = call i32 @llvm.fshr.i32(i32 [[S2]], i32 [[S2]], i32 8)
; CHECK: [[S6:%[^ ]*]] = icmp ule i32 [[S5]], 1
%x = call i1 @llvm.type.test(ptr %p, metadata !"typeid2")
@@ -89,9 +85,7 @@ define i1 @bar(ptr %p) {
define i1 @baz(ptr %p) {
; CHECK: [[T1:%[^ ]*]] = ptrtoint ptr %p to i32
; CHECK: [[T2:%[^ ]*]] = sub i32 [[T1]], ptrtoint (ptr [[G]] to i32)
- ; CHECK: [[T3:%[^ ]*]] = lshr i32 [[T2]], 2
- ; CHECK: [[T4:%[^ ]*]] = shl i32 [[T2]], 30
- ; CHECK: [[T5:%[^ ]*]] = or i32 [[T3]], [[T4]]
+ ; CHECK: [[T5:%[^ ]*]] = call i32 @llvm.fshr.i32(i32 [[T2]], i32 [[T2]], i32 2)
; CHECK: [[T6:%[^ ]*]] = icmp ule i32 [[T5]], 65
; CHECK: br i1 [[T6]]
diff --git a/llvm/test/Transforms/LowerTypeTests/simplify.ll b/llvm/test/Transforms/LowerTypeTests/simplify.ll
index ff0f941eece99..5f2140caca274 100644
--- a/llvm/test/Transforms/LowerTypeTests/simplify.ll
+++ b/llvm/test/Transforms/LowerTypeTests/simplify.ll
@@ -12,13 +12,10 @@ define i1 @bytearray7(ptr %p) {
; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
; CHECK-NEXT: [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray7_global_addr to i64)
; CHECK-NEXT: [[TMP3:%.*]] = zext i8 ptrtoint (ptr @__typeid_bytearray7_align to i8) to i64
-; CHECK-NEXT: [[TMP4:%.*]] = lshr i64 [[TMP2]], [[TMP3]]
-; CHECK-NEXT: [[TMP5:%.*]] = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_bytearray7_align to i8)) to i64
-; CHECK-NEXT: [[TMP6:%.*]] = shl i64 [[TMP2]], [[TMP5]]
-; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP4]], [[TMP6]]
+; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 [[TMP3]])
; CHECK-NEXT: [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_bytearray7_size_m1 to i64)
; CHECK-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[F:%.*]]
-; CHECK: 9:
+; CHECK: 6:
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr @__typeid_bytearray7_byte_array, i64 [[TMP7]]
; CHECK-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
; CHECK-NEXT: [[TMP12:%.*]] = and i8 [[TMP11]], ptrtoint (ptr @__typeid_bytearray7_bit_mask to i8)
diff --git a/llvm/test/Transforms/MergeFunc/cfi-thunk-merging.ll b/llvm/test/Transforms/MergeFunc/cfi-thunk-merging.ll
index f4225f95538a0..bb4b2ccacd5ed 100644
--- a/llvm/test/Transforms/MergeFunc/cfi-thunk-merging.ll
+++ b/llvm/test/Transforms/MergeFunc/cfi-thunk-merging.ll
@@ -182,17 +182,15 @@ attributes #3 = { noreturn nounwind }
; LOWERTYPETESTS-NEXT: [[TMP2:%.*]] = load ptr, ptr [[FP]], align 8
; LOWERTYPETESTS-NEXT: [[TMP3:%.*]] = ptrtoint ptr [[TMP2]] to i64
; LOWERTYPETESTS-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], ptrtoint (ptr @.cfi.jumptable to i64)
-; LOWERTYPETESTS-NEXT: [[TMP5:%.*]] = lshr i64 [[TMP4]], 3
-; LOWERTYPETESTS-NEXT: [[TMP6:%.*]] = shl i64 [[TMP4]], 61
-; LOWERTYPETESTS-NEXT...
[truncated]
Thanks. I tested your change on a large internal binary built with CFI. It makes the binary larger by about 0.5%, but I think that may just be due to the inliner activating more often because the check has fewer LLVM instructions (I didn't spot any issues in the generated code).
I also have a change on top of this one to replace zext(ptrtoint to i8) to i64 with just ptrtoint to i64, which seems to fix the #141326 codegen problems entirely. I'll upload it once this change is submitted because I don't think I can upload a stack that includes someone else's change.
In the LowerTypeTests pass we used to create IR like this:

  %3 = zext i8 ptrtoint (ptr @__typeid_allones7_align to i8) to i64
  %4 = lshr i64 %2, %3
  %5 = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_allones7_align to i8)) to i64
  %6 = shl i64 %2, %5
  %7 = or i64 %4, %6

This is because, when this code was originally written, there were no funnel shifts, and as I recall it was necessary to create an i8 and zext it to pointer width (instead of just having a ptrtoint of pointer width) in order for the shl/shr/or to be pattern matched to ror. At the time this caused no problems because there existed a zext ConstantExpr. But after the zext ConstantExpr was removed in #71040, the newly present zext instruction can prevent pattern matching the rotate, for example if the zext gets hoisted to a loop preheader or a common ancestor of the check. LowerTypeTests was made to use fshr in #141735, so now we can ptrtoint to pointer width and stop creating the zext.

Reviewers: fmayer, nikic

Reviewed By: nikic

Pull Request: #142886