-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LoopIdiom] Fix a DL-related crash in optimizeCRCLoop #161509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Ramkumar Ramachandra (artagnon) ChangesFull diff: https://github.com/llvm/llvm-project/pull/161509.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 0874b29ab7d22..b73d2efb9d318 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -1654,7 +1654,7 @@ bool LoopIdiomRecognize::optimizeCRCLoop(const PolynomialInfo &Info) {
: LoByte(Builder, Indexer, "indexer.lo");
// Always index into a GEP using the index type.
- Indexer = Builder.CreateZExt(
+ Indexer = Builder.CreateZExtOrTrunc(
Indexer, SE->getDataLayout().getIndexType(GV->getType()),
"indexer.ext");
diff --git a/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll
new file mode 100644
index 0000000000000..691523c5107e5
--- /dev/null
+++ b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 6
+; RUN: opt -passes=loop-idiom -S %s | FileCheck %s
+
+target datalayout = "p:16:16"
+
+;.
+; CHECK: @.crctable = private constant [256 x i32] zeroinitializer
+;.
+define void @test_with_dl() {
+; CHECK-LABEL: define void @test_with_dl() {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: br label %[[PH:.*]]
+; CHECK: [[PH_LOOPEXIT:.*]]:
+; CHECK-NEXT: [[CRC_NEXT_LCSSA:%.*]] = phi i32 [ [[CRC_NEXT3:%.*]], %[[LOOP:.*]] ]
+; CHECK-NEXT: br label %[[PH]]
+; CHECK: [[PH]]:
+; CHECK-NEXT: [[CRC_USE:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[CRC_NEXT_LCSSA]], %[[PH_LOOPEXIT]] ]
+; CHECK-NEXT: br label %[[LOOP]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i16 [ 0, %[[PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[CRC2:%.*]] = phi i32 [ 0, %[[PH]] ], [ [[CRC_NEXT3]], %[[LOOP]] ]
+; CHECK-NEXT: [[INDEXER_LO:%.*]] = and i32 [[CRC2]], 255
+; CHECK-NEXT: [[INDEXER_EXT:%.*]] = trunc i32 [[INDEXER_LO]] to i16
+; CHECK-NEXT: [[TBL_PTRADD:%.*]] = getelementptr inbounds i32, ptr @.crctable, i16 [[INDEXER_EXT]]
+; CHECK-NEXT: [[TBL_LD:%.*]] = load i32, ptr [[TBL_PTRADD]], align 4
+; CHECK-NEXT: [[CRC_LE_SHIFT:%.*]] = lshr i32 [[CRC2]], 8
+; CHECK-NEXT: [[CRC_NEXT3]] = xor i32 [[CRC_LE_SHIFT]], [[TBL_LD]]
+; CHECK-NEXT: [[IV_NEXT]] = add i16 [[IV]], 1
+; CHECK-NEXT: [[EXIT_COND1:%.*]] = icmp ne i16 [[IV]], 0
+; CHECK-NEXT: br i1 [[EXIT_COND1]], label %[[LOOP]], label %[[PH_LOOPEXIT]]
+;
+entry:
+ br label %ph
+
+ph:
+ %crc.use = phi i32 [ 0, %entry ], [ %crc.next, %loop ]
+ br label %loop
+
+loop:
+ %iv = phi i16 [ 0, %ph ], [ %iv.next, %loop ]
+ %crc = phi i32 [ 0, %ph ], [ %crc.next, %loop ]
+ %lshr.crc.1 = lshr i32 %crc, 1
+ %crc.and.1 = and i32 %crc, 1
+ %sb.check = icmp eq i32 %crc.and.1, 0
+ %xor = xor i32 %lshr.crc.1, 0
+ %crc.next = select i1 %sb.check, i32 %lshr.crc.1, i32 %xor
+ %iv.next = add i16 %iv, 1
+ %exit.cond = icmp ult i16 %iv, 7
+ br i1 %exit.cond, label %loop, label %ph
+}
|
Thanks for the quick fix, this does solve the problem I saw. |
5590231
to
727ef42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions? |
Currently, HashRecognize recognizes bit-level CRC computations and LoopIdiom optimizes it using a byte-level table-lookup. I think the pending work to enable the use of hardware instructions is to introduce a generic llvm.clmul, but nobody is working on it at the moment. Kindly feel free to pick up the task, if you're so inclined. |
Thank you for your response. You've done a fantastic job, and your answers have truly inspired me. I'm relatively new to LLVM, and my current goal is to recognize CRC loops and optimize them using my custom instructions. For example, I've already implemented a hardware instruction like CRC32C.B rd, rs1, rs2 in hardware. I can invoke this instruction through an intrinsic function, such as @llvm.riscv.crc32.b. I'd like to leverage the results from HashRecognize to achieve this optimization. Could you please give me some practical advice on how to proceed? |
Oh, it's quite simple then. Create a target-hook to query if there's a valid lowering for llvm.riscv.crc32.b (for your custom hardware, yes), and modify optimizeCRCLoop with a query for this hook: if so, simply take Info.LHS, Info.LHSAux and Info.RHS, feed them into llvm.riscv.crc32.b, delete the existing loop, and skip the existing code? You don't need to call genSarwateTable. |
No description provided.