Skip to content

Conversation

artagnon
Copy link
Contributor

@artagnon artagnon commented Oct 1, 2025

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Oct 1, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161509.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (+1-1)
  • (added) llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll (+50)
diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 0874b29ab7d22..b73d2efb9d318 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -1654,7 +1654,7 @@ bool LoopIdiomRecognize::optimizeCRCLoop(const PolynomialInfo &Info) {
                                     : LoByte(Builder, Indexer, "indexer.lo");
 
     // Always index into a GEP using the index type.
-    Indexer = Builder.CreateZExt(
+    Indexer = Builder.CreateZExtOrTrunc(
         Indexer, SE->getDataLayout().getIndexType(GV->getType()),
         "indexer.ext");
 
diff --git a/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll
new file mode 100644
index 0000000000000..691523c5107e5
--- /dev/null
+++ b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check-dl.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 6
+; RUN: opt -passes=loop-idiom -S %s | FileCheck %s
+
+target datalayout = "p:16:16"
+
+;.
+; CHECK: @.crctable = private constant [256 x i32] zeroinitializer
+;.
+define void @test_with_dl() {
+; CHECK-LABEL: define void @test_with_dl() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[PH:.*]]
+; CHECK:       [[PH_LOOPEXIT:.*]]:
+; CHECK-NEXT:    [[CRC_NEXT_LCSSA:%.*]] = phi i32 [ [[CRC_NEXT3:%.*]], %[[LOOP:.*]] ]
+; CHECK-NEXT:    br label %[[PH]]
+; CHECK:       [[PH]]:
+; CHECK-NEXT:    [[CRC_USE:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[CRC_NEXT_LCSSA]], %[[PH_LOOPEXIT]] ]
+; CHECK-NEXT:    br label %[[LOOP]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i16 [ 0, %[[PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[CRC2:%.*]] = phi i32 [ 0, %[[PH]] ], [ [[CRC_NEXT3]], %[[LOOP]] ]
+; CHECK-NEXT:    [[INDEXER_LO:%.*]] = and i32 [[CRC2]], 255
+; CHECK-NEXT:    [[INDEXER_EXT:%.*]] = trunc i32 [[INDEXER_LO]] to i16
+; CHECK-NEXT:    [[TBL_PTRADD:%.*]] = getelementptr inbounds i32, ptr @.crctable, i16 [[INDEXER_EXT]]
+; CHECK-NEXT:    [[TBL_LD:%.*]] = load i32, ptr [[TBL_PTRADD]], align 4
+; CHECK-NEXT:    [[CRC_LE_SHIFT:%.*]] = lshr i32 [[CRC2]], 8
+; CHECK-NEXT:    [[CRC_NEXT3]] = xor i32 [[CRC_LE_SHIFT]], [[TBL_LD]]
+; CHECK-NEXT:    [[IV_NEXT]] = add i16 [[IV]], 1
+; CHECK-NEXT:    [[EXIT_COND1:%.*]] = icmp ne i16 [[IV]], 0
+; CHECK-NEXT:    br i1 [[EXIT_COND1]], label %[[LOOP]], label %[[PH_LOOPEXIT]]
+;
+entry:
+  br label %ph
+
+ph:
+  %crc.use = phi i32 [ 0, %entry ], [ %crc.next, %loop ]
+  br label %loop
+
+loop:
+  %iv = phi i16 [ 0, %ph ], [ %iv.next, %loop ]
+  %crc = phi i32 [ 0, %ph ], [ %crc.next, %loop ]
+  %lshr.crc.1 = lshr i32 %crc, 1
+  %crc.and.1 = and i32 %crc, 1
+  %sb.check = icmp eq i32 %crc.and.1, 0
+  %xor = xor i32 %lshr.crc.1, 0
+  %crc.next = select i1 %sb.check, i32 %lshr.crc.1, i32 %xor
+  %iv.next = add i16 %iv, 1
+  %exit.cond = icmp ult i16 %iv, 7
+  br i1 %exit.cond, label %loop, label %ph
+}

@mikaelholmen
Copy link
Collaborator

Thanks for the quick fix, this does solve the problem I saw.

Copy link
Contributor

@pfusik pfusik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@artagnon artagnon merged commit bb16c56 into llvm:main Oct 1, 2025
9 checks passed
@artagnon artagnon deleted the lir-crc-dl-crash branch October 1, 2025 16:14
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
@Guohua-W
Copy link

Guohua-W commented Oct 9, 2025

Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions?

@artagnon
Copy link
Contributor Author

artagnon commented Oct 9, 2025

Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions?

Currently, HashRecognize recognizes bit-level CRC computations and LoopIdiom optimizes it using a byte-level table-lookup. I think the pending work to enable the use of hardware instructions is to introduce a generic llvm.clmul, but nobody is working on it at the moment. Kindly feel free to pick up the task, if you're so inclined.

@Guohua-W
Copy link

Guohua-W commented Oct 9, 2025

Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions?

Currently, HashRecognize recognizes bit-level CRC computations and LoopIdiom optimizes it using a byte-level table-lookup. I think the pending work to enable the use of hardware instructions is to introduce a generic llvm.clmul, but nobody is working on it at the moment. Kindly feel free to pick up the task, if you're so inclined.

Thank you for your response. You've done a fantastic job, and your answers have truly inspired me. I'm relatively new to LLVM, and my current goal is to recognize CRC loops and optimize them using my custom instructions. For example, I've already implemented a hardware instruction like CRC32C.B rd, rs1, rs2 in hardware. I can invoke this instruction through an intrinsic function, such as @llvm.riscv.crc32.b. I'd like to leverage the results from HashRecognize to achieve this optimization. Could you please give me some practical advice on how to proceed?

@artagnon
Copy link
Contributor Author

artagnon commented Oct 9, 2025

Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions?

Currently, HashRecognize recognizes bit-level CRC computations and LoopIdiom optimizes it using a byte-level table-lookup. I think the pending work to enable the use of hardware instructions is to introduce a generic llvm.clmul, but nobody is working on it at the moment. Kindly feel free to pick up the task, if you're so inclined.

Thank you for your response. You've done a fantastic job, and your answers have truly inspired me. I'm relatively new to LLVM, and my current goal is to recognize CRC loops and optimize them using my custom instructions. For example, I've already implemented a hardware instruction like CRC32C.B rd, rs1, rs2 in hardware. I can invoke this instruction through an intrinsic function, such as @llvm.riscv.crc32.b. I'd like to leverage the results from HashRecognize to achieve this optimization. Could you please give me some practical advice on how to proceed?

Oh, it's quite simple then. Create a target-hook to query if there's a valid lowering for llvm.riscv.crc32.b (for your custom hardware, yes), and modify optimizeCRCLoop with a query for this hook: if so, simply take Info.LHS, Info.LHSAux and Info.RHS, feed them into llvm.riscv.crc32.b, delete the existing loop, and skip the existing code? You don't need to call genSarwateTable.

@Guohua-W
Copy link

Hello, I have a question: Can HashRecognize recognize both byte-level and bit-level CRC computations? If I have designed dedicated hardware instructions for the RISC-V architecture, how can I leverage the results from HashRecognize to optimize the code by replacing software CRC computations with these hardware instructions?

Currently, HashRecognize recognizes bit-level CRC computations and LoopIdiom optimizes it using a byte-level table-lookup. I think the pending work to enable the use of hardware instructions is to introduce a generic llvm.clmul, but nobody is working on it at the moment. Kindly feel free to pick up the task, if you're so inclined.

Thank you for your response. You've done a fantastic job, and your answers have truly inspired me. I'm relatively new to LLVM, and my current goal is to recognize CRC loops and optimize them using my custom instructions. For example, I've already implemented a hardware instruction like CRC32C.B rd, rs1, rs2 in hardware. I can invoke this instruction through an intrinsic function, such as @llvm.riscv.crc32.b. I'd like to leverage the results from HashRecognize to achieve this optimization. Could you please give me some practical advice on how to proceed?

Oh, it's quite simple then. Create a target-hook to query if there's a valid lowering for llvm.riscv.crc32.b (for your custom hardware, yes), and modify optimizeCRCLoop with a query for this hook: if so, simply take Info.LHS, Info.LHSAux and Info.RHS, feed them into llvm.riscv.crc32.b, delete the existing loop, and skip the existing code? You don't need to call genSarwateTable.

Can HashRecognize.cpp recognize the following CRC loop? What does LHSAux mean? I compiled the code below, but found that Info.LHSAux is null.
uint32_t crc32(const uint8_t *data, size_t length) {
uint32_t crc = 0xFFFFFFFF;
for (size_t i = 0; i < length; i++) {
crc ^= data[i] << 24;
for (int j = 0; j < 8; j++) {
if (crc & 0x80000000) {
crc = (crc << 1) ^ CRC32_POLY;
} else {
crc <<= 1;
}
}
}
return crc ^ 0xFFFFFFFF;
}

@artagnon
Copy link
Contributor Author

artagnon commented Oct 11, 2025

I compiled the code below, but found that Info.LHSAux is null.

LHSAux is the optional checksum data, which is sometimes part of CRC algorithms: in your example, this is absent. I would recommend looking at https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/UnitTests/HashRecognize: in particular, see the difference between the "data" and "nodata" tests: all of them are optimized using HashRecognize, and "nodata" tests correspond to the case when LHSAux is null.

See also comments here:

// The LHS in a polynomial operation, or the initial variable of the
// computation, since all polynomial operations must have a constant RHS,
// which is the generating polynomial. It is the LHS of the polynomial
// division in the case of CRC. Since polynomial division is an XOR in
// GF(2^m), this variable must be XOR'ed with RHS in a loop to yield the
// ComputedValue.
Value *LHS;
// The generating polynomial, or the RHS of the polynomial division in the
// case of CRC.
APInt RHS;
// The final computed value. This is a remainder of a polynomial division in
// the case of CRC, which must be zero.
Value *ComputedValue;
// Set to true in the case of big-endian.
bool ByteOrderSwapped;
// An optional auxiliary checksum that augments the LHS. In the case of CRC,
// it is XOR'ed with the LHS, so that the computation's final remainder is
// zero.
Value *LHSAux;

const uint8_t *data

I would recommend passing a non-pointer data and bit-shifting: I'm not sure if the compiler flow would permit HashRecognize to optimize pointer data examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants