[InstCombine] Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), mask) #167101
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: V S Susi Krishna (Susikrishna)

Changes

This patch adds a new InstCombine transformation that simplifies the pattern zext(sub(0, trunc(x))) -> and(sub(0, x), mask). This canonicalization removes the redundant trunc/zext pair surrounding a negate operation and replaces it with a masked negate of the original operand. The transform helps expose rotate idioms in vector code, enabling targets such as X86 (AVX2/AVX-512) to generate more efficient vpror/vpsllvq/vpsrlvq instructions.

Fixes #165306 ([AVX-512] Look for vector bit rotates on vectors larger than 16 bytes).

Implementation details: the fold is added to visitZExt in InstCombineCasts.cpp, and a new test file rotate-trunc-zext.ll covers the scalar i6 -> i64 case and several vector widths.
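As a rough source-level illustration (hypothetical code, not taken from the linked issue): a rotate-right written with a negated-and-masked shift amount is the kind of idiom that produces this shift-amount computation; in the motivating IR shown later in the conversation, the "-n & 63" is expressed as a trunc to i6, a sub in i6, and a zext back to i64.

#include <cstdint>

// Hypothetical rotate-right idiom (illustrative only). The "-n & 63" shift
// amount is the piece that shows up in IR as the trunc/sub/zext chain this
// patch targets.
std::uint64_t rotr64(std::uint64_t x, std::uint64_t n) {
  return (x >> (n & 63)) | (x << (-n & 63));
}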
Full diff: https://github.com/llvm/llvm-project/pull/167101.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
index 614c6ebd63be6..ebe1b747e6be4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
@@ -1366,6 +1366,22 @@ Instruction *InstCombinerImpl::visitZExt(ZExtInst &Zext) {
}
}
+ {
+ Value *TruncSrc = nullptr;
+ if (match(&Zext, m_ZExt(m_Sub(m_Zero(), m_Trunc(m_Value(TruncSrc)))))) {
+ IRBuilder<> Builder(&Zext);
+ Type *Ty = TruncSrc->getType();
+ unsigned BitWidth = Ty->getScalarSizeInBits();
+ unsigned MaskVal = BitWidth - 1;
+
+ Value *Zero = ConstantInt::get(Ty, 0);
+ Value *Neg = Builder.CreateSub(Zero, TruncSrc);
+ Value *Mask = ConstantInt::get(Ty, MaskVal);
+ Value *Masked = Builder.CreateAnd(Neg, Mask);
+ return replaceInstUsesWith(Zext, Masked);
+ }
+ }
+
return nullptr;
}
diff --git a/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
new file mode 100644
index 0000000000000..31c7ba4a26796
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
@@ -0,0 +1,55 @@
+; RUN: opt -passes=instcombine -S %s | FileCheck %s
+
+; ================================================================
+; Test: Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), (bitwidth-1))
+; Purpose: Check that InstCombine detects and simplifies the pattern
+; seen in rotate idioms, enabling backend rotate lowering.
+; ================================================================
+
+; === Scalar Case (i64) =========================================
+define i64 @neg_trunc_zext(i64 %a) {
+; CHECK-LABEL: @neg_trunc_zext(
+; CHECK-NEXT: %[[NEG:[0-9]+]] = sub i64 0, %a
+; CHECK-NEXT: %[[MASKED:[0-9A-Za-z_]+]] = and i64 %[[NEG]], 63
+; CHECK-NEXT: ret i64 %[[MASKED]]
+ %t = trunc i64 %a to i6
+ %n = sub i6 0, %t
+ %z = zext i6 %n to i64
+ ret i64 %z
+}
+
+; === Vector Case 1: <2 x i64> ==================================
+define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {
+; CHECK-LABEL: @foo(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <2 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <2 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <2 x i64> %[[MASK]]
+ %t = trunc <2 x i64> %n to <2 x i6>
+ %neg = sub <2 x i6> zeroinitializer, %t
+ %z = zext <2 x i6> %neg to <2 x i64>
+ ret <2 x i64> %z
+}
+
+; === Vector Case 2: <4 x i64> ==================================
+define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {
+; CHECK-LABEL: @bar(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <4 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <4 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <4 x i64> %[[MASK]]
+ %t = trunc <4 x i64> %n to <4 x i6>
+ %neg = sub <4 x i6> zeroinitializer, %t
+ %z = zext <4 x i6> %neg to <4 x i64>
+ ret <4 x i64> %z
+}
+
+; === Vector Case 3: <8 x i64> ==================================
+define <8 x i64> @baz(<8 x i64> %x, <8 x i64> %n) {
+; CHECK-LABEL: @baz(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <8 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <8 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <8 x i64> %[[MASK]]
+ %t = trunc <8 x i64> %n to <8 x i6>
+ %neg = sub <8 x i6> zeroinitializer, %t
+ %z = zext <8 x i6> %neg to <8 x i64>
+ ret <8 x i64> %z
+}
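For reference, the arithmetic identity that the fold and the tests rely on can be checked in isolation. A minimal standalone C++ sanity check (not part of the patch) for the i6/i64 shape used in the tests:

#include <cassert>
#include <cstdint>

// Standalone sanity check (not part of the patch): for a k-bit truncation,
// zext(0 - trunc(x)) equals (0 - x) & (2^k - 1). The tests use k = 6 with an
// i64 destination, so the mask is 63 (the bitwidth-1 value the patch emits).
int main() {
  constexpr unsigned K = 6;
  constexpr std::uint64_t Mask = (1u << K) - 1; // 63
  for (std::uint64_t X : {0ull, 1ull, 5ull, 63ull, 64ull, 0xDEADBEEFull}) {
    std::uint64_t Trunc = X & Mask;               // trunc i64 %x to i6
    std::uint64_t NegNarrow = (0 - Trunc) & Mask; // sub i6 0, %t (mod 2^6)
    std::uint64_t Widened = NegNarrow;            // zext i6 %n to i64
    std::uint64_t Folded = (0 - X) & Mask;        // and (sub i64 0, %x), 63
    assert(Widened == Folded);
  }
  return 0;
}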
dtcxzyw
left a comment
I'd expect this pattern to be handled in CanEvaluateZExtd.
Hi @dtcxzyw, I'm still learning my way around InstCombine. I tried to trace the logic, and it looks like canEvaluateZExtd is called from inside an if statement that first checks shouldChangeType(SrcTy, DestTy). In my test case the types are i6 (source) and i64 (destination), and shouldChangeType seems to return false for that pair, so canEvaluateZExtd is never actually called. I'm guessing this is because the cost model doesn't recommend promoting from a small type like i6 all the way to i64? Is that the correct approach for this kind of special pattern, or is there a better way to do this that I'm not seeing?
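For reference, the gating being described looks roughly like this; a paraphrased, self-contained sketch rather than the actual visitZExt code (the real predicates are members of InstCombinerImpl and take extra parameters, so they are passed in here as plain function pointers just so the sketch compiles on its own):

#include "llvm/IR/Instructions.h"

// Paraphrased sketch of the gating: if ShouldChangeType rejects the
// SrcTy -> DestTy pair (for example i6 -> i64 in a module whose datalayout
// lists no native 64-bit integers, or any vector type), CanEvaluateZExtd is
// never consulted at all.
static bool wouldTryWideningSketch(llvm::ZExtInst &Zext,
                                   bool (*ShouldChangeType)(llvm::Type *, llvm::Type *),
                                   bool (*CanEvaluateZExtd)(llvm::Value *, llvm::Type *)) {
  llvm::Value *Src = Zext.getOperand(0);
  llvm::Type *SrcTy = Src->getType();
  llvm::Type *DestTy = Zext.getType();
  return ShouldChangeType(SrcTy, DestTy) && CanEvaluateZExtd(Src, DestTy);
}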
That is a bit strange to me. What datalayout are you using?
IIRC i6 was used just to stop alive2 from timing out
@dtcxzyw, Sorry, I'm not sure what you mean by datalayout. Could you clarify what you're looking for?
I added the datalayout to the motivating case, and it got folded by InstCombine without your patch. So it looks like this is not a real issue?
@dtcxzyw This is the original test case; I may have oversimplified the missing transform:

define <2 x i64> @src(<2 x i64> %0, <2 x i64> %1) {
Entry:
%2 = trunc <2 x i64> %1 to <2 x i6>
%3 = sub <2 x i6> zeroinitializer, %2
%4 = zext <2 x i6> %3 to <2 x i64>
%5 = shl <2 x i64> %0, %4
%6 = and <2 x i64> %1, splat (i64 63)
%7 = lshr <2 x i64> %0, %6
%8 = or <2 x i64> %5, %7
ret <2 x i64> %8
}
@dtcxzyw You were right about the scalar case; it worked when I kept the datalayout. I think I found the reason for the difference. I was looking at the shouldChangeType function, and the Type* version has a check at the very top that returns false for all vector types, so the canEvaluateZExtd path is never reached for them (but it is reached for the scalar i6 type).
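Paraphrasing that check (a sketch of the relevant lines, not a verbatim copy of the LLVM source):

#include "llvm/IR/Type.h"

// Sketch of the early bail-out: the Type* overload of shouldChangeType only
// handles plain integer types, so every vector type is rejected before the
// width/datalayout comparison is reached.
static bool shouldChangeTypeSketch(llvm::Type *From, llvm::Type *To) {
  if (!From->isIntegerTy() || !To->isIntegerTy())
    return false; // all vector types (and other non-integers) fall out here
  // The real function then defers to a width-based overload that consults the
  // datalayout's native integer widths; that part is omitted in this sketch.
  return true;
}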
; === Vector Case 1: <2 x i64> ==================================
define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {
(style) better naming - neg_trunc_zext_v2i64?
; === Vector Case 2: <4 x i64> ==================================
define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {
Not sure if these give us any additional coverage vs the v2i64 case?
InstCombine is conservative about changing vector types due to the lack of cost modeling. Perhaps we can do some simple fold in VectorCombine instead.