-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[InstCombine] Fold select(X >s 0, 0, -X) | smax(X, 0) to abs(X) #165200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[InstCombine] Fold select(X >s 0, 0, -X) | smax(X, 0) to abs(X) #165200
Conversation
The IR pattern is compiled from OpenCL code: __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2); where smax is created by foldSelectInstWithICmp + canonicalizeSPF. smax could also come from direct elementwise max call: int c = b > (int)(0) ? (int)(0) : -b; int d = __builtin_elementwise_max(b, (int)(0)); *a = c | d;
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms Author: Wenju He (wenju-he) ChangesThe IR pattern is compiled from OpenCL code: smax could also come from direct elementwise max call: Full diff: https://github.com/llvm/llvm-project/pull/165200.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 3ddf182149e57..4e863ca2c6dfd 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -3997,6 +3997,20 @@ static Value *foldOrUnsignedUMulOverflowICmp(BinaryOperator &I,
return nullptr;
}
+// Fold select(X >s 0, 0, -X) | smax(X, 0) --> abs(X)
+static Value *FoldOrOfSelectSmaxToAbs(BinaryOperator &I,
+ InstCombiner::BuilderTy &Builder) {
+ CmpPredicate Pred;
+ Value *X;
+ if (match(&I, m_c_Or(m_Select(m_ICmp(Pred, m_Value(X), m_ZeroInt()),
+ m_ZeroInt(), m_Sub(m_ZeroInt(), m_Deferred(X))),
+ m_OneUse(m_Intrinsic<Intrinsic::smax>(m_Deferred(X),
+ m_ZeroInt())))) &&
+ Pred == ICmpInst::ICMP_SGT)
+ return Builder.CreateBinaryIntrinsic(Intrinsic::abs, X, Builder.getFalse());
+ return nullptr;
+}
+
// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
// here. We should standardize that construct where it is needed or choose some
// other way to ensure that commutated variants of patterns are not missed.
@@ -4545,6 +4559,10 @@ Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
if (Value *V = SimplifyAddWithRemainder(I))
return replaceInstUsesWith(I, V);
+ // select(X >s 0, 0, -X) | smax(X, 0) -> abs(X)
+ if (Value *Res = FoldOrOfSelectSmaxToAbs(I, Builder))
+ return replaceInstUsesWith(I, Res);
+
return nullptr;
}
diff --git a/llvm/test/Transforms/InstCombine/or.ll b/llvm/test/Transforms/InstCombine/or.ll
index 6b090e982af0a..bbc79e8c16a56 100644
--- a/llvm/test/Transforms/InstCombine/or.ll
+++ b/llvm/test/Transforms/InstCombine/or.ll
@@ -2113,3 +2113,31 @@ define <4 x i32> @or_zext_nneg_minus_constant_splat(<4 x i8> %a) {
%or = or <4 x i32> %zext, splat (i32 -9)
ret <4 x i32> %or
}
+
+define i8 @or_positive_minus_non_positive_to_abs(i8 noundef %0){
+; CHECK-LABEL: @or_positive_minus_non_positive_to_abs(
+; CHECK-NEXT: [[TMP2:%.*]] = call i8 @llvm.abs.i8(i8 [[TMP0:%.*]], i1 false)
+; CHECK-NEXT: ret i8 [[TMP2]]
+;
+ %2 = icmp sgt i8 %0, zeroinitializer
+ %3 = sext i1 %2 to i8
+ %4 = sub i8 zeroinitializer, %0
+ %5 = xor i8 %3, -1
+ %6 = and i8 %4, %5
+ %7 = and i8 %0, %3
+ %8 = or i8 %6, %7
+ ret i8 %8
+}
+
+define <2 x i8> @or_select_smax_to_abs(<2 x i8> %0){
+; CHECK-LABEL: @or_select_smax_to_abs(
+; CHECK-NEXT: [[TMP2:%.*]] = call <2 x i8> @llvm.abs.v2i8(<2 x i8> [[TMP0:%.*]], i1 false)
+; CHECK-NEXT: ret <2 x i8> [[TMP2]]
+;
+ %2 = icmp sgt <2 x i8> %0, zeroinitializer
+ %3 = sub <2 x i8> zeroinitializer, %0
+ %4 = select <2 x i1> %2, <2 x i8> zeroinitializer, <2 x i8> %3
+ %5 = tail call <2 x i8> @llvm.smax.v2i8(<2 x i8> %0, <2 x i8> zeroinitializer)
+ %6 = or <2 x i8> %4, %5
+ ret <2 x i8> %6
+}
|
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add alive2 link
| ; CHECK-NEXT: [[TMP2:%.*]] = call i8 @llvm.abs.i8(i8 [[TMP0:%.*]], i1 false) | ||
| ; CHECK-NEXT: ret i8 [[TMP2]] | ||
| ; | ||
| %2 = icmp sgt i8 %0, zeroinitializer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named values in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named values in tests
done
| %5 = tail call <2 x i8> @llvm.smax.v2i8(<2 x i8> %0, <2 x i8> zeroinitializer) | ||
| %6 = or <2 x i8> %4, %5 | ||
| ret <2 x i8> %6 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing negative test for the multiple use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing negative test for the multiple use case
done
done. Updated commit message: |
dtcxzyw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we know X is never INT_MIN, select(X >s 0, 0, -X) may be folded into smax(-X, 0): https://alive2.llvm.org/ce/z/wDiDh2
Currently we don't do this fold. Can you please also add a test with smax(-X, 0) | smax(X, 0) and leave it as a todo?
| ret i8 %or | ||
| } | ||
|
|
||
| define <2 x i8> @or_select_smax_to_abs(<2 x i8> %a){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test with select(X <s 0, -X, 0) | smax(X, 0) --> abs(X).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test with
select(X <s 0, -X, 0) | smax(X, 0) --> abs(X).
thanks, done in bd8b791
Co-authored-by: Yingwei Zheng <[email protected]>
Co-authored-by: Yingwei Zheng <[email protected]>
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
thanks, added 2 TODO tests in 1e9a8ce |
|
|
||
| vsc = vec_abs(vsc); | ||
| // CHECK-ASM: vlcb | ||
| // CHECK-ASM: vlpb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SystemZ changes LGTM. In fact, this fixes a regression I hadn't even been aware of, which was introduced here: 1a60ae0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ret <2 x i8> %or | ||
| } | ||
|
|
||
| define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){ | |
| define <2 x i8> @or_slt_select_smax_to_abs(<2 x i8> %a){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change
define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){
define <2 x i8> @or_slt_select_smax_to_abs(<2 x i8> %a){
done, thanks @dtcxzyw
| InstCombiner::BuilderTy &Builder) { | ||
| Value *X; | ||
| Value *Sel; | ||
| if (match(&I, m_c_Or(m_Value(Sel), m_OneUse(m_Intrinsic<Intrinsic::smax>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use m_SMax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use m_SMax
done
Co-authored-by: Matt Arsenault <[email protected]>
dtcxzyw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
…#165200) The IR pattern is compiled from OpenCL code: __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2); where smax is created by foldSelectInstWithICmp + canonicalizeSPF. smax could also come from direct elementwise max call: int c = b > (int)(0) ? (int)(0) : -b; int d = __builtin_elementwise_max(b, (int)(0)); *a = c | d; https://alive2.llvm.org/ce/z/2-brvr https://alive2.llvm.org/ce/z/Dowjzk https://alive2.llvm.org/ce/z/kathwZ --------- Co-authored-by: Yingwei Zheng <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
The IR pattern is compiled from OpenCL code:
__builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);
where smax is created by foldSelectInstWithICmp + canonicalizeSPF.
smax could also come from direct elementwise max call:
int c = b > (int)(0) ? (int)(0) : -b;
int d = __builtin_elementwise_max(b, (int)(0));
*a = c | d;
https://alive2.llvm.org/ce/z/2-brvr
https://alive2.llvm.org/ce/z/Dowjzk
https://alive2.llvm.org/ce/z/kathwZ