Skip to content

Conversation

@wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Oct 27, 2025

The IR pattern is compiled from OpenCL code:
__builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);
where smax is created by foldSelectInstWithICmp + canonicalizeSPF.

smax could also come from direct elementwise max call:
int c = b > (int)(0) ? (int)(0) : -b;
int d = __builtin_elementwise_max(b, (int)(0));
*a = c | d;

https://alive2.llvm.org/ce/z/2-brvr
https://alive2.llvm.org/ce/z/Dowjzk
https://alive2.llvm.org/ce/z/kathwZ

The IR pattern is compiled from OpenCL code:
  __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);
where smax is created by foldSelectInstWithICmp + canonicalizeSPF.

smax could also come from direct elementwise max call:
  int c = b > (int)(0) ? (int)(0) : -b;
  int d = __builtin_elementwise_max(b, (int)(0));
  *a = c | d;
@wenju-he wenju-he requested a review from nikic as a code owner October 27, 2025 04:55
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Oct 27, 2025
@wenju-he wenju-he requested a review from dtcxzyw October 27, 2025 04:55
@llvmbot
Copy link
Member

llvmbot commented Oct 27, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-llvm-transforms

Author: Wenju He (wenju-he)

Changes

The IR pattern is compiled from OpenCL code:
__builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);
where smax is created by foldSelectInstWithICmp + canonicalizeSPF.

smax could also come from direct elementwise max call:
int c = b > (int)(0) ? (int)(0) : -b;
int d = __builtin_elementwise_max(b, (int)(0));
*a = c | d;


Full diff: https://github.com/llvm/llvm-project/pull/165200.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+18)
  • (modified) llvm/test/Transforms/InstCombine/or.ll (+28)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 3ddf182149e57..4e863ca2c6dfd 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -3997,6 +3997,20 @@ static Value *foldOrUnsignedUMulOverflowICmp(BinaryOperator &I,
   return nullptr;
 }
 
+// Fold select(X >s 0, 0, -X) | smax(X, 0) --> abs(X)
+static Value *FoldOrOfSelectSmaxToAbs(BinaryOperator &I,
+                                      InstCombiner::BuilderTy &Builder) {
+  CmpPredicate Pred;
+  Value *X;
+  if (match(&I, m_c_Or(m_Select(m_ICmp(Pred, m_Value(X), m_ZeroInt()),
+                                m_ZeroInt(), m_Sub(m_ZeroInt(), m_Deferred(X))),
+                       m_OneUse(m_Intrinsic<Intrinsic::smax>(m_Deferred(X),
+                                                             m_ZeroInt())))) &&
+      Pred == ICmpInst::ICMP_SGT)
+    return Builder.CreateBinaryIntrinsic(Intrinsic::abs, X, Builder.getFalse());
+  return nullptr;
+}
+
 // FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
 // here. We should standardize that construct where it is needed or choose some
 // other way to ensure that commutated variants of patterns are not missed.
@@ -4545,6 +4559,10 @@ Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
     if (Value *V = SimplifyAddWithRemainder(I))
       return replaceInstUsesWith(I, V);
 
+  // select(X >s 0, 0, -X) | smax(X, 0) -> abs(X)
+  if (Value *Res = FoldOrOfSelectSmaxToAbs(I, Builder))
+    return replaceInstUsesWith(I, Res);
+
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/or.ll b/llvm/test/Transforms/InstCombine/or.ll
index 6b090e982af0a..bbc79e8c16a56 100644
--- a/llvm/test/Transforms/InstCombine/or.ll
+++ b/llvm/test/Transforms/InstCombine/or.ll
@@ -2113,3 +2113,31 @@ define <4 x i32> @or_zext_nneg_minus_constant_splat(<4 x i8> %a) {
   %or = or <4 x i32> %zext, splat (i32 -9)
   ret <4 x i32> %or
 }
+
+define i8 @or_positive_minus_non_positive_to_abs(i8 noundef %0){
+; CHECK-LABEL: @or_positive_minus_non_positive_to_abs(
+; CHECK-NEXT:    [[TMP2:%.*]] = call i8 @llvm.abs.i8(i8 [[TMP0:%.*]], i1 false)
+; CHECK-NEXT:    ret i8 [[TMP2]]
+;
+  %2 = icmp sgt i8 %0, zeroinitializer
+  %3 = sext i1 %2 to i8
+  %4 = sub i8 zeroinitializer, %0
+  %5 = xor i8 %3, -1
+  %6 = and i8 %4, %5
+  %7 = and i8 %0, %3
+  %8 = or i8 %6, %7
+  ret i8 %8
+}
+
+define <2 x i8> @or_select_smax_to_abs(<2 x i8> %0){
+; CHECK-LABEL: @or_select_smax_to_abs(
+; CHECK-NEXT:    [[TMP2:%.*]] = call <2 x i8> @llvm.abs.v2i8(<2 x i8> [[TMP0:%.*]], i1 false)
+; CHECK-NEXT:    ret <2 x i8> [[TMP2]]
+;
+  %2 = icmp sgt <2 x i8> %0, zeroinitializer
+  %3 = sub <2 x i8> zeroinitializer, %0
+  %4 = select <2 x i1> %2, <2 x i8> zeroinitializer, <2 x i8> %3
+  %5 = tail call <2 x i8> @llvm.smax.v2i8(<2 x i8> %0, <2 x i8> zeroinitializer)
+  %6 = or <2 x i8> %4, %5
+  ret <2 x i8> %6
+}

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add alive2 link

; CHECK-NEXT: [[TMP2:%.*]] = call i8 @llvm.abs.i8(i8 [[TMP0:%.*]], i1 false)
; CHECK-NEXT: ret i8 [[TMP2]]
;
%2 = icmp sgt i8 %0, zeroinitializer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

done

%5 = tail call <2 x i8> @llvm.smax.v2i8(<2 x i8> %0, <2 x i8> zeroinitializer)
%6 = or <2 x i8> %4, %5
ret <2 x i8> %6
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing negative test for the multiple use case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing negative test for the multiple use case

done

@wenju-he
Copy link
Contributor Author

Can you add alive2 link

done. Updated commit message:
https://alive2.llvm.org/ce/z/2-brvr
https://alive2.llvm.org/ce/z/Dowjzk

@wenju-he wenju-he requested a review from arsenm October 27, 2025 06:34
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we know X is never INT_MIN, select(X >s 0, 0, -X) may be folded into smax(-X, 0): https://alive2.llvm.org/ce/z/wDiDh2
Currently we don't do this fold. Can you please also add a test with smax(-X, 0) | smax(X, 0) and leave it as a todo?

ret i8 %or
}

define <2 x i8> @or_select_smax_to_abs(<2 x i8> %a){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test with select(X <s 0, -X, 0) | smax(X, 0) --> abs(X).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test with select(X <s 0, -X, 0) | smax(X, 0) --> abs(X).

thanks, done in bd8b791

@github-actions
Copy link

github-actions bot commented Oct 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@wenju-he
Copy link
Contributor Author

When we know X is never INT_MIN, select(X >s 0, 0, -X) may be folded into smax(-X, 0): https://alive2.llvm.org/ce/z/wDiDh2 Currently we don't do this fold. Can you please also add a test with smax(-X, 0) | smax(X, 0) and leave it as a todo?

thanks, added 2 TODO tests in 1e9a8ce

@wenju-he wenju-he requested a review from dtcxzyw October 28, 2025 02:38
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:SystemZ labels Oct 28, 2025

vsc = vec_abs(vsc);
// CHECK-ASM: vlcb
// CHECK-ASM: vlpb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SystemZ changes LGTM. In fact, this fixes a regression I hadn't even been aware of, which was introduced here: 1a60ae0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SystemZ changes LGTM. In fact, this fixes a regression I hadn't even been aware of, which was introduced here: 1a60ae0

thanks @uweigand for review

ret <2 x i8> %or
}

define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){
define <2 x i8> @or_slt_select_smax_to_abs(<2 x i8> %a){

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
define <2 x i8> @or_lgt_select_smax_to_abs(<2 x i8> %a){
define <2 x i8> @or_slt_select_smax_to_abs(<2 x i8> %a){

done, thanks @dtcxzyw

@wenju-he wenju-he requested a review from dtcxzyw October 30, 2025 23:51
InstCombiner::BuilderTy &Builder) {
Value *X;
Value *Sel;
if (match(&I, m_c_Or(m_Value(Sel), m_OneUse(m_Intrinsic<Intrinsic::smax>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use m_SMax

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use m_SMax

done

@wenju-he wenju-he requested a review from arsenm October 31, 2025 07:34
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wenju-he wenju-he merged commit 79bf8c0 into llvm:main Nov 2, 2025
13 of 14 checks passed
@wenju-he wenju-he deleted the instcombine-or-select-smax-to-abs branch November 2, 2025 23:38
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
…#165200)

The IR pattern is compiled from OpenCL code:
  __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);
where smax is created by foldSelectInstWithICmp + canonicalizeSPF.

smax could also come from direct elementwise max call:
  int c = b > (int)(0) ? (int)(0) : -b;
  int d = __builtin_elementwise_max(b, (int)(0));
  *a = c | d;

https://alive2.llvm.org/ce/z/2-brvr
https://alive2.llvm.org/ce/z/Dowjzk
https://alive2.llvm.org/ce/z/kathwZ

---------

Co-authored-by: Yingwei Zheng <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:SystemZ clang Clang issues not falling into any other category llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants