Skip to content

Conversation

@dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented May 22, 2025

This patch was originally part of #139861. It generalizes ignoreSignBitOfZero/NaN to handle more instructions/intrinsics.

BTW, I find it mitigates performance regressions caused by #141010 (IR diff https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2365/files). We don't need to propagate FMF from fcmp into select, since we can infer demanded properties from the user of select.

@dtcxzyw dtcxzyw requested review from arsenm and jcranmer-intel May 22, 2025 07:40
@dtcxzyw dtcxzyw requested a review from nikic as a code owner May 22, 2025 07:41
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels May 22, 2025
@llvmbot
Copy link
Member

llvmbot commented May 22, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch was originally part of #139861. It generalizes ignoreSignBitOfZero/NaN to handle more instructions/intrinsics.

BTW, I find it mitigates performance regressions caused by #141010 (IR diff https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2365/files). We don't need to propagate FMF from fcmp into select, since we can infer demanded properties from the user of select.


Full diff: https://github.com/llvm/llvm-project/pull/141015.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+91-13)
  • (modified) llvm/test/Transforms/InstCombine/fabs.ll (+105-9)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 2ef233bc25d72..13cbf7b215557 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -2783,15 +2783,41 @@ static bool ignoreSignBitOfZero(Instruction &I) {
   if (!I.hasOneUse())
     return false;
   Instruction *User = I.user_back();
+  if (auto *FPOp = dyn_cast<FPMathOperator>(User)) {
+    if (FPOp->hasNoSignedZeros())
+      return true;
+  }
 
-  // fcmp treats both positive and negative zero as equal.
-  if (User->getOpcode() == Instruction::FCmp)
+  switch (User->getOpcode()) {
+  case Instruction::FPToSI:
+  case Instruction::FPToUI:
     return true;
-
-  if (auto *FPOp = dyn_cast<FPMathOperator>(User))
-    return FPOp->hasNoSignedZeros();
-
-  return false;
+  case Instruction::FCmp:
+    // fcmp treats both positive and negative zero as equal.
+    return true;
+  case Instruction::Call:
+    if (auto *II = dyn_cast<IntrinsicInst>(User)) {
+      switch (II->getIntrinsicID()) {
+      case Intrinsic::fabs:
+        return true;
+      case Intrinsic::copysign:
+        return II->getArgOperand(0) == &I;
+      case Intrinsic::is_fpclass:
+      case Intrinsic::vp_is_fpclass: {
+        auto Test =
+            static_cast<FPClassTest>(
+                cast<ConstantInt>(II->getArgOperand(1))->getZExtValue()) &
+            FPClassTest::fcZero;
+        return Test == FPClassTest::fcZero || Test == FPClassTest::fcNone;
+      }
+      default:
+        return false;
+      }
+    }
+    return false;
+  default:
+    return false;
+  }
 }
 
 /// Return true if the sign bit of result can be ignored when the result is NaN.
@@ -2803,15 +2829,67 @@ static bool ignoreSignBitOfNaN(Instruction &I) {
   if (!I.hasOneUse())
     return false;
   Instruction *User = I.user_back();
+  if (auto *FPOp = dyn_cast<FPMathOperator>(User)) {
+    if (FPOp->hasNoNaNs())
+      return true;
+  }
 
-  // fcmp ignores the sign bit of NaN.
-  if (User->getOpcode() == Instruction::FCmp)
+  switch (User->getOpcode()) {
+  case Instruction::FPToSI:
+  case Instruction::FPToUI:
     return true;
+  // Proper FP math operations ignore the sign bit of NaN.
+  case Instruction::FAdd:
+  case Instruction::FSub:
+  case Instruction::FMul:
+  case Instruction::FDiv:
+  case Instruction::FRem:
+  case Instruction::FPTrunc:
+  case Instruction::FPExt:
+  case Instruction::FCmp:
+    return true;
+  // Bitwise FP operations should preserve the sign bit of NaN.
+  case Instruction::FNeg:
+  case Instruction::Select:
+  case Instruction::PHI:
+    return false;
+  case Instruction::Call:
+  case Instruction::Invoke: {
+    if (auto *II = dyn_cast<IntrinsicInst>(User)) {
+      switch (II->getIntrinsicID()) {
+      case Intrinsic::fabs:
+        return true;
+      case Intrinsic::copysign:
+        return II->getArgOperand(0) == &I;
+      // Other proper FP math intrinsics ignore the sign bit of NaN.
+      case Intrinsic::maxnum:
+      case Intrinsic::minnum:
+      case Intrinsic::maximum:
+      case Intrinsic::minimum:
+      case Intrinsic::maximumnum:
+      case Intrinsic::minimumnum:
+      case Intrinsic::canonicalize:
+      case Intrinsic::fma:
+      case Intrinsic::fmuladd:
+      case Intrinsic::sqrt:
+      case Intrinsic::pow:
+      case Intrinsic::powi:
+      case Intrinsic::fptoui_sat:
+      case Intrinsic::fptosi_sat:
+      case Intrinsic::is_fpclass:
+        return true;
+      default:
+        return false;
+      }
+    }
 
-  if (auto *FPOp = dyn_cast<FPMathOperator>(User))
-    return FPOp->hasNoNaNs();
-
-  return false;
+    FPClassTest NoFPClass = cast<CallBase>(User)->getParamNoFPClass(
+        I.uses().begin()->getOperandNo());
+    return NoFPClass & FPClassTest::fcNan;
+  }
+  default:
+    return false;
+  }
 }
 
 // Canonicalize select with fcmp to fabs(). -0.0 makes this tricky. We need
diff --git a/llvm/test/Transforms/InstCombine/fabs.ll b/llvm/test/Transforms/InstCombine/fabs.ll
index ab4376bf78a67..b889f6c2c028c 100644
--- a/llvm/test/Transforms/InstCombine/fabs.ll
+++ b/llvm/test/Transforms/InstCombine/fabs.ll
@@ -1329,12 +1329,9 @@ define float @test_fabs_fsub_used_by_fpop_nnan(float %x, float %y) {
   ret float %add
 }
 
-; TODO: fadd ignores the sign bit of NaN.
 define float @test_fabs_used_by_fpop_nsz(float %x, float %y) {
 ; CHECK-LABEL: @test_fabs_used_by_fpop_nsz(
-; CHECK-NEXT:    [[CMP:%.*]] = fcmp oge float [[X:%.*]], 0.000000e+00
-; CHECK-NEXT:    [[NEG:%.*]] = fneg float [[X]]
-; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[CMP]], float [[X]], float [[NEG]]
+; CHECK-NEXT:    [[SEL:%.*]] = call float @llvm.fabs.f32(float [[X:%.*]])
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd nsz float [[SEL]], [[Y:%.*]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -1345,13 +1342,9 @@ define float @test_fabs_used_by_fpop_nsz(float %x, float %y) {
   ret float %add
 }
 
-; TODO: copysign ignores the sign bit of NaN magnitude.
 define float @test_fabs_used_by_fcopysign_mag(float %x, float %y) {
 ; CHECK-LABEL: @test_fabs_used_by_fcopysign_mag(
-; CHECK-NEXT:    [[CMP:%.*]] = fcmp oge float [[X1:%.*]], 0.000000e+00
-; CHECK-NEXT:    [[NEG:%.*]] = fneg float [[X1]]
-; CHECK-NEXT:    [[X:%.*]] = select i1 [[CMP]], float [[X1]], float [[NEG]]
-; CHECK-NEXT:    [[COPYSIGN:%.*]] = call float @llvm.copysign.f32(float [[X]], float [[Y:%.*]])
+; CHECK-NEXT:    [[COPYSIGN:%.*]] = call float @llvm.copysign.f32(float [[X:%.*]], float [[Y:%.*]])
 ; CHECK-NEXT:    ret float [[COPYSIGN]]
 ;
   %cmp = fcmp oge float %x, 0.000000e+00
@@ -1361,6 +1354,94 @@ define float @test_fabs_used_by_fcopysign_mag(float %x, float %y) {
   ret float %copysign
 }
 
+define float @test_fabs_nsz_used_by_canonicalize(float %x) {
+; CHECK-LABEL: @test_fabs_nsz_used_by_canonicalize(
+; CHECK-NEXT:    [[SEL:%.*]] = call nsz float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    [[CANON:%.*]] = call float @llvm.canonicalize.f32(float [[SEL]])
+; CHECK-NEXT:    ret float [[CANON]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select nsz i1 %cmp, float %x, float %neg
+  %canon = call float @llvm.canonicalize.f32(float %sel)
+  ret float %canon
+}
+
+define void @test_fabs_used_by_nofpclass_nan(float %x) {
+; CHECK-LABEL: @test_fabs_used_by_nofpclass_nan(
+; CHECK-NEXT:    [[SEL:%.*]] = call nsz float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    call void @use(float nofpclass(nan) [[SEL]])
+; CHECK-NEXT:    ret void
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select nsz i1 %cmp, float %x, float %neg
+  call void @use(float nofpclass(nan) %sel)
+  ret void
+}
+
+define i32 @test_fabs_used_fptosi(float %x) {
+; CHECK-LABEL: @test_fabs_used_fptosi(
+; CHECK-NEXT:    [[SEL:%.*]] = call float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    [[FPTOSI:%.*]] = fptosi float [[SEL]] to i32
+; CHECK-NEXT:    ret i32 [[FPTOSI]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select i1 %cmp, float %x, float %neg
+  %fptosi = fptosi float %sel to i32
+  ret i32 %fptosi
+}
+
+define i32 @test_fabs_used_fptoui(float %x) {
+; CHECK-LABEL: @test_fabs_used_fptoui(
+; CHECK-NEXT:    [[SEL:%.*]] = call float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    [[FPTOSI:%.*]] = fptoui float [[SEL]] to i32
+; CHECK-NEXT:    ret i32 [[FPTOSI]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select i1 %cmp, float %x, float %neg
+  %fptosi = fptoui float %sel to i32
+  ret i32 %fptosi
+}
+
+define float @test_fabs_nsz_used_by_maxnum(float %x, float %y) {
+; CHECK-LABEL: @test_fabs_nsz_used_by_maxnum(
+; CHECK-NEXT:    [[SEL:%.*]] = call nsz float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.maxnum.f32(float [[Y:%.*]], float [[SEL]])
+; CHECK-NEXT:    ret float [[MAX]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select nsz i1 %cmp, float %x, float %neg
+  %max = call float @llvm.maxnum.f32(float %y, float %sel)
+  ret float %max
+}
+
+define i1 @test_fabs_used_is_fpclass_pnorm_or_nan(float %x) {
+; CHECK-LABEL: @test_fabs_used_is_fpclass_pnorm_or_nan(
+; CHECK-NEXT:    [[IS_FPCLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[X:%.*]], i32 267)
+; CHECK-NEXT:    ret i1 [[IS_FPCLASS]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select i1 %cmp, float %x, float %neg
+  %is_fpclass = call i1 @llvm.is.fpclass.f32(float %sel, i32 259)
+  ret i1 %is_fpclass
+}
+
+define i1 @test_fabs_used_is_fpclass_zero_or_pinf(float %x) {
+; CHECK-LABEL: @test_fabs_used_is_fpclass_zero_or_pinf(
+; CHECK-NEXT:    [[IS_FPCLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[X:%.*]], i32 612)
+; CHECK-NEXT:    ret i1 [[IS_FPCLASS]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select i1 %cmp, float %x, float %neg
+  %is_fpclass = call i1 @llvm.is.fpclass.f32(float %sel, i32 608)
+  ret i1 %is_fpclass
+}
 
 ; Negative tests
 
@@ -1455,3 +1536,18 @@ define float @test_fabs_used_by_select(float %x, i1 %cond) {
   %sel2 = select i1 %cond, float %sel, float 0.000000e+00
   ret float %sel2
 }
+
+define i1 @test_fabs_used_is_fpclass_pzero(float %x) {
+; CHECK-LABEL: @test_fabs_used_is_fpclass_pzero(
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp oge float [[X:%.*]], 0.000000e+00
+; CHECK-NEXT:    [[NEG:%.*]] = fneg float [[X]]
+; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[CMP]], float [[X]], float [[NEG]]
+; CHECK-NEXT:    [[IS_FPCLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[SEL]], i32 64)
+; CHECK-NEXT:    ret i1 [[IS_FPCLASS]]
+;
+  %cmp = fcmp oge float %x, 0.000000e+00
+  %neg = fneg float %x
+  %sel = select i1 %cmp, float %x, float %neg
+  %is_fpclass = call i1 @llvm.is.fpclass.f32(float %sel, i32 64)
+  ret i1 %is_fpclass
+}

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label May 22, 2025
@arsenm arsenm added the floating-point Floating-point math label May 22, 2025
@dtcxzyw
Copy link
Member Author

dtcxzyw commented May 23, 2025

Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=5a3776af521b8ddc14a19d1954af64e2960d2397&to=bf96e741b99107ce14ef6e32751f73d98f8eb821&stat=instructions%3Au

Note that stage2-O3 compile-time overhead is high (+0.07%). It is mainly caused by moving these two helpers into ValueTracking.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented May 27, 2025

Any more comments?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compile-time impact looks ok.

case Instruction::FPToSI:
case Instruction::FPToUI:
return true;
// Proper FP math operations ignore the sign bit of NaN.
Copy link
Contributor

@nikic nikic May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference: https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values

the result has a non-deterministic sign

@dtcxzyw dtcxzyw changed the title [InstCombine] Generalize ignoreSignBitOfZero/NaN to handle more cases [ValueTracking][InstCombine] Generalize ignoreSignBitOfZero/NaN to handle more cases May 27, 2025
@dtcxzyw dtcxzyw merged commit 6c86b7d into llvm:main May 28, 2025
11 checks passed
@dtcxzyw dtcxzyw deleted the perf/select-fabs-user branch May 28, 2025 11:17
dtcxzyw added a commit that referenced this pull request May 31, 2025
Previously,
3d6b539
propagates FMF from fcmp to avoid performance regressions. With the help
of #139861,
#141015, and
#141914, we can still convert
SPF into fabs/minnum/maxnum intrinsics even if some flags are missing.
This patch propagates FMF from select to address the long-standing
issue.

Closes #140994.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 31, 2025
…141010)

Previously,
llvm/llvm-project@3d6b539
propagates FMF from fcmp to avoid performance regressions. With the help
of llvm/llvm-project#139861,
llvm/llvm-project#141015, and
llvm/llvm-project#141914, we can still convert
SPF into fabs/minnum/maxnum intrinsics even if some flags are missing.
This patch propagates FMF from select to address the long-standing
issue.

Closes llvm/llvm-project#140994.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

floating-point Floating-point math llvm:analysis Includes value tracking, cost tables and constant folding llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants