Skip to content

Conversation

@guy-david
Copy link
Contributor

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimization don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the combined value are either guranteed to overwrite the sign-bit or simply ignore it (comparisons, etc.).

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x
  • fdiv x, sqrt(x) -> sqrt(x)
  • frem lowering with power-of-2 divisors

Some floating-point optimization don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, -0.0 -> x
- fsub x, 0.0 -> x
- fsub -0.0, x -> fneg x
- fdiv x, sqrt(x) -> sqrt(x)
- frem lowering with power-of-2 divisors
@llvmbot
Copy link
Member

llvmbot commented Oct 24, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimization don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the combined value are either guranteed to overwrite the sign-bit or simply ignore it (comparisons, etc.).

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x
  • fdiv x, sqrt(x) -> sqrt(x)
  • frem lowering with power-of-2 divisors

Full diff: https://github.com/llvm/llvm-project/pull/165011.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+11-5)
  • (added) llvm/test/CodeGen/AArch64/nsz-bypass.ll (+72)
  • (modified) llvm/test/CodeGen/AMDGPU/swdev380865.ll (+2-3)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 73aed33fe0838..f2b4e74cc65c7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17781,7 +17781,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-    if (N1C->isNegative() || Flags.hasNoSignedZeros())
+    if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0)))
       return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -17993,7 +17994,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       return N0;
     }
   }
@@ -18006,7 +18008,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-    if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
       // flushed to zero, unless all users treat denorms as zero (DAZ).
       // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18654,7 +18657,9 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() ||
+       DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) &&
+      Flags.hasAllowReassociation())
     if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
       return N1;
 
@@ -18706,7 +18711,8 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
       TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
       DAG.isKnownToBeAPowerOfTwoFP(N1)) {
     bool NeedsCopySign =
-        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
+        !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
     SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
     SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
     SDValue MLA;
diff --git a/llvm/test/CodeGen/AArch64/nsz-bypass.ll b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
new file mode 100644
index 0000000000000..3b17e410ac380
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+; Test that nsz constraint can be bypassed when all uses are sign-insensitive.
+
+define i1 @test_fadd_neg_zero_fcmp(float %x) {
+; CHECK-LABEL: test_fadd_neg_zero_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #1.00000000
+; CHECK-NEXT:    fcmp s0, s1
+; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    ret
+  %add = fadd float %x, -0.0
+  %cmp = fcmp oeq float %add, 1.0
+  ret i1 %cmp
+}
+
+define float @test_fsub_zero_fabs(float %x) {
+; CHECK-LABEL: test_fsub_zero_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %sub = fsub float %x, 0.0
+  %abs = call float @llvm.fabs.f32(float %sub)
+  ret float %abs
+}
+
+define float @test_fsub_neg_zero_copysign(float %x, float %y) {
+; CHECK-LABEL: test_fsub_neg_zero_copysign:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mvni v2.4s, #128, lsl #24
+; CHECK-NEXT:    // kill: def $s0 killed $s0 def $q0
+; CHECK-NEXT:    // kill: def $s1 killed $s1 def $q1
+; CHECK-NEXT:    bif v0.16b, v1.16b, v2.16b
+; CHECK-NEXT:    // kill: def $s0 killed $s0 killed $q0
+; CHECK-NEXT:    ret
+  %sub = fsub float -0.0, %x
+  %copysign = call float @llvm.copysign.f32(float %sub, float %y)
+  ret float %copysign
+}
+
+define i1 @test_div_sqrt_fcmp(float %x) {
+; CHECK-LABEL: test_div_sqrt_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fsqrt s0, s0
+; CHECK-NEXT:    fcmp s0, #0.0
+; CHECK-NEXT:    cset w0, gt
+; CHECK-NEXT:    ret
+  %sqrt = call float @llvm.sqrt.f32(float %x)
+  %div = fdiv reassoc float %x, %sqrt
+  %cmp = fcmp ogt float %div, 0.0
+  ret i1 %cmp
+}
+
+define float @test_frem_fabs(float %x) {
+; CHECK-LABEL: test_frem_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #0.50000000
+; CHECK-NEXT:    fmov s2, #-2.00000000
+; CHECK-NEXT:    fmul s1, s0, s1
+; CHECK-NEXT:    frintz s1, s1
+; CHECK-NEXT:    fmadd s0, s1, s2, s0
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %rem = frem float %x, 2.0
+  %abs = call float @llvm.fabs.f32(float %rem)
+  ret float %abs
+}
+
+declare float @llvm.fabs.f32(float)
+declare float @llvm.copysign.f32(float, float)
+declare float @llvm.sqrt.f32(float)
diff --git a/llvm/test/CodeGen/AMDGPU/swdev380865.ll b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
index d4a8a0d762afd..1130c465c15e3 100644
--- a/llvm/test/CodeGen/AMDGPU/swdev380865.ll
+++ b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
@@ -28,14 +28,13 @@ define amdgpu_kernel void @_Z6kernelILi4000ELi1EEvPd(ptr addrspace(1) %x.coerce)
 ; CHECK-NEXT:    v_mov_b32_e32 v1, s7
 ; CHECK-NEXT:  .LBB0_1: ; %for.cond4.preheader
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], 0
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40140000
-; CHECK-NEXT:    s_add_i32 s1, s1, s0
-; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40180000
+; CHECK-NEXT:    s_add_i32 s1, s1, s0
+; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x401c0000

@github-actions
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f2b4e74cc..a2c48f44b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18710,9 +18710,9 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
       TLI.isOperationLegalOrCustom(ISD::FDIV, VT) &&
       TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
       DAG.isKnownToBeAPowerOfTwoFP(N1)) {
-    bool NeedsCopySign =
-        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
-        !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
+    bool NeedsCopySign = !Flags.hasNoSignedZeros() &&
+                         !DAG.cannotBeOrderedNegativeFP(N0) &&
+                         !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
     SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
     SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
     SDValue MLA;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants