Skip to content

Conversation

@arsenm
Copy link
Contributor

@arsenm arsenm commented May 12, 2025

Try to respect the signaling nan behavior of the instruction,
so also start the special case fold for src2.

@arsenm arsenm added backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels May 12, 2025 — with Graphite App
@arsenm arsenm requested review from Pierre-vh, jayfoad and rampitec May 12, 2025 13:38
@arsenm arsenm marked this pull request as ready for review May 12, 2025 13:39
Copy link
Contributor Author

arsenm commented May 12, 2025

@arsenm arsenm changed the title AMDGPU: Add fmed3 fold tests with flags AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0 May 12, 2025
@llvmbot
Copy link
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

AMDGPU: Add fmed3 fold tests with flags

AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0

Try to respect the signaling nan behavior of the instruction,
so also start the special case fold for src2.


Patch is 32.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139546.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+97-9)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll (+294-109)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index 1ca300464a697..f2a2cf41eab90 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -59,6 +59,28 @@ static APFloat fmed3AMDGCN(const APFloat &Src0, const APFloat &Src1,
   return maxnum(Src0, Src1);
 }
 
+enum class KnownIEEEMode { Unknown, On, Off };
+
+/// Return KnownIEEEMode::On if we know if the use context can assume
+/// "amdgpu-ieee"="true" and KnownIEEEMode::Off if we can assume
+/// "amdgpu-ieee"="false".
+static KnownIEEEMode fpenvIEEEMode(const Instruction &I,
+                                   const GCNSubtarget &ST) {
+  if (!ST.hasIEEEMode()) // Only mode on gfx12
+    return KnownIEEEMode::On;
+
+  const Function *F = I.getFunction();
+  if (!F)
+    return KnownIEEEMode::Unknown;
+
+  Attribute IEEEAttr = F->getFnAttribute("amdgpu-ieee");
+  if (IEEEAttr.isValid())
+    return IEEEAttr.getValueAsBool() ? KnownIEEEMode::On : KnownIEEEMode::Off;
+
+  return AMDGPU::isShader(F->getCallingConv()) ? KnownIEEEMode::Off
+                                               : KnownIEEEMode::On;
+}
+
 // Check if a value can be converted to a 16-bit value without losing
 // precision.
 // The value is expected to be either a float (IsFloat = true) or an unsigned
@@ -843,9 +865,6 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     break;
   }
   case Intrinsic::amdgcn_fmed3: {
-    // Note this does not preserve proper sNaN behavior if IEEE-mode is enabled
-    // for the shader.
-
     Value *Src0 = II.getArgOperand(0);
     Value *Src1 = II.getArgOperand(1);
     Value *Src2 = II.getArgOperand(2);
@@ -858,16 +877,85 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     if (II.isStrictFP())
       break;
 
+    // med3 with a nan input acts like
+    // v_min_f32(v_min_f32(s0, s1), s2)
+    //
+    // Signalingness is ignored with ieee=0, so we fold to
+    // minimumnum/maximumnum. With ieee=1, the v_min_f32 acts like llvm.minnum
+    // with signaling nan handling. With ieee=0, like llvm.minimumnum except a
+    // returned signaling nan will not be quieted.
+
+    // ieee=1
+    // s0 snan: s2
+    // s1 snan: s2
+    // s2 snan: qnan
+
+    // s0 qnan: min(s1, s2)
+    // s1 qnan: min(s0, s2)
+    // s2 qnan: min(s0, s1)
+
+    // ieee=0
+    // s0 _nan: min(s1, s2)
+    // s1 _nan: min(s0, s2)
+    // s2 _nan: min(s0, s1)
+
     // Checking for NaN before canonicalization provides better fidelity when
     // mapping other operations onto fmed3 since the order of operands is
     // unchanged.
     Value *V = nullptr;
-    if (match(Src0, PatternMatch::m_NaN()) || isa<UndefValue>(Src0)) {
-      V = IC.Builder.CreateMinNum(Src1, Src2);
-    } else if (match(Src1, PatternMatch::m_NaN()) || isa<UndefValue>(Src1)) {
-      V = IC.Builder.CreateMinNum(Src0, Src2);
-    } else if (match(Src2, PatternMatch::m_NaN()) || isa<UndefValue>(Src2)) {
-      V = IC.Builder.CreateMinNum(Src0, Src1);
+    const APFloat *ConstSrc0 = nullptr;
+    const APFloat *ConstSrc1 = nullptr;
+    const APFloat *ConstSrc2 = nullptr;
+
+    // TODO: Also can fold to 2 operands with infinities.
+    if ((match(Src0, m_APFloat(ConstSrc0)) && ConstSrc0->isNaN()) ||
+        isa<UndefValue>(Src0)) {
+      switch (fpenvIEEEMode(II, *ST)) {
+      case KnownIEEEMode::On:
+        // TODO: If Src2 is snan, does it need quieting?
+        if (ConstSrc0 && ConstSrc0->isSignaling())
+          return IC.replaceInstUsesWith(II, Src2);
+        V = IC.Builder.CreateMinNum(Src1, Src2);
+        break;
+      case KnownIEEEMode::Off:
+        V = IC.Builder.CreateMinimumNum(Src1, Src2);
+        break;
+      case KnownIEEEMode::Unknown:
+        break;
+      }
+    } else if ((match(Src1, m_APFloat(ConstSrc1)) && ConstSrc1->isNaN()) ||
+               isa<UndefValue>(Src1)) {
+      switch (fpenvIEEEMode(II, *ST)) {
+      case KnownIEEEMode::On:
+        // TODO: If Src2 is snan, does it need quieting?
+        if (ConstSrc1 && ConstSrc1->isSignaling())
+          return IC.replaceInstUsesWith(II, Src2);
+
+        V = IC.Builder.CreateMinNum(Src0, Src2);
+        break;
+      case KnownIEEEMode::Off:
+        V = IC.Builder.CreateMinimumNum(Src0, Src2);
+        break;
+      case KnownIEEEMode::Unknown:
+        break;
+      }
+    } else if ((match(Src2, m_APFloat(ConstSrc2)) && ConstSrc2->isNaN()) ||
+               isa<UndefValue>(Src2)) {
+      switch (fpenvIEEEMode(II, *ST)) {
+      case KnownIEEEMode::On:
+        if (ConstSrc2 && ConstSrc2->isSignaling()) {
+          auto *Quieted = ConstantFP::get(II.getType(), ConstSrc2->makeQuiet());
+          return IC.replaceInstUsesWith(II, Quieted);
+        }
+
+        V = IC.Builder.CreateMinNum(Src0, Src1);
+        break;
+      case KnownIEEEMode::Off:
+        V = IC.Builder.CreateMaximumNum(Src0, Src1);
+        break;
+      case KnownIEEEMode::Unknown:
+        break;
+      }
     }
 
     if (V) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll b/llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll
index 972862d8e327e..d9311008bd680 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/fmed3.ll
@@ -1,12 +1,12 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 
 ; Test with "amdgpu-ieee" set to true and false
-; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine -mcpu=gfx600 < %s | FileCheck -check-prefixes=CHECK,IEEE1 %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine -mcpu=gfx600 < %s | FileCheck -check-prefixes=CHECK,IEEE1,HAS-IEEE-BIT1 %s
 ; RUN: sed 's/\"true\"/\"false\"/g' %s | opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 -passes=instcombine | FileCheck -check-prefixes=CHECK,IEEE0 %s
 
-; Test with gfx12 since there is no ieee bit anymore.
-; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -passes=instcombine < %s | FileCheck -check-prefixes=CHECK,IEEE1 %s
-; RUN: sed 's/\"true\"/\"false\"/g' %s | opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -passes=instcombine | FileCheck -check-prefixes=CHECK,IEEE0 %s
+; Test with gfx12 since there is no ieee bit anymore and the attribute is ignored.
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -passes=instcombine < %s | FileCheck -check-prefixes=CHECK,IEEE1,NO-IEEE-BIT %s
+; RUN: sed 's/\"true\"/\"false\"/g' %s | opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -passes=instcombine | FileCheck -check-prefixes=CHECK,IEEE1,NO-IEEE-BIT %s
 
 ; --------------------------------------------------------------------
 ; llvm.amdgcn.fmed3
@@ -85,80 +85,120 @@ define float @fmed3_canonicalize_c_x_y_f32(float %x, float %y) #1 {
 }
 
 define float @fmed3_undef_x_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_undef_x_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_undef_x_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_undef_x_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
   ret float %med3
 }
 
 define float @fmed3_fmf_undef_x_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_fmf_undef_x_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call nnan float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_fmf_undef_x_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call nnan float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_fmf_undef_x_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call nnan float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call nnan float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_undef_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_undef_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_undef_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_x_undef_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float undef, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_y_undef_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_y_undef_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_y_undef_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_x_y_undef_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.maximumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float undef)
   ret float %med3
 }
 
 define float @fmed3_qnan0_x_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_qnan0_x_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_qnan0_x_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_qnan0_x_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000000000000, float %x, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_qnan0_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_qnan0_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_qnan0_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_x_qnan0_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8000000000000, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_y_qnan0_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_y_qnan0_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_y_qnan0_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_x_y_qnan0_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.maximumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 0x7FF8000000000000)
   ret float %med3
 }
 
 define float @fmed3_qnan1_x_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_qnan1_x_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_qnan1_x_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
+; IEEE1-NEXT:    ret float [[MED3]]
+;
+; IEEE0-LABEL: define float @fmed3_qnan1_x_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float %x, float %y)
   ret float %med3
@@ -229,27 +269,42 @@ define float @fmed3_constant_src2_1_f32(float %x, float %y) #1 {
 }
 
 define float @fmed3_x_qnan0_qnan1_f32(float %x) #1 {
-; CHECK-LABEL: define float @fmed3_x_qnan0_qnan1_f32(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    ret float [[X]]
+; IEEE1-LABEL: define float @fmed3_x_qnan0_qnan1_f32(
+; IEEE1-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float [[X]]
+;
+; IEEE0-LABEL: define float @fmed3_x_qnan0_qnan1_f32(
+; IEEE0-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float 0x7FF8002000000000)
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8001000000000, float 0x7FF8002000000000)
   ret float %med3
 }
 
 define float @fmed3_qnan0_x_qnan1_f32(float %x) #1 {
-; CHECK-LABEL: define float @fmed3_qnan0_x_qnan1_f32(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    ret float [[X]]
+; IEEE1-LABEL: define float @fmed3_qnan0_x_qnan1_f32(
+; IEEE1-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float [[X]]
+;
+; IEEE0-LABEL: define float @fmed3_qnan0_x_qnan1_f32(
+; IEEE0-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float 0x7FF8002000000000)
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float %x, float 0x7FF8002000000000)
   ret float %med3
 }
 
 define float @fmed3_qnan0_qnan1_x_f32(float %x) #1 {
-; CHECK-LABEL: define float @fmed3_qnan0_qnan1_x_f32(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    ret float [[X]]
+; IEEE1-LABEL: define float @fmed3_qnan0_qnan1_x_f32(
+; IEEE1-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float [[X]]
+;
+; IEEE0-LABEL: define float @fmed3_qnan0_qnan1_x_f32(
+; IEEE0-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float 0x7FF8002000000000)
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0x7FF8002000000000, float %x)
   ret float %med3
@@ -274,9 +329,13 @@ define float @fmed3_0_nan_1_f32() #1 {
 }
 
 define float @fmed3_0_1_nan_f32() #1 {
-; CHECK-LABEL: define float @fmed3_0_1_nan_f32(
-; CHECK-SAME: ) #[[ATTR1]] {
-; CHECK-NEXT:    ret float 0.000000e+00
+; IEEE1-LABEL: define float @fmed3_0_1_nan_f32(
+; IEEE1-SAME: ) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float 0.000000e+00
+;
+; IEEE0-LABEL: define float @fmed3_0_1_nan_f32(
+; IEEE0-SAME: ) #[[ATTR1]] {
+; IEEE0-NEXT:    ret float 1.000000e+00
 ;
   %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8001000000000)
   ret float %med
@@ -301,9 +360,13 @@ define float @fmed3_0_undef_1_f32() #1 {
 }
 
 define float @fmed3_0_1_undef_f32() #1 {
-; CHECK-LABEL: define float @fmed3_0_1_undef_f32(
-; CHECK-SAME: ) #[[ATTR1]] {
-; CHECK-NEXT:    ret float 0.000000e+00
+; IEEE1-LABEL: define float @fmed3_0_1_undef_f32(
+; IEEE1-SAME: ) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float 0.000000e+00
+;
+; IEEE0-LABEL: define float @fmed3_0_1_undef_f32(
+; IEEE0-SAME: ) #[[ATTR1]] {
+; IEEE0-NEXT:    ret float 1.000000e+00
 ;
   %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)
   ret float %med
@@ -337,84 +400,122 @@ define float @fmed3_x_y_poison_f32(float %x, float %y) #1 {
 }
 
 define float @fmed3_snan1_x_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_snan1_x_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_snan1_x_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float [[Y]]
+;
+; IEEE0-LABEL: define float @fmed3_snan1_x_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF4000000000000, float %x, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_snan1_y_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_snan1_y_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_snan1_y_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float [[Y]]
+;
+; IEEE0-LABEL: define float @fmed3_x_snan1_y_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF4000000000000, float %y)
   ret float %med3
 }
 
 define float @fmed3_x_y_snan1_f32(float %x, float %y) #1 {
-; CHECK-LABEL: define float @fmed3_x_y_snan1_f32(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
-; CHECK-NEXT:    ret float [[MED3]]
+; IEEE1-LABEL: define float @fmed3_x_y_snan1_f32(
+; IEEE1-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float 0x7FFC000000000000
+;
+; IEEE0-LABEL: define float @fmed3_x_y_snan1_f32(
+; IEEE0-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.maximumnum.f32(float [[X]], float [[Y]])
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 0x7FF4000000000000)
   ret float %med3
 }
 
 define float @fmed3_snan1_x_snan2_f32(float %x) #1 {
-; CHECK-LABEL: define float @fmed3_snan1_x_snan2_f32(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR1]] {
-; CHECK-NEXT:    ret float [[X]]
+; IEEE1-LABEL: define float @fmed3_snan1_x_snan2_f32(
+; IEEE1-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE1-NEXT:    ret float 0x7FF0000040000000
+;
+; IEEE0-LABEL: define float @fmed3_snan1_x_snan2_f32(
+; IEEE0-SAME: float [[X:%.*]]) #[[ATTR1]] {
+; IEEE0-NEXT:    [[MED3:%.*]] = call float @llvm.minimumnum.f32(float [[X]], float 0x7FF0000040000000)
+; IEEE0-NEXT:    ret float [[MED3]]
 ;
   %med3 = call float @llvm.amdgcn.fmed3....
[truncated]

@arsenm arsenm changed the title AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0 AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0 May 12, 2025
Copy link
Contributor Author

arsenm commented May 12, 2025

Merge activity

  • May 12, 2:11 PM EDT: A user started a stack merge that includes this pull request via Graphite.
  • May 12, 2:27 PM EDT: Graphite rebased this pull request as part of a merge.
  • May 12, 2:30 PM EDT: Graphite rebased this pull request as part of a merge.
  • May 12, 2:31 PM EDT: @arsenm merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/fix-nan-constant-src2-fmed3-fold-minnum-not-maxnum branch 2 times, most recently from c50539d to 1bce310 Compare May 12, 2025 18:23
Base automatically changed from users/arsenm/amdgpu/fix-nan-constant-src2-fmed3-fold-minnum-not-maxnum to main May 12, 2025 18:26
@arsenm arsenm force-pushed the users/arsenm/amdgpu/fmed3-use-minimumnum-maximumnum-no-ieee-mode branch from 8da4a4a to e1186ec Compare May 12, 2025 18:26
arsenm added 2 commits May 12, 2025 18:29
Try to respect the signaling nan behavior of the instruction,
so also start the special case fold for src2.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/fmed3-use-minimumnum-maximumnum-no-ieee-mode branch from e1186ec to 79ef4a0 Compare May 12, 2025 18:29
@arsenm arsenm merged commit 038d357 into main May 12, 2025
6 of 9 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/fmed3-use-minimumnum-maximumnum-no-ieee-mode branch May 12, 2025 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants