Description
As a result of a micro-optimization, we have the Rust compiler generating several different variations of an assertion that a particular 32-bit floating-point value is zero, in the hope that LLVM can optimize away a multiplication by that variable. Unfortunately, despite trying several different ways of "informing" LLVM that both the sign and the magnitude of the f32 variable match those of +0.0, it does not seem to be able to perform this optimization.
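For context, here is a minimal Rust sketch of the kind of source pattern involved (the function name, constants, and use of core::hint::assert_unchecked are illustrative assumptions, not the actual rustc-generated code):

```rust
// Hypothetical reduction: promise the optimizer that `y` is bit-for-bit +0.0
// and hope that the `y * 8.0` term and the add consuming it get dropped.
#[inline(never)]
pub fn weighted_sum(x: f32, y: f32, z: f32) -> f32 {
    // Variants of this assumption (`y == 0.0`, `y.to_bits() == 0`, an
    // fpclass-style check) correspond to the IR test cases below.
    unsafe { core::hint::assert_unchecked(y.to_bits() == 0) };
    x * 9.0 + y * 8.0 + z
}

fn main() {
    println!("{}", weighted_sum(1.0, 0.0, 2.0));
}
```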
Test case: the +0.0 bit pattern is directly asserted via intrinsic (@llvm.is.fpclass with test mask 64, i.e. positive zero)
define noundef float @assert_bitpattern(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  %cond = tail call i1 @llvm.is.fpclass.f32(float %y, i32 64)
  tail call void @llvm.assume(i1 %cond)
  %_7 = fmul float %x, 9.000000e+00
  %_8 = fmul float %y, 8.000000e+00
  %_6 = fadd float %_7, %_8
  %_0 = fadd float %z, %_6
  ret float %_0
}

Result: no different from when @llvm.assume() isn't called; both multiplications are preserved and the result of the multiplication by zero is still added into the summation:
.LCPI0_0:
        .long   0x41100000                      # float 9
.LCPI0_1:
        .long   0x41000000                      # float 8
bitcast:                                # @bitcast
        mulss   xmm0, dword ptr [rip + .LCPI0_0]
        mulss   xmm1, dword ptr [rip + .LCPI0_1]
        addss   xmm0, xmm1
        addss   xmm0, xmm2
        xorps   xmm1, xmm1
        addss   xmm0, xmm1
        ret

Test case: assert the bit pattern via type punning
define noundef float @bitcast(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  %y_as_i32 = bitcast float %y to i32
  %cond = icmp eq i32 %y_as_i32, 0
  tail call void @llvm.assume(i1 %cond)
  %_7 = fmul float %x, 9.000000e+00
  %_8 = fmul float %y, 8.000000e+00
  %_6 = fadd float %_7, %_8
  %_5 = fadd float %z, %_6
  %_0 = fadd float %_5, 0.000000e+00
  ret float %_0
}

Result: same as when asserting the bit pattern via @llvm.is.fpclass.f32(float %y, i32 64).
Test case: assert only that the magnitude is zero (fcmp oeq float .., 0.000000e+00), while performing an operation whose result does not change regardless of whether the variable is specifically +0.0 or -0.0 (because +0.0 is added at the end of the f32 summation)
define noundef float @sign_irrelevant(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  %cond = fcmp oeq float %y, 0.000000e+00
  tail call void @llvm.assume(i1 %cond)
  %_7 = fmul float %x, 9.000000e+00
  %_8 = fmul float %y, 8.000000e+00
  %_6 = fadd float %_7, %_8
  %_5 = fadd float %z, %_6
  %_0 = fadd float %_5, 0.000000e+00
  ret float %_0
}

Result: no different from when @llvm.assume() isn't called. (Regardless of whether or not the final fadd float %_5, 0.000000e+00 is optimized away, I would expect the fmul float %y, ... to be elided.)
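As a sanity check on the reasoning above, here is a small standalone Rust program (a sketch, not one of the reduced test cases) illustrating that eliding the multiplication is bit-exact for both zeros: adding -0.0 leaves every value unchanged, and the trailing + 0.0 already canonicalizes a -0.0 result to +0.0.

```rust
// Compare the full expression against one with the `y * 8.0` term elided,
// for y in {+0.0, -0.0} (the two values satisfying `fcmp oeq %y, 0.0`).
fn full(x: f32, y: f32, z: f32) -> f32 {
    (x * 9.0 + y * 8.0 + z) + 0.0
}

fn elided(x: f32, z: f32) -> f32 {
    (x * 9.0 + z) + 0.0
}

fn main() {
    let samples = [0.0f32, -0.0, 1.5, -2.25, f32::MIN_POSITIVE, 1e30, -1e30];
    for &y in &[0.0f32, -0.0] {
        for &x in &samples {
            for &z in &samples {
                // Bit-for-bit identical results, signed zeros included.
                assert_eq!(full(x, y, z).to_bits(), elided(x, z).to_bits());
            }
        }
    }
    println!("elision is bit-exact for the sampled inputs");
}
```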
Test case: confirm that the compiler is capable of folding away the operation under at least one circumstance (here, where the zero is a compile-time constant)
define noundef float @folded_away(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  %y_ = fadd float 0.000000e+00, 0.000000e+00
  %_7 = fmul float %x, 9.000000e+00
  %_8 = fmul float %y_, 8.000000e+00
  %_6 = fadd float %_7, %_8
  %_5 = fadd float %z, %_6
  %_0 = fadd float %_5, 0.000000e+00
  ret float %_0
}

Result: here we finally observe the compiler optimizing away the multiplication and the subsequent addition:
.LCPI3_0:
        .long   0x41100000                      # float 9
folded_away:                            # @folded_away
        mulss   xmm0, dword ptr [rip + .LCPI3_0]
        xorps   xmm1, xmm1
        addss   xmm0, xmm1
        addss   xmm0, xmm2
        addss   xmm0, xmm1
        ret

This is the assembly I would have expected ~all of the test cases above to generate.
LLVM version: trunk as well as 21.1.0 and earlier versions
Target architecture: x86_64
Command line flags: -O3
Godbolt link: https://llvm.godbolt.org/z/7W95Ehsrd