
A pragma incorrectly suppresses FMA in the IR without -ffp-contract=fast-honor-pragmas #162550

@wjristow


For code like the following:

// -------------- simple.cpp -------------- //
float compute(float a, float b, float c) {
#if defined(ENABLE_PRAGMA)
#pragma clang fp contract (off)
#endif
  float product = a * b;
  return product + c;
}
// ---------------------------------------- //

When -ffast-math is used, the cross-statement FMA should happen (and it does). For the pragma that turns OFF the fp contract bit to take effect, an additional switch is required: -ffp-contract=fast-honor-pragmas. That is, with -ffast-math alone, the FMA still happens even when the pragma is enabled:

$ clang++ -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
        vfmadd213ss     %xmm2, %xmm1, %xmm0     # xmm0 = (xmm1 * xmm0) + xmm2
        .addrsig
$ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
        vfmadd213ss     %xmm2, %xmm1, %xmm0     # xmm0 = (xmm1 * xmm0) + xmm2
        .addrsig
$

The cross-statement FMA is suppressed only when the additional switch -ffp-contract=fast-honor-pragmas is applied after -ffast-math (as documented):

$ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'mul|add'
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
        vmulss  %xmm0, %xmm1, %xmm0
        vaddss  %xmm2, %xmm0, %xmm0
        .addrsig
$

However, when IR is generated, the pragma suppresses the cross-statement FMA (by dropping the 'contract' fast-math flag from the fmul and fadd) both with and without the -ffp-contract=fast-honor-pragmas switch:

$ clang++ -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # All the fast-math-flags are on.
  %mul = fmul fast float %b, %a
  %add = fadd fast float %mul, %c
$ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (INCORRECT).
  %mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
  %add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
$ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (correct).
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
  %mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
  %add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
$

A consequence of this is that when LTO is enabled, a pragma that disables fp contraction takes effect when it should not. The pragma is only supposed to suppress the cross-statement FMA when -ffp-contract=fast-honor-pragmas is specified (to make the pragma effective). With LTO, however, the pragma always suppresses it, even without that switch.

Here is a standalone, runnable test case to illustrate. It contains a cross-statement FMA opportunity, and the input values are chosen so that performing the FMA produces a small numeric difference.

$ cat lto_test.cpp
// 'noinline' just to make it easy to inspect the generated code.
__attribute__((noinline)) float compute(float a, float b, float c) {
#pragma clang fp contract (off)
  float product = a * b;
  return product + c;
}

// Declare 'volatile' to suppress compile-time folding:
volatile float x = 1.7200003f;
volatile float y = 2.0720003f;
volatile float z = 3.5720001f;

extern "C" int printf(const char *, ...);

int main() {
  float result = compute(x, y, z);
  // Result depends on whether FMA happens:
  //   FMA does happen:         7.1358409e+00
  //   FMA does not happen:     7.1358414e+00
  printf("Result: %.7e\n", (double) result);
  return 0;
}
$
$ # `-ffast-math` enables FMA, but the pragma suppresses it:
$ clang++ -o test.no_fma.pragma.elf -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas lto_test.cpp
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
$ test.no_fma.pragma.elf
Result: 7.1358414e+00
$
$ # `-ffast-math` is not on, so cross-statement FMA does not happen:
$ clang++ -o test.no_fma.elf -mfma -O2 lto_test.cpp
$ test.no_fma.elf
Result: 7.1358414e+00
$
$ # `-ffast-math` is on, so cross-statement FMA does happen:
$ clang++ -o test.yes_fma.elf -mfma -O2 -ffast-math lto_test.cpp
$ test.yes_fma.elf
Result: 7.1358409e+00
$
$ # Same as prev but with LTO enabled, so FMA should happen, but it does not (the bug):
$ clang++ -o test.should_be_yes_fma.lto.elf -mfma -flto -O2 -ffast-math lto_test.cpp
$ test.should_be_yes_fma.lto.elf
Result: 7.1358414e+00
$

For reference, here are some points of discussion about the -ffp-contract=fast-honor-pragmas concept:
https://discourse.llvm.org/t/fp-contract-fast-and-pragmas/58529
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797/14

As an aside, I came across this because at PlayStation we want to always honor the pragma. In fact, we have had private changes in our downstream code to honor the pragma since our llvm11-based release. (That was before the -ffp-contract=fast-honor-pragmas concept was created, so at the time we thought the pragma not being honored was simply a bug.) I have proposed a patch to always honor the pragmas for PlayStation: #162549

Because of this bug, the test case for that patch checks the generated assembly code rather than following the usual approach of checking the generated IR.


Labels: clang:codegen (IR generation bugs: mangling, exceptions, etc.), floating-point (Floating-point math)
