Description
For code like the following:
// -------------- simple.cpp -------------- //
float compute(float a, float b, float c) {
#if defined(ENABLE_PRAGMA)
#pragma clang fp contract (off)
#endif
  float product = a * b;
  return product + c;
}
// ---------------------------------------- //
When -ffast-math is used, the cross-statement FMA should happen (and it does). Using the pragma to turn OFF the fp contract bit requires an additional switch to make the pragma effective: -ffp-contract=fast-honor-pragmas. That is, without that switch the FMA still happens even if the pragma is enabled:
$ clang++ -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
vfmadd213ss %xmm2, %xmm1, %xmm0 # xmm0 = (xmm1 * xmm0) + xmm2
.addrsig
$ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
vfmadd213ss %xmm2, %xmm1, %xmm0 # xmm0 = (xmm1 * xmm0) + xmm2
.addrsig
$
The cross-statement FMA is suppressed only when the additional switch -ffp-contract=fast-honor-pragmas is applied after -ffast-math (as documented):
$ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'mul|add'
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
vmulss %xmm0, %xmm1, %xmm0
vaddss %xmm2, %xmm0, %xmm0
.addrsig
$
However, when IR is generated, the cross-statement FMA is suppressed when the pragma is enabled, both with and without the -ffp-contract=fast-honor-pragmas switch:
$ clang++ -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # All the fast-math-flags are on.
%mul = fmul fast float %b, %a
%add = fadd fast float %mul, %c
$ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (INCORRECT).
%mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
%add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
$ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (correct).
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
%mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
%add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
$
A consequence of this is that when LTO is enabled, a pragma that disables fp contract does not behave as designed. The cross-statement FMA is only supposed to be suppressed by the pragma when -ffp-contract=fast-honor-pragmas is specified (to make the pragma effective). With LTO, however, the pragma always suppresses it, even without that switch.
Here is a standalone runnable test-case to illustrate. It contains a cross-statement FMA opportunity, and the values that feed into the FMA are chosen so that the result differs slightly depending on whether the FMA is performed.
$ cat lto_test.cpp
// 'noinline' just to make it easy to inspect the generated code.
__attribute__((noinline)) float compute(float a, float b, float c) {
#pragma clang fp contract (off)
  float product = a * b;
  return product + c;
}
// Declare 'volatile' to suppress compile-time folding:
volatile float x = 1.7200003f;
volatile float y = 2.0720003f;
volatile float z = 3.5720001f;
extern "C" int printf(const char *, ...);
int main() {
  float result = compute(x, y, z);
  // Result depends on whether FMA happens:
  //   FMA does happen:     7.1358409e+00
  //   FMA does not happen: 7.1358414e+00
  printf("Result: %.7e\n", (double) result);
  return 0;
}
$
$ # `-ffast-math` enables FMA, but the pragma suppresses it:
$ clang++ -o test.no_fma.pragma.elf -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas lto_test.cpp
clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
$ test.no_fma.pragma.elf
Result: 7.1358414e+00
$
$ # `-ffast-math` is not on, so cross-statement FMA does not happen:
$ clang++ -o test.no_fma.elf -mfma -O2 lto_test.cpp
$ test.no_fma.elf
Result: 7.1358414e+00
$
$ # `-ffast-math` is on, so cross-statement FMA does happen:
$ clang++ -o test.yes_fma.elf -mfma -O2 -ffast-math lto_test.cpp
$ test.yes_fma.elf
Result: 7.1358409e+00
$
$ # Same as prev but with LTO enabled, so FMA should happen, but it does not (the bug):
$ clang++ -o test.should_be_yes_fma.lto.elf -mfma -flto -O2 -ffast-math lto_test.cpp
$ test.should_be_yes_fma.lto.elf
Result: 7.1358414e+00
$
For reference, here are some points of discussion about the -ffp-contract=fast-honor-pragmas concept:
https://discourse.llvm.org/t/fp-contract-fast-and-pragmas/58529
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797/14
As an aside, I came across this because at PlayStation we want to always honor the pragma. In fact, our downstream code has carried private changes to honor the pragma since our llvm11-based release. That was before the concept of -ffp-contract=fast-honor-pragmas was created, so at the time we thought that the pragma not being honored was simply a bug. I have proposed a patch to always honor the pragmas for PlayStation: #162549
Because of this bug, the test-case for that patch checks the generated assembly code rather than using the usual approach of checking the generated IR.