Commit 095f4e6

s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)
gcc's default setting for floating-point expression contraction is "fast", which allows the compiler to emit fused multiply-adds instead of separate multiplies and adds (among other contractions). Fused multiply-adds, which assembly kernels typically use, also bring a significant performance advantage to the C implementation of matrix-matrix multiplication on s390x. To enable that performance advantage for builds with clang, add -ffp-contract=fast to the compiler options.

Signed-off-by: Marius Hillenbrand <[email protected]>
1 parent 87e5bbd commit 095f4e6

1 file changed: 6 additions, 0 deletions


Makefile.zarch

Lines changed: 6 additions & 0 deletions
@@ -8,3 +8,9 @@ ifeq ($(CORE), Z14)
 CCOMMON_OPT += -march=z14 -mzvector -O3
 FCOMMON_OPT += -march=z14 -mzvector
 endif
+
+# Enable floating-point expression contraction for clang, since it is the
+# default for gcc
+ifeq ($(C_COMPILER), CLANG)
+CCOMMON_OPT += -ffp-contract=fast
+endif
