
[Flang][AArch64] Missed optimization opportunities with unsafe-math-optimizations #165809

@yus3710-fj


Duplicate of #165514


The Flang-compiled version of the blts subroutine in blts.f90, from the LU benchmark in NAS Parallel Benchmarks 3.4.3, runs significantly slower than the GFortran-compiled version.

We measured the execution time using the perf command on Rocky Linux, running on an NVIDIA GRACE processor.
The sampling frequency for perf was set to 97, as detailed in the command below.

We used Flang built from commit 3e1d4d4144cc9d28ccd85cf49d6fc836c38ffbaa.
The GFortran version was 15.2.0.

All measurements were taken by executing the following command:

perf record --call-graph fp -F 97 -o data /path/to/compiled/binary

When both were compiled with -Ofast -mcpu=neoverse-v2, the execution time of the Flang-compiled binary was 167% of the GFortran-compiled binary's.
The benchmark does not specify these options; we use them to obtain the best performance.

We identified two contributing factors: Flang's failure to apply adequate vectorization (#49896, #46522) and its missed fast-math optimizations.

Specifically, Flang missed reciprocal-math optimization opportunities.
We compiled blts.f90 from LU in NAS Parallel Benchmarks with the following compile options:

-g -Ofast -mcpu=neoverse-v2 -fno-tree-vectorize -fno-fast-math -freciprocal-math

Note: Several of these options were specified to reduce the influence of other factors on performance.

With these options, the execution time of the Flang-compiled executable was 124% of GFortran's.
We compared the assembly code with -freciprocal-math toggled on and off and found no differences in Flang's output, whereas GFortran's output did change.
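
For reference, the kind of rewrite that reciprocal-math permits can be sketched as follows. This is a hypothetical illustration in C, not code taken from blts.f90; the function names scale_div and scale_recip are made up for this example. The second function is a hand-written version of what a compiler may produce from the first one when it is allowed to replace a division with a multiplication by the reciprocal.

```c
#include <stddef.h>

/* Hypothetical example, not taken from blts.f90. */

/* Baseline: one floating-point division per element. */
void scale_div(double *out, const double *in, size_t n, double d) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / d;
}

/* Hand-written form of the rewrite that reciprocal-math allows:
   the reciprocal is computed once outside the loop and each
   element is multiplied by it instead of being divided. */
void scale_recip(double *out, const double *in, size_t n, double d) {
    const double r = 1.0 / d;
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * r;
}
```

The two forms can differ in the last bit of the result when 1/d is not exactly representable, which is why the rewrite is only performed when the option is enabled.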

Additionally, GFortran can apply further optimizations when both -fno-signed-zeros and -fno-trapping-math are specified, but -fno-trapping-math is not implemented in Flang.
