[AArch64] QNaN check after fsqrt instruction is slow #122081

@kasuga-fj

It looks like clang is about 100% behind gcc for the following function (with n = 10000) on Neoverse V2.
Compilation options: -O3 -mcpu=neoverse-v2

#include <math.h>

void f(int n, double *arr, double m) {
    for (int i = 0; i < n; i++) {
        arr[i] = sqrt(arr[i] * m);
    }
}

godbolt: https://godbolt.org/z/57Yqj15KP

I tried to analyze the root cause and found that the fcmp instruction after fsqrt takes a lot of time. The fcmp checks whether the result of fsqrt is a QNaN and, if so, branches to the library function call. The slowdown occurs even when every element of arr is positive, in which case the branch to the library call is never taken. Avoiding the check with an option such as -fno-honor-nans closes the performance gap between gcc and clang. I think we should insert a comparison instruction before the fsqrt instruction, as gcc does.
