[AArch64] `QNaN` check after `fsqrt` instruction is slow #122081

It looks like Clang is about 100% behind GCC on Neoverse V2 for the following function (with `n = 10000`).
Compilation options: `-O3 -mcpu=neoverse-v2`

```c
#include <math.h>

void f(int n, double *arr, double m) {
    for (int i = 0; i < n; i++) {
        arr[i] = sqrt(arr[i] * m);
    }
}
```

godbolt: https://godbolt.org/z/57Yqj15KP

I tried to analyze the root cause and found that the `fcmp` instruction after `fsqrt` accounts for a large share of the run time. This `fcmp` checks whether the result of `fsqrt` is a `QNaN` and, if so, branches to a call to the library `sqrt` function (so that `errno` can be set). Because the `fcmp` depends on the result of `fsqrt`, it cannot execute until the square root has completed. The problem occurs even when all elements of `arr` are positive, so the branch to the library call is never taken. Avoiding the check with options such as `-fno-honor-nans` closed the performance gap between GCC and Clang. I think we should instead insert the comparison before the `fsqrt` instruction, on its input, as GCC does.
