[x86-64] Avoid usage of multi-uop CMOVBE/CMOVNBE 

On Intel P-cores, the CMOVBE and CMOVNBE (also written CMOVA) instructions decode to 2 uops and have a throughput of 1 per cycle. The other CMOV variants are a single uop with a throughput of 2 per cycle.

The following case shows the backend generating the more expensive CMOVBE/CMOVA instructions:

```
//  file f.c
int f(int x) {
    if (x < 2)
      return x;
    long long int l = 1;
    long long int u = x;
    do {
      long long int m = (l + u) >> 1;
      if (m*m > x) u=m; else l=m;
    } while (l+1 < u);
    return (int)l;
}
```

Compiling the above with trunk as `clang -O2 -march=core-avx2 -S f.c` generates:

```
f(int):
        mov     eax, edi
        cmp     edi, 2
        jl      .LBB0_3
        mov     ecx, eax
        mov     eax, 1
        mov     rdx, rcx
.LBB0_2:
        lea     rsi, [rdx + rax]
        sar     rsi
        mov     rdi, rsi
        imul    rdi, rsi
        cmp     rdi, rcx
        cmovbe  rax, rsi
        cmova   rdx, rsi
        lea     rsi, [rax + 1]
        cmp     rsi, rdx
        jl      .LBB0_2
.LBB0_3:
        ret
```

Where possible, the single-uop `cmovge` and `cmovl` instructions should be used instead of `cmovbe` and `cmova`.
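
The following is a minimal sketch of why such a substitution could be valid for this loop; it is not taken from the compiler. It assumes the backend would swap the `cmp` operands, so that `cmovbe` after `cmp a, b` (unsigned `a <= b`) becomes `cmovge` after `cmp b, a` (signed `b >= a`), and likewise `cmova` becomes `cmovl`. The two forms agree only when both operands are non-negative as signed 64-bit values, which appears to hold here because `x >= 2` and `m*m` stays below 2^62. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Condition under which `cmp a, b` followed by `cmovbe` performs the move:
   unsigned a <= b. */
static int cond_cmovbe(int64_t a, int64_t b) {
    return (uint64_t)a <= (uint64_t)b;
}

/* Condition under which `cmp a, b` followed by `cmova` performs the move:
   unsigned a > b. */
static int cond_cmova(int64_t a, int64_t b) {
    return (uint64_t)a > (uint64_t)b;
}

/* Condition under which `cmp b, a` (operands swapped) followed by `cmovge`
   performs the move: signed b >= a. */
static int cond_cmovge_swapped(int64_t a, int64_t b) {
    return b >= a;
}

/* Condition under which `cmp b, a` (operands swapped) followed by `cmovl`
   performs the move: signed b < a. */
static int cond_cmovl_swapped(int64_t a, int64_t b) {
    return b < a;
}

/* Returns 1 when the signed, swapped-operand forms select exactly as the
   unsigned forms do for the pair (a, b), and 0 otherwise. */
int rewrite_is_safe(int64_t a, int64_t b) {
    return cond_cmovbe(a, b) == cond_cmovge_swapped(a, b)
        && cond_cmova(a, b) == cond_cmovl_swapped(a, b);
}
```

For non-negative inputs such as `rewrite_is_safe(4, 9)` the function returns 1, while a negative operand such as `rewrite_is_safe(-1, 1)` returns 0, so the substitution would need a proof that both compared values are non-negative.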

[x86-64] Avoid usage of multi-uop CMOVBE/CMOVNBE #113965