[x86-64] Avoid usage of multi-uop CMOVBE/CMOVNBE  #113965

@daniel-zabawa

Description

On P-cores, the CMOVBE/CMOVNBE instructions decode to 2 uops and have a throughput of 1 instruction per cycle. The other CMOVs are a single uop with a throughput of 2 per cycle.

The following case shows the backend generating the more expensive CMOVBE/CMOVA instructions (CMOVA is another mnemonic for CMOVNBE):

//  file f.c
int f(int x) {
    if (x < 2)
      return x;
    long long int l = 1;
    long long int u = x;
    do {
      long long int m = (l + u) >> 1;
      if (m*m > x) u=m; else l=m;
    } while (l+1 < u);
    return (int)l;
}

Compiling the above with trunk as clang -O2 -march=core-avx2 -S f.c generates:

f(int):
        mov     eax, edi
        cmp     edi, 2
        jl      .LBB0_3
        mov     ecx, eax
        mov     eax, 1
        mov     rdx, rcx
.LBB0_2:
        lea     rsi, [rdx + rax]
        sar     rsi
        mov     rdi, rsi
        imul    rdi, rsi
        cmp     rdi, rcx
        cmovbe  rax, rsi
        cmova   rdx, rsi
        lea     rsi, [rax + 1]
        cmp     rsi, rdx
        jl      .LBB0_2
.LBB0_3:
        ret

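The single-uop cmovge and cmovl instructions should be preferred to these where possible. In this loop both compared values (m*m and x) are non-negative, so a signed comparison gives the same result as the unsigned one.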
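
A minimal sketch of why a signed condition can stand in for the unsigned one here, assuming both compared values are non-negative 64-bit integers (as m*m and x are in the loop above). The helper names below are hypothetical and only model the predicates; they are not taken from the backend. For non-negative operands the unsigned "below or equal" test and the signed "less or equal" test never disagree, and the same holds for the other signed/unsigned pairs, including the cmovge/cmovl forms if the cmp operands are swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned "below or equal": the condition CMOVBE tests after `cmp a, b`. */
static int below_or_equal(int64_t a, int64_t b) {
    return (uint64_t)a <= (uint64_t)b;
}

/* Signed "less or equal": the condition of the signed counterpart. */
static int less_or_equal(int64_t a, int64_t b) {
    return a <= b;
}

/* Count ordered pairs (v[i], v[j]) on which the two predicates disagree. */
static int count_disagreements(const int64_t *v, size_t n) {
    assert(v != NULL);
    int bad = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (below_or_equal(v[i], v[j]) != less_or_equal(v[i], v[j]))
                bad++;
    return bad;
}

/* Non-negative samples only (illustrative values, not from the issue). */
int disagreements_nonnegative(void) {
    static const int64_t v[] = {0, 1, 2, 2147483647, INT64_MAX};
    return count_disagreements(v, sizeof v / sizeof v[0]);
}

/* Mixed-sign samples: the predicates diverge once a negative value appears. */
int disagreements_mixed(void) {
    static const int64_t v[] = {-1, 1};
    return count_disagreements(v, sizeof v / sizeof v[0]);
}
```

With these samples, disagreements_nonnegative() finds no mismatches, while disagreements_mixed() does, which is why the substitution depends on the operands being known non-negative.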
