
x86 missed optimization for __builtin_clz()^31 #150954

@Alcaro

Description

#include <stdint.h>
uint8_t b(uint32_t in)
{
    uint8_t ret = __builtin_clz(in) ^ 31;
    return ret;
}
uint8_t c(uint32_t in)
{
    uint8_t ret = __builtin_clz(in) ^ 31;
    return ret + 1;
}
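Here is why b collapses to one bsr. For a nonzero 32-bit input, __builtin_clz returns a value in [0, 31]. On that range, XOR with 31 (binary 11111) is the same as subtracting from 31. The result is the index of the highest set bit, which is what bsr computes for a nonzero operand. The sketch below checks that identity. It assumes GCC or Clang with a 32-bit unsigned int, and highest_set_bit and check_clz_xor31_identity are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable helper: index of the highest set bit, i.e. what
   x86 BSR returns for a nonzero operand. Precondition: x != 0. */
static unsigned highest_set_bit(uint32_t x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* For nonzero x, clz lies in [0, 31], so XOR with 31 flips exactly the
   five low bits, which equals 31 - clz and thus the highest-bit index.
   Assumes unsigned int is 32 bits wide. */
static void check_clz_xor31_identity(uint32_t x)
{
    assert(x != 0); /* __builtin_clz(0) is undefined */
    unsigned clz = (unsigned)__builtin_clz(x);
    assert((clz ^ 31u) == 31u - clz);
    assert((clz ^ 31u) == highest_set_bit(x));
}
```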

Expected: since b optimizes to a single bsr eax, edi, c should need at most one more instruction (for the + 1).
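The expectation can be written out in C. Because c is b plus one, a bsr followed by a single add or inc should be enough. This is a sketch: c_expected and check_c_expected are hypothetical names, c is copied unchanged from the report, and the input must be nonzero because __builtin_clz(0) is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* c() from the report, unchanged. */
static uint8_t c(uint32_t in)
{
    uint8_t ret = __builtin_clz(in) ^ 31;
    return ret + 1;
}

/* Hypothetical "expected" shape: the bsr result (what b() returns)
   plus one, i.e. a single extra add/inc after the bsr. */
static uint8_t c_expected(uint32_t in)
{
    uint8_t bsr = (uint8_t)(31 - __builtin_clz(in)); /* highest set bit */
    return (uint8_t)(bsr + 1);                        /* the one extra op */
}

/* Checks that the expected shape agrees with c() for a nonzero input. */
static void check_c_expected(uint32_t in)
{
    assert(in != 0);
    assert(c_expected(in) == c(in));
}
```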
Actual:

b(unsigned int):
        bsr     eax, edi
        ret

c(unsigned int):
        bsr     ecx, edi
        xor     ecx, 31
        mov     al, 32
        sub     al, cl
        ret
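Transcribing Clang's output for c back into C shows what it computes. The bsr yields 31 - clz(in), the xor with 31 turns that back into clz(in), and the final subtraction produces 32 - clz(in). This suggests that (clz ^ 31) + 1 was rewritten into 32 - clz, after which the bsr form was no longer recognized. That reading is an inference from the output, not from the Clang source. The helper names below are hypothetical, and bsr is modeled as 31 - clz, which is only valid for a nonzero input.

```c
#include <assert.h>
#include <stdint.h>

/* Step-by-step C model of the emitted sequence. Precondition: in != 0. */
static uint8_t c_as_emitted(uint32_t in)
{
    uint32_t ecx = 31u - (uint32_t)__builtin_clz(in); /* bsr ecx, edi: highest set bit */
    ecx ^= 31u;                                       /* xor ecx, 31: back to clz(in)   */
    uint8_t al = 32;                                  /* mov al, 32                     */
    al = (uint8_t)(al - (uint8_t)ecx);                /* sub al, cl: 32 - clz(in)       */
    return al;
}

/* Checks that the emitted sequence equals both 32 - clz and the
   original (clz ^ 31) + 1 computed in 8 bits. */
static void check_c_as_emitted(uint32_t in)
{
    assert(in != 0);
    unsigned clz = (unsigned)__builtin_clz(in);
    assert(c_as_emitted(in) == (uint8_t)(32u - clz));
    assert(c_as_emitted(in) == (uint8_t)((uint8_t)(clz ^ 31u) + 1));
}
```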

GCC gives good output (probably by being less clever about normalization), as does Clang if I add an extra optimization barrier. https://godbolt.org/z/3ansdnMxa
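The report does not quote the barrier used in the Compiler Explorer link, so the following is only an assumed form of one. It uses an empty GCC/Clang inline-asm statement with a "+r" constraint. The statement emits no instructions, but it hides the value of ret from the optimizer, so ret + 1 cannot be merged back into the clz expression. Whether this yields bsr plus a single add on a given Clang version should be checked on Compiler Explorer. c_barrier and check_c_barrier are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of c() with an empty inline-asm optimization
   barrier (GCC/Clang extended asm). Precondition: in != 0. */
static uint8_t c_barrier(uint32_t in)
{
    uint8_t ret = __builtin_clz(in) ^ 31;
    /* No instructions are emitted, but "+r" tells the compiler that ret
       may have been read and modified here, so it must treat ret as an
       opaque value when computing ret + 1. */
    __asm__("" : "+r"(ret));
    return ret + 1;
}

/* The barrier must not change the result: still 32 - clz(in). */
static void check_c_barrier(uint32_t in)
{
    assert(in != 0);
    assert(c_barrier(in) == (uint8_t)(32 - __builtin_clz(in)));
}
```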
