Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64 #124993

@aneshlya

LLVM generates suboptimal code for llvm.ctlz() on vectors of int64 across the x86-64 instruction sets that precede AVX512 (SSE4 through AVX2). Performance measurements indicate that extracting the individual 64-bit values from the ymm register and applying lzcnt to each one separately yields a 25% improvement on AVX2 and a 124% improvement on SSE4 compared to the vectorized llvm.ctlz implementation.

Please see the example here: https://ispc.godbolt.org/z/EEErrednx
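
For readers who cannot open the link, the per-lane scalar approach described above can be sketched roughly as follows. This is only an illustration, not the code from the Godbolt example: the helper names `ctlz64` and `ctlz64_lanes` are hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides `__builtin_clzll`. When the target supports LZCNT, such compilers typically lower this builtin to a single `lzcnt` instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: leading-zero count of one 64-bit lane.
   Returns 64 for a zero input, which matches llvm.ctlz when zero is
   not treated as poison. __builtin_clzll is undefined for 0, so the
   zero case is handled explicitly. */
static inline uint64_t ctlz64(uint64_t x) {
    return x ? (uint64_t)__builtin_clzll(x) : 64u;
}

/* Hypothetical helper: apply the scalar count to each of n lanes,
   mirroring "extract every element and run lzcnt on it separately". */
static void ctlz64_lanes(const uint64_t *in, uint64_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = ctlz64(in[i]);
    }
}
```

Calling `ctlz64_lanes` on four elements corresponds to processing one 256-bit vector of int64 values lane by lane. Whether this beats the vectorized lowering depends on the target and the surrounding code, as the measurements above suggest.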
