Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64

LLVM generates suboptimal code for the vectorized `llvm.ctlz()` intrinsic on the int64 type across the x86-64 instruction sets that predate AVX512 (SSE4 through AVX2). Performance measurements indicate that extracting the individual 64-bit values from the `ymm` register and applying `lzcnt` to each one separately yields a 25% improvement on AVX2 and a 124% improvement on SSE4, compared to the vectorized `llvm.ctlz` implementation.

Please see the example here: https://ispc.godbolt.org/z/EEErrednx
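
For readers who want the idea without opening the link, here is a minimal C sketch of the extract-and-`lzcnt` approach described above. It is not the code from the godbolt example: the helper name `ctlz4_u64` and the array-based interface are hypothetical, chosen only for illustration. The intrinsic path is used only when the compiler is given both `-mavx2` and `-mlzcnt`; otherwise a portable fallback based on the GCC/Clang builtin `__builtin_clzll` is used, with an explicit zero check so that a zero input gives 64, as `lzcnt` does.

```c
#include <stdint.h>

#if defined(__AVX2__) && defined(__LZCNT__)
#include <immintrin.h>
#define HAVE_AVX2_LZCNT 1
#else
#define HAVE_AVX2_LZCNT 0
#endif

/* Hypothetical helper: out[i] = number of leading zero bits in in[i],
 * with 64 returned for a zero input. */
static void ctlz4_u64(const uint64_t in[4], uint64_t out[4])
{
#if HAVE_AVX2_LZCNT
    /* Load four 64-bit lanes into one ymm register, then pull each lane
     * out into a general-purpose register and count with scalar lzcnt. */
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    out[0] = _lzcnt_u64((uint64_t)_mm256_extract_epi64(v, 0));
    out[1] = _lzcnt_u64((uint64_t)_mm256_extract_epi64(v, 1));
    out[2] = _lzcnt_u64((uint64_t)_mm256_extract_epi64(v, 2));
    out[3] = _lzcnt_u64((uint64_t)_mm256_extract_epi64(v, 3));
#else
    /* Portable fallback (assumes GCC or Clang): __builtin_clzll is
     * undefined for 0, so the zero case is handled explicitly. */
    for (int i = 0; i < 4; ++i)
        out[i] = in[i] ? (uint64_t)__builtin_clzll(in[i]) : 64u;
#endif
}
```

Calling `ctlz4_u64` on the inputs `{0, 1, 0x8000000000000000, 0x00000000FFFFFFFF}` should produce `{64, 63, 0, 32}` on either path. Whether this form actually beats the vectorized lowering depends on the surrounding code and the target CPU, so the figures quoted above should be taken from the reporter's measurements rather than from this sketch.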

Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64 #124993

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64 #124993

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions