Suboptimal code-gen in the fundamental branchless-swap building block #5

@Voultapher

Description

The fundamental branchless swap_if code produces suboptimal code on x86-64. I ported it to Rust and noticed that changing it yielded a 50% performance uplift for that function on Zen3. This will of course depend on the hardware, but cmov seems to yield better results than the setl/setg style code that is currently being produced, probably helped by needing 8 instead of 10 instructions.

Here is the current version:

And here is the version that produces cmov code:

I think if you can find a way to reliably produce cmov instructions like LLVM does, you should see a noticeable speed improvement.
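To illustrate the distinction, here is a minimal Rust sketch of the two styles. It is not the code from this repository or from my port: the function names `swap_if_index` and `swap_if_select` and the fixed `i32` element type are assumptions made for illustration only. The first variant turns the comparison result into a 0/1 index, which is the kind of source that tends to end up as set-style instructions plus indexed loads. The second expresses each output as a select between the two inputs, which LLVM will often, but not always, lower to cmov. Actual code-gen depends on compiler version, flags, and target, so check the assembly before relying on it.

```rust
/// Index-style conditional swap (hypothetical name).
/// The comparison is materialised as 0 or 1 and used to index the pair.
pub fn swap_if_index(pair: &mut [i32; 2]) {
    let x = (pair[0] > pair[1]) as usize; // 1 if the pair is out of order
    let y = 1 - x; // the other index
    let tmp = pair[y]; // the larger of the two elements
    pair[0] = pair[x]; // the smaller of the two elements
    pair[1] = tmp;
}

/// Select-style conditional swap (hypothetical name).
/// Both values are loaded first, then each slot picks one of them.
pub fn swap_if_select(pair: &mut [i32; 2]) {
    let (a, b) = (pair[0], pair[1]);
    let should_swap = a > b;
    // Each store is a select between two already-loaded values,
    // a shape the compiler can turn into a conditional move.
    pair[0] = if should_swap { b } else { a };
    pair[1] = if should_swap { a } else { b };
}
```

Both functions leave the pair in ascending order and are interchangeable in behaviour. Only the generated instructions differ, so they can be compared directly in a benchmark or in a disassembly.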
