Missed optimization for std::atomic_thread_fence(std::memory_order_seq_cst) #91731

@ConorWilliams

While implementing a lock-free queue, I noticed that the pop function was about twice as slow when compiled with Clang as with GCC. After digging through the assembly on Compiler Explorer and reducing to a minimal example, the difference appears to come down to the instruction each compiler emits for a sequentially consistent fence:

| Source | GCC | Clang |
| --- | --- | --- |
| std::atomic_thread_fence(std::memory_order_seq_cst) | lock or QWORD PTR [rsp], 0 | mfence |
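For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of reduced example involved. It is not the original reduction: the names `flag_a`, `flag_b`, `full_fence`, and `announce_and_peek` are hypothetical and chosen only for illustration. Compiling `full_fence` with optimizations for x86-64 and inspecting the assembly should show which fence instruction a given compiler picks.

```cpp
#include <atomic>

// Hypothetical flags for illustration; not taken from the original queue.
std::atomic<int> flag_a{0};
std::atomic<int> flag_b{0};

// The bare fence: the one-line case compared above.
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// A store followed by a load of a different location is the pattern where
// a seq_cst fence is typically needed, since the store may otherwise be
// reordered after the load. Both accesses are relaxed, so the fence alone
// supplies the ordering.
int announce_and_peek() {
    flag_a.store(1, std::memory_order_relaxed);
    full_fence();
    return flag_b.load(std::memory_order_relaxed);
}
```

The second function is only there to show the fence in a realistic position on a hot path, which is where the cost difference between the two instructions would matter.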

The mfence instruction is much slower than the locked instruction GCC emits. MSVC also avoids mfence, generating lock inc DWORD PTR __Guard$1[esp+4] instead. I raised this on r/cpp a while ago and was referred to this GCC patch, which introduced the optimisation. How can we go about getting something like this into LLVM? I have been using Boost.Atomic, which seems to generate better assembly, but it would be really nice to drop the dependency.
