@@ -5442,6 +5442,166 @@ third argument, can only occur at file scope.
54425442 a = b[i] * c[i] + e;
54435443 }
54445444
5445+ Extensions for controlling atomic code generation
5446+ =================================================
5447+
5448+ The ``[[clang::atomic]]`` statement attribute enables users to control how
5449+ atomic operations are lowered in LLVM IR by conveying additional metadata to
5450+ the backend. The primary goal is to allow users to specify certain options,
5451+ like whether the affected atomic operations might be used with specific types of memory or
5452+ whether to ignore denormal mode correctness in floating-point operations,
5453+ without affecting the correctness of code that does not rely on these properties.
5454+
5455+ In LLVM, lowering of atomic operations (e.g., ``atomicrmw``) can differ based
5456+ on the target's capabilities. Some backends support native atomic instructions
5457+ only for certain operation types or alignments, or only in specific memory
5458+ regions. Likewise, floating-point atomic instructions may or may not respect
5459+ IEEE denormal requirements. When the user is unconcerned about denormal-mode
5460+ compliance (for performance reasons) or knows that certain atomic operations
5461+ will not be performed on a particular type of memory, extra hints are needed to
5462+ tell the backend how to proceed.
5463+
5464+ A classic example is an architecture where floating-point atomic add does not
5465+ fully conform to IEEE denormal-mode handling. If the user does not mind ignoring
5466+ that aspect, they would prefer to emit a faster hardware atomic instruction,
5467+ rather than a fallback or CAS loop. As another example, on certain GPUs (e.g., AMDGPU),
5468+ memory accessed via PCIe may only support a subset of atomic operations. To ensure
5469+ correct and efficient lowering, the compiler must know whether the user needs
5470+ the atomic operations to work with that type of memory.
5471+
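To make the cost of the fallback concrete, the following is a minimal sketch of
a floating-point atomic add emulated with a compare-and-swap loop in portable
C++. The function name ``atomic_add_cas`` is hypothetical and chosen only for
illustration; the code a backend actually emits for its fallback may differ.

```cpp
#include <atomic>

// Hypothetical helper: emulates an atomic floating-point add with a
// compare-and-swap (CAS) loop. Returns the value observed before the add.
float atomic_add_cas(std::atomic<float> &target, float value) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads 'expected' with the current
  // value, so each retry recomputes the sum from fresh data.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}
```

Under contention the loop may retry several times, which is why a single
hardware atomic instruction is usually preferable when its semantics are
acceptable.
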
5472+ The allowed atomic attribute values are ``remote_memory``, ``fine_grained_memory``,
5473+ and ``ignore_denormal_mode``, each optionally prefixed with ``no_``. The meanings
5474+ are as follows:
5475+
5476+ - ``remote_memory`` means atomic operations may be performed on remote
5477+   memory, i.e., memory accessed through off-chip interconnects (e.g., PCIe).
5478+   On ROCm platforms using HIP, remote memory refers to memory accessed via
5479+   PCIe and is subject to specific atomic operation support. See
5480+   `ROCm PCIe Atomics <https://rocm.docs.amd.com/en/latest/conceptual/pcie-atomics.html>`_
5481+   for further details. The negated form, ``no_remote_memory``, indicates that
5482+   atomic operations will not be performed on remote memory.
5483+ - ``fine_grained_memory`` means atomic operations may be performed on fine-grained
5484+   memory, i.e., memory regions that support fine-grained coherence, where updates to
5485+   memory are visible to other parts of the system even while modifications are ongoing.
5486+   For example, in HIP, fine-grained coherence ensures that host and device share
5487+   up-to-date data without explicit synchronization (see
5488+   `HIP Definition <https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.3/how-to/hip_runtime_api/memory_management/coherence_control.html#coherence-control>`_).
5489+   Similarly, OpenCL 2.0 provides fine-grained synchronization in shared virtual memory
5490+   allocations, allowing concurrent modifications by host and device (see
5491+   `OpenCL 2.0 Overview <https://www.intel.com/content/www/us/en/developer/articles/technical/opencl-20-shared-virtual-memory-overview.html>`_).
5492+   The negated form, ``no_fine_grained_memory``, indicates that atomic operations
5493+   will not be performed on fine-grained memory.
5494+ - ``ignore_denormal_mode`` means that atomic operations are allowed to ignore
5495+   correctness for denormal mode in floating-point operations, potentially improving
5496+   performance on architectures that handle denormals inefficiently. The negated form,
5497+   ``no_ignore_denormal_mode``, enforces strict denormal mode
5498+   correctness.
5499+
5500+ Any unspecified option is inherited from the global defaults, which can be set
5501+ by a compiler flag or the target's built-in defaults.
5502+
5503+ Within the same atomic attribute, duplicate and conflicting values are accepted,
5504+ and the last of any conflicting values wins. Multiple atomic attributes are
5505+ allowed for the same compound statement, and the last atomic attribute wins.
5506+
5507+ Without any atomic metadata, LLVM IR defaults to conservative settings for
5508+ correctness: atomic operations enforce denormal mode correctness and are assumed
5509+ to potentially use remote and fine-grained memory (i.e., the equivalent of
5510+ ``remote_memory``, ``fine_grained_memory``, and ``no_ignore_denormal_mode``).
5511+
5512+ The attribute may be applied only to a compound statement and looks like:
5513+
5514+ .. code-block:: c++
5515+
5516+   [[clang::atomic(remote_memory, fine_grained_memory, ignore_denormal_mode)]]
5517+   {
5518+     // Atomic instructions in this block carry extra metadata reflecting
5519+     // these user-specified options.
5520+   }
5521+
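The rule that the last of any conflicting values wins can be sketched as
follows. This is an illustration under stated assumptions, not a definitive
usage pattern: the ``ATOMIC_OPTIONS`` macro and the ``bump`` function are
hypothetical names, the macro expands to nothing on compilers that do not
report support for the attribute, and ``__atomic_fetch_add`` is the GCC/Clang
builtin. The builtin is called directly inside the block because the attribute
affects only atomic operations generated within the annotated compound
statement.

```cpp
// Hypothetical helper macro (not part of Clang): expands to the attribute
// only when the compiler reports support for it, and to nothing otherwise.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::atomic)
#define ATOMIC_OPTIONS(...) [[clang::atomic(__VA_ARGS__)]]
#endif
#endif
#ifndef ATOMIC_OPTIONS
#define ATOMIC_OPTIONS(...)
#endif

// Hypothetical function: atomically increments *counter and returns the
// value it held before the increment.
int bump(int *counter) {
  int old = 0;
  // 'remote_memory' and 'no_remote_memory' conflict. Per the rule above,
  // the later value wins, so this should behave like 'no_remote_memory'.
  ATOMIC_OPTIONS(remote_memory, no_remote_memory)
  {
    old = __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
  }
  return old;
}
```

The options only influence how the atomic operation is lowered; the observable
result of the increment is the same with or without the attribute.
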
5522+ Compiler options set the global defaults for these atomic-lowering
5523+ options. For example:
5524+
5525+ .. code-block:: console
5526+
5527+   $ clang -fatomic-remote-memory -fno-atomic-fine-grained-memory -fatomic-ignore-denormal-mode file.cpp
5528+
5529+ Each option has a corresponding flag:
5530+ ``-fatomic-remote-memory`` / ``-fno-atomic-remote-memory``,
5531+ ``-fatomic-fine-grained-memory`` / ``-fno-atomic-fine-grained-memory``,
5532+ and ``-fatomic-ignore-denormal-mode`` / ``-fno-atomic-ignore-denormal-mode``.
5533+
5534+ Code using the ``[[clang::atomic]]`` attribute can then selectively override
5535+ the command-line defaults on a per-block basis. For instance:
5536+
5537+ .. code-block:: c++
5538+
5539+   // Suppose the global defaults assume:
5540+   // remote_memory, fine_grained_memory, and no_ignore_denormal_mode
5541+   // (for conservative correctness)
5542+
5543+   void example() {
5544+     // Locally override the settings: disable remote_memory and enable
5545+     // fine_grained_memory.
5546+     [[clang::atomic(no_remote_memory, fine_grained_memory)]]
5547+     {
5548+       // In this block:
5549+       // - Atomic operations are not performed on remote memory.
5550+       // - Atomic operations may be performed on fine-grained memory.
5551+       // - The setting for denormal mode remains as the global default
5552+       //   (typically no_ignore_denormal_mode, enforcing strict denormal mode correctness).
5553+       // ...
5554+     }
5555+   }
5556+
5557+ Function bodies do not accept statement attributes, so this will not work:
5558+
5559+ .. code-block:: c++
5560+
5561+   void func() [[clang::atomic(remote_memory)]] { // Wrong: applies to function type
5562+   }
5563+
5564+ Use the attribute on a compound statement within the function:
5565+
5566+ .. code-block:: c++
5567+
5568+   void func() {
5569+     [[clang::atomic(remote_memory)]]
5570+     {
5571+       // Atomic operations in this block carry the specified metadata.
5572+     }
5573+   }
5574+
5575+ The ``[[clang::atomic]]`` attribute affects only the code generation of atomic
5576+ instructions within the annotated compound statement. Clang attaches target-specific
5577+ metadata to those atomic instructions in the emitted LLVM IR to guide backend lowering.
5578+ This metadata is fixed at the Clang code generation phase and is not modified by later
5579+ LLVM passes (such as function inlining).
5580+
5581+ For example, consider:
5582+
5583+ .. code-block:: c++
5584+
5585+   inline void func() {
5586+     [[clang::atomic(remote_memory)]]
5587+     {
5588+       // Atomic instructions lowered with metadata.
5589+     }
5590+   }
5591+
5592+   void foo() {
5593+     [[clang::atomic(no_remote_memory)]]
5594+     {
5595+       func(); // Inlined by LLVM, but the metadata from 'func()' remains unchanged.
5596+     }
5597+   }
5598+
5599+ Although current usage focuses on AMDGPU, the mechanism is general. Other
5600+ backends can ignore or implement their own responses to these flags if desired.
5601+ If a target does not understand or enforce these hints, the IR remains valid,
5602+ and the resulting program is still correct (although potentially less optimized
5603+ for that user's needs).
5604+
54455605Specifying an attribute for multiple declarations (#pragma clang attribute)
54465606===========================================================================
54475607