Commit ee4eb7a (1 parent: 217f0e5)

AMDGPU: Preliminary documentation for named barriers

1 file changed: llvm/docs/AMDGPUUsage.rst (179 additions, 0 deletions)

@@ -1179,6 +1179,53 @@ is conservatively correct for OpenCL.
other operations within the same address space.
======================= ===================================================

Target Types
------------

The AMDGPU backend implements some target extension types.

.. _amdgpu-types-named-barriers:

Named Barriers
~~~~~~~~~~~~~~

Named barriers are represented as memory objects of type
``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
in the LDS address space. They do not occupy regular LDS memory, but their
lifetime and allocation granularity match those of global variables in LDS.

The following types built from named barriers are supported in global variables,
defined recursively:

* a standalone ``target("amdgcn.named.barrier", 0)``
* an array of supported types
* a struct containing a single element of supported type

.. code-block:: llvm

  @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
  @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
  @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef

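The recursive definition of supported types can be read as a small predicate.
The following Python sketch is only an illustration and is not part of the
backend. The tuple encoding of types and the name
``is_supported_barrier_type`` are hypothetical.

.. code-block:: python

  # Hypothetical mini-representation of IR types, for illustration only:
  #   "barrier"               -> target("amdgcn.named.barrier", 0)
  #   ("array", count, elem)  -> [count x elem]
  #   ("struct", [elem, ...]) -> { elem, ... }
  #   any other string        -> an unrelated type such as "i32"

  def is_supported_barrier_type(ty):
      """Return True if ty matches the recursive definition of supported types."""
      if ty == "barrier":
          return True
      if isinstance(ty, tuple) and ty[0] == "array":
          _, _count, elem = ty
          return is_supported_barrier_type(elem)
      if isinstance(ty, tuple) and ty[0] == "struct":
          _, elems = ty
          # Only a struct with exactly one element of supported type qualifies.
          return len(elems) == 1 and is_supported_barrier_type(elems[0])
      return False

Under this encoding, ``@foo`` above corresponds to ``("array", 2, "barrier")``
and ``@baz`` to ``("struct", ["barrier"])``. Both satisfy the predicate, while a
struct with two barrier elements does not.
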
Barrier types may not be used in ``alloca``.

The integral representation of a pointer to a valid named barrier is in the
range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
hardware barrier ID. The integral representation of the null named barrier is
``0x0080'0000``.

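As a worked example of the expression above, the following Python sketch
computes the integral representation for a given hardware barrier ID and
recovers the ID again. The function and constant names are hypothetical. The
ID range 1 through 16 is not stated explicitly but follows from the inclusive
range given above.

.. code-block:: python

  NAMED_BARRIER_BASE = 0x0080_0000  # representation of the null named barrier

  def named_barrier_pointer(barrier_id):
      """Integral pointer representation for a hardware barrier ID."""
      return NAMED_BARRIER_BASE | (barrier_id << 4)

  def named_barrier_id(pointer_value):
      """Recover the hardware barrier ID from an integral representation."""
      return (pointer_value & ~NAMED_BARRIER_BASE) >> 4

  # The endpoints of the valid range correspond to IDs 1 and 16.
  lowest = named_barrier_pointer(1)    # 0x0080_0010
  highest = named_barrier_pointer(16)  # 0x0080_0100

An ``id`` of ``0`` yields ``0x0080'0000``, which is the representation of the
null named barrier.
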
It is not legal to attempt to form a pointer to any non-named barrier objects.

It is undefined behavior to use a pointer to any part of a named barrier object
as the pointer operand of a regular memory access instruction or intrinsic.
Pointers to named barrier objects are intended to be used with dedicated
intrinsics.

We expand on the semantics of named barriers in
:ref:`the memory model section <amdgpu-memory-model-named-barriers>`.

LLVM IR Intrinsics
------------------

@@ -6621,6 +6668,138 @@ Multiple tags can be used at the same time to synchronize with more than one add
better code optimization, at the cost of synchronizing additional address
spaces.

.. _amdgpu-memory-model-barriers:

Hardware Barriers
+++++++++++++++++

.. note::

  This section is preliminary. The semantics described here are intended to be
  formalized properly in the future.

Hardware barriers synchronize execution between concurrently running waves using
fixed-function hardware. Intuitively, a set of waves are "members" of a barrier.
Waves *signal* the barrier and later *wait* for it. Execution only proceeds past
the *wait* once all member waves have *signaled* the barrier.

Formally, barriers affect semantics in exactly two ways. First, they affect
forward progress. Waiting on a barrier that never completes (is not signaled
sufficiently) prevents forward progress and therefore, given the assumption of
forward progress, is undefined behavior. Second, barrier operations can pair
with fences to contribute *synchronizes-with* relations in the memory model.

Roughly speaking:

- Release fences pair with barrier signal operations that are later in program
  order.
- Barrier wait operations pair with acquire fences that are later in program
  order.
- If a barrier signal operation contributes to allowing a wait operation to
  complete, then the corresponding paired fences can synchronize-with each
  other (given compatible sync scopes and memory model relaxation annotations).

Default Barriers
################

There is a default workgroup barrier and a default cluster barrier. All waves
of a workgroup are members of the same default workgroup barrier, and all waves
of a cluster are members of the same default cluster barrier.

.. _amdgpu-memory-model-named-barriers:

Named Barriers
##############

All named barrier operations must occur in wave-uniform control flow. All
arguments of named barrier intrinsics must be wave-uniform.

Named barriers are allocated as global variables of
:ref:`a target extension type <amdgpu-types-named-barriers>`.

Named barriers may be signaled by the intrinsics:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
  declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)

If the second form is used and ``member_count`` is non-zero, the operation is
an *initializing* signal; otherwise it is *non*-initializing.

Named barriers may be initialized explicitly using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count)

It is possible to "leave" a named barrier. This decrements the named barrier's
member count and completes the barrier if all other members have signaled it:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type)

``barrier_type`` must be set to ``1``.

Note that leaving a named barrier is not exactly the opposite of joining a
barrier (for example, joining a barrier does not change its member count).

Leaving implicitly *joins* (see below) a null named barrier.

Signal, leave, and initializing operations on the same named barrier must obey
certain ordering constraints:

* Non-initializing signals must be ordered after some initializing signal or an
  explicit initializing operation.
* Explicit initializing operations must not race with signal or leave
  operations.
* Initializing signal operations must not race with leave operations.
* Initializing signal operations with contradicting member counts must not race
  with each other.

The details of how these orders can be established and races prevented are TBD.
Using a default workgroup or cluster barrier in the natural way is guaranteed to
be sufficient.

In order to wait for a named barrier, a wave must first *join* the named barrier
using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr)

The named barrier may then be waited for using:

.. code-block:: llvm

  declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type)

As with leaving, ``barrier_type`` must be set to ``1``.

Signal, leave, join, and wait operations must obey certain ordering constraints.
The details are TBD. Satisfying the following rules is guaranteed to be
sufficient:

* Signal or wait for a named barrier only if it is the most recent to have been
  joined in program order.
* Signal or leave a named barrier only if the number of prior signaling
  operations on that named barrier since the most recent join in program order
  is equal to the number of prior wait operations on that named barrier since
  the most recent join in program order.
* Wait for a named barrier only if the number of prior signaling operations on
  that named barrier since the most recent join in program order is one larger
  than the number of prior wait operations on that named barrier since the most
  recent join in program order.
* Do not signal a named barrier or wait for it in program order after leaving it.

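The four sufficient rules above amount to a small amount of per-wave
bookkeeping. The following Python sketch checks a single wave's sequence of
operations against them. It is an illustration under stated assumptions, not a
description of the hardware or the compiler: the tuple encoding of operations
and the name ``check_wave_program`` are hypothetical, wait and leave are
modeled as applying to the most recently joined barrier, and the last rule is
read conservatively (a barrier that was left is never signaled or waited for
again, even after rejoining). The sketch does not model the cross-wave
constraints.

.. code-block:: python

  def check_wave_program(ops):
      """Check one wave's named-barrier operations, given in program order.

      ops uses a hypothetical encoding:
        ("join", name), ("signal", name), ("wait",), ("leave",)
      Returns None if every rule holds, else a description of the first
      violation.
      """
      joined = None  # most recently joined barrier; None models the null barrier
      signals = 0    # signals on `joined` since the most recent join
      waits = 0      # waits on `joined` since the most recent join
      left = set()   # barriers this wave has already left

      for index, op in enumerate(ops):
          kind = op[0]
          if kind == "join":
              joined, signals, waits = op[1], 0, 0
          elif kind == "signal":
              name = op[1]
              if name in left:
                  return f"op {index}: signal after leaving {name}"
              if name != joined:
                  return f"op {index}: {name} is not the most recent join"
              if signals != waits:
                  return f"op {index}: signal while a wait is outstanding"
              signals += 1
          elif kind == "wait":
              if joined is None:
                  return f"op {index}: wait without a joined named barrier"
              if joined in left:
                  return f"op {index}: wait after leaving {joined}"
              if signals != waits + 1:
                  return f"op {index}: wait without a matching signal"
              waits += 1
          elif kind == "leave":
              # Assumption of this model: leave applies to the joined barrier.
              if joined is None:
                  return f"op {index}: leave without a joined named barrier"
              if signals != waits:
                  return f"op {index}: leave while a wait is outstanding"
              left.add(joined)
              # Leaving implicitly joins the null named barrier.
              joined, signals, waits = None, 0, 0
          else:
              raise ValueError(f"unknown operation: {op!r}")
      return None

For example, a join followed by alternating signal and wait operations and a
final leave passes the check, while a wait directly after a join is reported as
a violation because no signal precedes it.
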
Additionally, use signal, leave, and wait operations on a named barrier from a
consistent associated set of waves that is determined at initialization time and
whose initial size is the member count used at initialization. The set of waves
may shrink with leave operations. Operations on a named barrier object with
conflicting sets of waves must not race. The details of this rule and of how an
ordering can be established to prevent a race are TBD. Using a default workgroup
or cluster barrier in the natural way is guaranteed to be sufficient.

.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:

Memory Model GFX6-GFX9
