@@ -537,6 +537,8 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
  - Packed
  work-item Add product
  IDs names.
+ - Workgroup
+ Clusters
 
  =========== =============== ============ ===== ================= =============== =============== ======================
 
@@ -768,9 +770,6 @@ For example:
  performant than code generated for XNACK replay
  disabled.
 
- cu-stores TODO On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
- If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
-
  =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
@@ -1098,6 +1097,22 @@ is conservatively correct for OpenCL.
  - ``wavefront`` and executed by a thread in the
  same wavefront.
 
+ ``cluster`` Synchronizes with, and participates in modification
+ and seq_cst total orderings with, other operations
+ (except image operations) for all address spaces
+ (except private, or generic that accesses private)
+ provided the other operation's sync scope is:
+
+ - ``system``, ``agent`` or ``cluster`` and
+ executed by a thread on the same cluster.
+ - ``workgroup`` and executed by a thread in the
+ same work-group.
+ - ``wavefront`` and executed by a thread in the
+ same wavefront.
+
+ On targets that do not support workgroup cluster
+ launch mode, this behaves like ``agent`` scope instead.
+
  ``workgroup`` Synchronizes with, and participates in modification
  and seq_cst total orderings with, other operations
  (except image operations) for all address spaces
@@ -1131,6 +1146,9 @@ is conservatively correct for OpenCL.
  ``agent-one-as`` Same as ``agent`` but only synchronizes with other
  operations within the same address space.
 
+ ``cluster-one-as`` Same as ``cluster`` but only synchronizes with other
+ operations within the same address space.
+
  ``workgroup-one-as`` Same as ``workgroup`` but only synchronizes with
  other operations within the same address space.
 
@@ -5114,9 +5132,7 @@ The fields used by CP for code objects before V3 also match those specified in
  and must be 0,
  >454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT
  _SIZE
- 455 1 bit USES_CU_STORES GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
- If 0, then all stores are ``SCOPE_SE`` or higher.
- 457:456 2 bits Reserved, must be 0.
+ 457:455 3 bits Reserved, must be 0.
  458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9
  Reserved, must be 0.
  GFX10-GFX11
@@ -18254,8 +18270,6 @@ terminated by an ``.end_amdhsa_kernel`` directive.
  GFX942)
  ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
  :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
- ``.amdhsa_uses_cu_stores`` 0 GFX12.5 Controls USES_CU_STORES in
- :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
  ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in
  Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
  Specific