Open
Changes from 1 commit
25 changes: 24 additions & 1 deletion llvm/docs/AMDGPUUsage.rst
@@ -14510,6 +14510,14 @@ For GFX12:
  * A memory attached last level (MALL) cache exists for GPU memory.
    The MALL cache is fully coherent with GPU memory and has no impact on system
    coherence. All agents (GPU and CPU) access GPU memory through the MALL cache.
  * The wait instructions below must be added before any ``SCOPE_SYS`` store in
    order for the store to remain in order with previous memory operations.

    * ``s_wait_loadcnt 0x0``
    * ``s_wait_storecnt 0x0``
    * ``s_wait_kmcnt 0x0``
    * ``s_wait_samplecnt 0x0``
    * ``s_wait_bvhcnt 0x0``
Comment on lines +14516 to +14520
Collaborator

So `s_wait_dscnt 0x0` is not needed? I notice it is present for the store release.

Contributor Author

No, dscnt isn't needed. This only deals with system-scope stores, because the reordering in this case occurs beyond L2.
LDS is always at workgroup (WG) level, so it causes no issues.

Collaborator

So what does dscnt track, LDS? Why is it not needed for system scope? Remember we have scope inclusion in the memory model.

Contributor Author

dscnt does indeed track LDS.
The reordering here can, AFAIK, only occur between two system-scope operations. It happens somewhere after L2.
So we need to wait for any operations that could be at that level. LDS operations aren't among them because they can't leave the workgroup under any scenario.
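
For readers following the thread, here is a minimal Python sketch of the rule as described in the patch text and the replies above. It is not part of the patch and is not how the compiler implements it. The function name `waits_before_system_store`, the `known_zero` parameter, and the constant `SYSTEM_STORE_COUNTERS` are hypothetical; only the counter names come from the diff. The sketch assumes that a counter "known to be zero" (for example because an equivalent wait was already issued earlier with no new operation since) needs no further wait directly before the store.

```python
# Hypothetical model of the wait rule for SCOPE_SYS stores on GFX12.
# Counter names mirror the s_wait_* instructions listed in the patch.
# dscnt (LDS) is deliberately absent: per the thread, LDS operations
# never leave the workgroup, so they cannot take part in the reordering.
SYSTEM_STORE_COUNTERS = ("loadcnt", "storecnt", "kmcnt", "samplecnt", "bvhcnt")


def waits_before_system_store(known_zero=frozenset()):
    """Return the waits still required directly before a SCOPE_SYS store.

    known_zero: counters assumed to be already proven zero at this point,
    e.g. because the corresponding wait was moved earlier in the sequence.
    """
    return [
        f"s_wait_{counter} 0x0"
        for counter in SYSTEM_STORE_COUNTERS
        if counter not in known_zero
    ]


if __name__ == "__main__":
    # Example: the kmcnt wait was already issued earlier, so it is skipped here.
    for line in waits_before_system_store(known_zero={"kmcnt"}):
        print(line)
```

With no counters known to be zero, the function returns all five waits from the patch, in the order listed there; `s_wait_dscnt 0x0` never appears.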


Scalar memory operations are only used to access memory that is proven to not
change during the execution of the kernel dispatch. This includes constant
@@ -14669,7 +14677,20 @@ the instruction in the code sequence that references the table.
                          - wavefront    - generic
                          - workgroup                  - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.
                          - agent
                          - system
store atomic  monotonic   - system       - global   1. | ``s_wait_loadcnt 0x0``
                                         - generic     | ``s_wait_storecnt 0x0``
                                                       | ``s_wait_kmcnt 0x0``
                                                       | ``s_wait_samplecnt 0x0``
                                                       | ``s_wait_bvhcnt 0x0``

                                                       - The waits can be independently moved as long as the
                                                         counter they wait on is known to be zero before issuing
                                                         the following store instruction.

                                                    2. buffer/global/flat_store

                                                       - Apply :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx12-scopes-table`.

store atomic  monotonic   - singlethread - local    1. ds_store
                          - wavefront
                          - workgroup
@@ -15255,7 +15276,9 @@ the instruction in the code sequence that references the table.
| ``s_wait_storecnt 0x0``
| ``s_wait_loadcnt 0x0``
| ``s_wait_dscnt 0x0``
| ``s_wait_kmcnt 0x0``
Collaborator

For consistency should these be listed in the same order as the changes above?

Contributor Author

sure


- If agent scope, omit ``s_wait_kmcnt 0x0``.
- If OpenCL, omit ``s_wait_dscnt 0x0``.
- The waits can be
independently moved