The `nvvm.fence.{semantics}.sync_restrict.*` intrinsics restrict the class of
memory operations for which the fence instruction provides memory ordering
guarantees. When `.sync_restrict` is set to `shared_cta`, the memory semantics
must be `release`, and the fence applies only to operations performed on
objects in the `shared_cta` memory space. Likewise, when `.sync_restrict` is set
to `shared_cluster`, the memory semantics must be `acquire`, and the fence
applies only to operations performed on objects in the `shared_cluster` memory
space. The scope is `cluster` in both cases. For more details, please refer to
the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
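
The legality rules above can be sketched as a small Python check. This is a
minimal illustration, assuming a hypothetical helper `sync_restrict_fence_ptx`
that is not part of LLVM or any CUDA library. It accepts only the two legal
semantics/space pairs and returns the PTX fence each one corresponds to; the
PTX spellings follow the PTX ISA `fence` syntax as we read it and should be
checked against the linked reference.

.. code-block:: python

   # Hypothetical helper, not part of LLVM: checks the legality rules for
   # nvvm.fence.{semantics}.sync_restrict.* and returns the PTX fence that
   # each legal combination corresponds to.

   # Each restricted memory space admits exactly one memory semantics.
   REQUIRED_SEMANTICS = {
       "shared_cta": "release",
       "shared_cluster": "acquire",
   }


   def sync_restrict_fence_ptx(semantics: str, space: str) -> str:
       """Return the PTX for a sync_restrict fence, or raise ValueError."""
       if space not in REQUIRED_SEMANTICS:
           raise ValueError(f"unsupported sync_restrict space: {space}")
       required = REQUIRED_SEMANTICS[space]
       if semantics != required:
           raise ValueError(
               f"sync_restrict to {space} requires {required}, got {semantics}"
           )
       # "shared_cta" -> "shared::cta"; the scope is always cluster.
       ptx_space = space.replace("_", "::")
       return f"fence.{semantics}.sync_restrict::{ptx_space}.cluster;"


   if __name__ == "__main__":
       print(sync_restrict_fence_ptx("release", "shared_cta"))
       print(sync_restrict_fence_ptx("acquire", "shared_cluster"))
       try:
           sync_restrict_fence_ptx("acquire", "shared_cta")
       except ValueError as err:
           print("rejected:", err)

Keying the table on the memory space rather than on the semantics mirrors the
rule as stated: choosing the space fixes the only permitted semantics.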

The `nvvm.fence.mbarrier_init.release.cluster` intrinsic restricts the class of
memory operations for which the fence instruction provides memory ordering
guarantees. The `mbarrier_init` modifier restricts the synchronizing effect to
prior `mbarrier_init` operations executed by the same thread on mbarrier objects
in the `shared_cta` memory space. For more details, please refer to the
`PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
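
The three conditions in the restriction above can be expressed as a single
predicate. The Python sketch below is a simplified model, assuming hypothetical
names `PriorOp` and `mbarrier_init_fence_orders` (neither is an LLVM or CUDA
API). It only decides whether a prior operation falls inside the class of
operations the fence orders; it does not model the release semantics
themselves.

.. code-block:: python

   from dataclasses import dataclass

   # Hypothetical model, not an LLVM or CUDA API: decides whether a prior
   # operation falls inside the class of operations that
   # nvvm.fence.mbarrier_init.release.cluster orders.


   @dataclass(frozen=True)
   class PriorOp:
       kind: str    # e.g. "mbarrier_init" or "store"
       space: str   # e.g. "shared_cta" or "global"
       thread: int  # id of the thread that executed the operation


   def mbarrier_init_fence_orders(op: PriorOp, fence_thread: int) -> bool:
       """True if the fence's release ordering covers `op`."""
       return (
           op.kind == "mbarrier_init"     # only mbarrier_init operations
           and op.space == "shared_cta"   # on mbarrier objects in shared_cta
           and op.thread == fence_thread  # executed by the fencing thread
       )


   if __name__ == "__main__":
       fence_thread = 0
       ops = [
           PriorOp("mbarrier_init", "shared_cta", 0),  # covered
           PriorOp("mbarrier_init", "shared_cta", 1),  # issued by another thread
           PriorOp("store", "shared_cta", 0),          # not an mbarrier_init
       ]
       for op in ops:
           print(op, mbarrier_init_fence_orders(op, fence_thread))

Only the first operation satisfies all three conditions; an ordinary store by
the same thread, or an `mbarrier_init` from a different thread, is outside the
fence's synchronizing effect.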

The `nvvm.fence.proxy.async_generic.{semantics}.sync_restrict` intrinsics are
used to establish ordering between a prior memory access performed via the
`async proxy <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#proxies>`__
and a subsequent memory access performed via the generic proxy.
``nvvm.fence.proxy.async_generic.release.sync_restrict`` can form a release
sequence that synchronizes with an acquire sequence that contains the
``nvvm.fence.proxy.async_generic.acquire.sync_restrict`` proxy fence. When
`.sync_restrict` is set to `shared_cta`, the memory semantics must be
`release`, and the fence applies only to operations performed on objects in the
`shared_cta` memory space. Likewise, when `.sync_restrict` is set to
`shared_cluster`, the memory semantics must be `acquire`, and the fence applies
only to operations performed on objects in the `shared_cluster` memory space.
The scope is `cluster` in both cases. For more details, please refer to the
`PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar>`__.
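
The proxy-fence rules can be sketched the same way. The Python code below is a
simplified model, assuming hypothetical names `MemOp`, `proxy_fence_ptx`, and
`proxy_fence_orders` (none of them LLVM or CUDA APIs). It rejects illegal
semantics/space pairs, returns the PTX spelling as we read it from the PTX ISA
(verify against the linked reference), and treats the fence as ordering an
async-proxy access before a generic-proxy access only when both accesses touch
objects in the restricted space.

.. code-block:: python

   from dataclasses import dataclass

   # Hypothetical model, not an LLVM or CUDA API, of
   # nvvm.fence.proxy.async_generic.{semantics}.sync_restrict.

   # Legal (semantics, space) pairs; the scope is always cluster.
   LEGAL = {("release", "shared_cta"), ("acquire", "shared_cluster")}


   @dataclass(frozen=True)
   class MemOp:
       proxy: str  # "async" or "generic"
       space: str  # e.g. "shared_cta" or "shared_cluster"


   def proxy_fence_ptx(semantics: str, space: str) -> str:
       """Return the PTX proxy fence for a legal pair, or raise ValueError."""
       if (semantics, space) not in LEGAL:
           raise ValueError(f"illegal combination: {semantics} with {space}")
       ptx_space = space.replace("_", "::")  # "shared_cta" -> "shared::cta"
       return f"fence.proxy.async::generic.{semantics}.sync_restrict::{ptx_space}.cluster;"


   def proxy_fence_orders(semantics: str, space: str,
                          prior: MemOp, later: MemOp) -> bool:
       """Simplified: an async-proxy access is ordered before a generic-proxy
       access when both touch objects in the restricted space."""
       proxy_fence_ptx(semantics, space)  # reject illegal combinations first
       return (prior.proxy == "async" and later.proxy == "generic"
               and prior.space == space and later.space == space)


   if __name__ == "__main__":
       print(proxy_fence_ptx("release", "shared_cta"))
       print(proxy_fence_ptx("acquire", "shared_cluster"))
       print(proxy_fence_orders("acquire", "shared_cluster",
                                MemOp("async", "shared_cluster"),
                                MemOp("generic", "shared_cluster")))

Requiring both accesses to lie in the restricted space is a simplification of
"applies only to operations performed on objects in" that space; the PTX ISA
defines the exact set of covered operations.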