Skip to content

Commit fc72ab8

Browse files
committed
[MLIR][NVVM][Docs] Update docs
This patch updates the NVVM Dialect docs to: * include information on the type of pointers for the memory spaces. * include high-level information on mbarrier objects. Signed-off-by: Durgadoss R <[email protected]>
1 parent f0e0a22 commit fc72ab8

File tree

1 file changed

+28
-6
lines changed

1 file changed

+28
-6
lines changed

mlir/docs/Dialects/NVVMDialect.md

Lines changed: 28 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -58,23 +58,45 @@ scopes and lifetimes:
5858

5959
- **generic**: Can point to any memory space; requires runtime resolution of
6060
actual address space. Use when pointer origin is unknown at compile time.
61-
Performance varies based on the underlying memory space.
61+
Performance varies based on the underlying memory space. A pointer to this
62+
memory space is represented by `LLVM_PointerGeneric` in the NVVM Ops.
6263
- **global**: Accessible by all threads across all blocks; persists across
6364
kernel launches. Highest latency but largest capacity (device memory). Best
64-
for large data and inter-kernel communication.
65+
for large data and inter-kernel communication. A pointer to this memory space
66+
is represented by `LLVM_PointerGlobal` in the NVVM Ops.
6567
- **shared**: Shared within a thread block (CTA); very fast on-chip memory for
6668
cooperation between threads in the same block. Limited capacity. Ideal for
6769
block-level collaboration, caching, and reducing global memory traffic.
70+
This memory is usually referred as `shared_cta` in the NVVMOps and as
71+
`shared::cta` in the PTX ISA. A pointer to this memory space is represented
72+
by the `LLVM_PointerShared` type in the NVVM Ops.
6873
- **constant**: Read-only memory cached per SM. Size typically limited to 64KB.
69-
Best for read-only data and uniform values accessed by all threads.
74+
Best for read-only data and uniform values accessed by all threads. A pointer
75+
to this memory space is represented by `LLVM_PointerConst` type in NVVM Ops.
7076
- **local**: Private to each thread. Use for per-thread private data and
71-
automatic variables that don't fit in registers.
77+
automatic variables that don't fit in registers. A pointer to this memory is
78+
represented by `LLVM_PointerLocal` type in NVVM Ops.
7279
- **tensor**: Special memory space for tensor core operations. Used by
7380
`tcgen05` instructions on SM 100+ for tensor input/output operations.
81+
A pointer to this memory space is represented by the `LLVM_PointerTensor`
82+
type in the NVVM Ops.
7483
- **shared_cluster**: Distributed shared memory across thread blocks within a
7584
cluster (SM 90+). Enables collaboration beyond single-block scope with fast
76-
access across cluster threads.
77-
85+
access across cluster threads. This memory is usually referred as
86+
`shared_cluster` in the NVVMOps and as `shared::cluster` in the PTX ISA.
87+
A pointer to this memory space is represented by the `LLVM_PointerSharedCluster`
88+
type in the NVVM Ops.
89+
90+
## MBarrier objects
91+
92+
An ``mbarrier`` is a barrier created in shared memory that supports
93+
synchronizing any subset of threads within a CTA. An *mbarrier object*
94+
is an opaque object in shared memory with `.b64` type and an alignment of
95+
8-bytes. Unlike ``nvvm.barrier`` Op which can access only a limited number
96+
of barriers per CTA, the *mbarrier objects* are user-defined and are only
97+
limited by the total shared memory size available. The list of operations
98+
supported on an *mbarrier object* is exposed through the ``nvvm.mbarrier.*``
99+
family of NVVM Ops.
78100

79101
## Non-Goals
80102

0 commit comments

Comments
 (0)