Commit 891547d

committed
fx
1 parent bbf17fd commit 891547d

1 file changed

+18
-22
lines changed

mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td

Lines changed: 18 additions & 22 deletions
@@ -84,15 +84,15 @@ def NVVM_Dialect : Dialect {
 **Memory Spaces:** The NVVM dialect introduces the following memory spaces,
 each with distinct scopes and lifetimes:

-| Memory Space      | Scope                | Lifetime          |
-|-------------------|----------------------|-------------------|
-| `generic`         | All threads          | Context-dependent |
-| `global`          | All threads (device) | Application       |
-| `shared`          | Thread block (CTA)   | Kernel execution  |
-| `constant`        | All threads (RO)     | Application       |
-| `local`           | Single thread        | Kernel execution  |
-| `tensor`          | Thread block (CTA)   | Kernel execution  |
-| `shared_cluster`  | Thread block cluster | Kernel execution  |
+| Memory Space      | Address Space | Scope                | Lifetime          |
+|-------------------|---------------|----------------------|-------------------|
+| `generic`         | 0             | All threads          | Context-dependent |
+| `global`          | 1             | All threads (device) | Application       |
+| `shared`          | 3             | Thread block (CTA)   | Kernel execution  |
+| `constant`        | 4             | All threads (RO)     | Application       |
+| `local`           | 5             | Single thread        | Kernel execution  |
+| `tensor`          | 6             | Thread block (CTA)   | Kernel execution  |
+| `shared_cluster`  | 7             | Thread block cluster | Kernel execution  |

 **Memory Space Details:**
 - **generic**: Can point to any memory space; requires runtime resolution of
@@ -104,19 +104,15 @@ def NVVM_Dialect : Dialect {
 - **shared**: Shared within a thread block (CTA); very fast on-chip memory for
   cooperation between threads in the same block. Limited capacity. Ideal for
   block-level collaboration, caching, and reducing global memory traffic.
-- **constant**: Read-only memory cached per SM; optimized for broadcast
-  patterns where all threads access the same location. Fast access when cached.
-  Size typically limited to 64KB. Best for read-only data and uniform values
-  accessed by all threads.
-- **local**: Private to each thread; used for stack frames and register spills.
-  Actually resides in global memory but cached in L1. Use for per-thread
-  private data and automatic variables that don't fit in registers.
-- **tensor**: Special memory space for Tensor Memory Accelerator (TMA)
-  operations on SM 80+ architectures; used with async tensor operations and
-  wgmma instructions. Provides very fast access for matrix operations.
-- **shared_cluster**: Shared across thread blocks within a cluster (SM 90+);
-  enables collaboration beyond single-block scope with distributed shared
-  memory. Fast access across cluster threads.
+- **constant**: Read-only memory cached per SM. Size typically limited to
+  64KB. Best for read-only data and uniform values accessed by all threads.
+- **local**: Private to each thread. Use for per-thread private data and
+  automatic variables that don't fit in registers.
+- **tensor**: Special memory space for tensor core operations. Used by
+  `tcgen05` instructions on SM 100+ for tensor input/output operations.
+- **shared_cluster**: Distributed shared memory across thread blocks within
+  a cluster (SM 90+). Enables collaboration beyond single-block scope with
+  fast access across cluster threads.
 }];

 let name = "nvvm";
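
The address-space column that this commit adds to the table can be read as a plain number-to-name mapping. The following is a minimal C++ sketch of that mapping, assuming nothing beyond the values in the table; the enum `NvvmMemorySpace` and the helper `memorySpaceName` are hypothetical names chosen for illustration and are not taken from the MLIR sources.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical enum mirroring the "Address Space" column of the table above.
// The enumerator names are illustrative; only the numeric values come from
// the documentation in this commit. Note that the table lists no space 2.
enum class NvvmMemorySpace : std::uint32_t {
  Generic = 0,
  Global = 1,
  Shared = 3,
  Constant = 4,
  Local = 5,
  Tensor = 6,
  SharedCluster = 7,
};

// Hypothetical helper: returns the table's name for an address space number,
// or std::nullopt for numbers the table does not list.
constexpr std::optional<std::string_view>
memorySpaceName(std::uint32_t addressSpace) {
  switch (addressSpace) {
  case 0: return "generic";
  case 1: return "global";
  case 3: return "shared";
  case 4: return "constant";
  case 5: return "local";
  case 6: return "tensor";
  case 7: return "shared_cluster";
  default: return std::nullopt;
  }
}
```

With this sketch, `memorySpaceName(3)` yields `"shared"`, matching a pointer written as `!llvm.ptr<3>` in the LLVM dialect, while an unlisted number such as 2 yields an empty optional instead of a guessed name.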
