Commit 390fc59

Update docs
1 parent 4cbf195 commit 390fc59

5 files changed: +14 −14 lines changed

_sources/autoapi/tilelang/language/allocate/index.rst.txt

Lines changed: 4 additions & 4 deletions
@@ -110,22 +110,22 @@ Module Contents

 .. py:function:: alloc_tmem(shape, dtype)

-   Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., UMMA).
+   Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).

    TMEM is a dedicated on-chip memory introduced in Blackwell GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together.

    Key properties and requirements:

    - The number of columns allocated must be a power of 2 and at least 32.
    - TMEM allocations are dynamic and must be explicitly deallocated.
    - Both allocation and deallocation must be performed by the same warp.
-   - The base address of the TMEM allocation is stored in shared memory and used as the offset for UMMA accumulator tensors.
-   - Only UMMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.
+   - The base address of the TMEM allocation is stored in shared memory and used as the offset for TCGEN5.MMA accumulator tensors.
+   - Only TCGEN5.MMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.
    - The number of columns allocated should not increase between any two allocations in the execution order within the CTA.

    :param num_cols: Number of columns to allocate in TMEM. Must be a power of 2 and >= 32 but less than or equal to 512.
    :type num_cols: int

-   :returns: A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in UMMA operations.
+   :returns: A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in TCGEN5.MMA operations.
    :rtype: T.Buffer

 .. note::
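The column-count rules in the docstring above (a power of two, at least 32, at most 512) can be sketched as a standalone validity check. `valid_tmem_cols` is a hypothetical helper written for illustration; it is not part of the tilelang API:

```python
def valid_tmem_cols(num_cols: int) -> bool:
    """Return True if num_cols satisfies the documented TMEM
    allocation constraints: a power of two, >= 32, and <= 512."""
    in_range = 32 <= num_cols <= 512
    power_of_two = (num_cols & (num_cols - 1)) == 0  # exactly one bit set
    return in_range and power_of_two


# The valid allocation sizes are exactly 32, 64, 128, 256, and 512 columns.
print([n for n in (16, 32, 48, 64, 512, 1024) if valid_tmem_cols(n)])
# → [32, 64, 512]
```

Under these rules, a request such as `alloc_tmem` for 48 columns would be rejected, while 64 columns is fine.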

_sources/autoapi/tilelang/language/gemm/index.rst.txt

Lines changed: 2 additions & 2 deletions
@@ -46,9 +46,9 @@ Module Contents
    :type k_pack: int, optional
    :param wg_wait: Warp group wait count. Defaults to 0.
       On Hopper it is equivalent to `wgmma.wait_group.sync.aligned <wg_wait>` if wg_wait is not -1.
-      On sm100 (datacenter blackwell), `wg_wait` can only be 0 or -1. `mbarrier_wait(UTCMMA barrier)` will be appended if wg_wait is 0.
+      On sm100, `wg_wait` can only be 0 or -1. `mbarrier_wait(TCGEN5MMA barrier)` will be appended if wg_wait is 0.
    :type wg_wait: int, optional
-   :param mbar: mbarrier for UTCMMA synchronization
+   :param mbar: mbarrier for TCGEN5MMA synchronization
    :type mbar: tir.Buffer, optional

    :returns: A handle to the GEMM operation
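The `wg_wait` constraint documented above can be sketched as a plain validation function. The name `check_wg_wait` and the `arch` strings are illustrative assumptions, not tilelang API:

```python
def check_wg_wait(arch: str, wg_wait: int) -> int:
    """Validate wg_wait per the documented rules: on sm100 only
    0 (an mbarrier wait is appended) or -1 (no wait) are accepted;
    other architectures accept any warp group wait count."""
    if arch == "sm100" and wg_wait not in (0, -1):
        raise ValueError("on sm100, wg_wait can only be 0 or -1")
    return wg_wait
```

For example, `check_wg_wait("sm100", 2)` raises, while the same value passes on a Hopper-class target string.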

autoapi/tilelang/language/allocate/index.html

Lines changed: 5 additions & 5 deletions
@@ -500,7 +500,7 @@ <h2>Functions<a class="headerlink" href="#functions" title="Link to this heading
 <td><p>Allocate a barrier buffer.</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="#tilelang.language.allocate.alloc_tmem" title="tilelang.language.allocate.alloc_tmem"><code class="xref py py-obj docutils literal notranslate"><span class="pre">alloc_tmem</span></code></a>(shape, dtype)</p></td>
-<td><p>Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., UMMA).</p></td>
+<td><p>Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.language.allocate.alloc_reducer" title="tilelang.language.allocate.alloc_reducer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">alloc_reducer</span></code></a>(shape, dtype[, op, replication])</p></td>
 <td><p>Allocate a reducer buffer.</p></td>
@@ -614,15 +614,15 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
 <dl class="py function">
 <dt class="sig sig-object py" id="tilelang.language.allocate.alloc_tmem">
 <span class="sig-prename descclassname"><span class="pre">tilelang.language.allocate.</span></span><span class="sig-name descname"><span class="pre">alloc_tmem</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">shape</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dtype</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.allocate.alloc_tmem" title="Link to this definition"></a></dt>
-<dd><p>Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., UMMA).</p>
+<dd><p>Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).</p>
 <p>TMEM is a dedicated on-chip memory introduced in Blackwell GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together.</p>
 <dl class="simple">
 <dt>Key properties and requirements:</dt><dd><ul class="simple">
 <li><p>The number of columns allocated must be a power of 2 and at least 32.</p></li>
 <li><p>TMEM allocations are dynamic and must be explicitly deallocated.</p></li>
 <li><p>Both allocation and deallocation must be performed by the same warp.</p></li>
-<li><p>The base address of the TMEM allocation is stored in shared memory and used as the offset for UMMA accumulator tensors.</p></li>
-<li><p>Only UMMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.</p></li>
+<li><p>The base address of the TMEM allocation is stored in shared memory and used as the offset for TCGEN5.MMA accumulator tensors.</p></li>
+<li><p>Only TCGEN5.MMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.</p></li>
 <li><p>The number of columns allocated should not increase between any two allocations in the execution order within the CTA.</p></li>
 </ul>
 </dd>
@@ -632,7 +632,7 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
 <dd class="field-odd"><p><strong>num_cols</strong> (<em>int</em>) – Number of columns to allocate in TMEM. Must be a power of 2 and &gt;= 32 but less than or equal to 512.</p>
 </dd>
 <dt class="field-even">Returns<span class="colon">:</span></dt>
-<dd class="field-even"><p>A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in UMMA operations.</p>
+<dd class="field-even"><p>A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in TCGEN5.MMA operations.</p>
 </dd>
 <dt class="field-odd">Return type<span class="colon">:</span></dt>
 <dd class="field-odd"><p>T.Buffer</p>

autoapi/tilelang/language/gemm/index.html

Lines changed: 2 additions & 2 deletions
@@ -501,8 +501,8 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
 <li><p><strong>k_pack</strong> (<em>int</em><em>, </em><em>optional</em>) – Number of k dimensions packed into a single warp. Defaults to 1.</p></li>
 <li><p><strong>wg_wait</strong> (<em>int</em><em>, </em><em>optional</em>) – Warp group wait count. Defaults to 0.
 On Hopper it is equivalent to <cite>wgmma.wait_group.sync.aligned &lt;wg_wait&gt;</cite> if wg_wait is not -1.
-On sm100 (datacenter blackwell), <cite>wg_wait</cite> can only be 0 or -1. <cite>mbarrier_wait(UTCMMA barrier)</cite> will be appended if wg_wait is 0.</p></li>
-<li><p><strong>mbar</strong> (<em>tir.Buffer</em><em>, </em><em>optional</em>) – mbarrier for UTCMMA synchronization</p></li>
+On sm100, <cite>wg_wait</cite> can only be 0 or -1. <cite>mbarrier_wait(TCGEN5MMA barrier)</cite> will be appended if wg_wait is 0.</p></li>
+<li><p><strong>mbar</strong> (<em>tir.Buffer</em><em>, </em><em>optional</em>) – mbarrier for TCGEN5MMA synchronization</p></li>
 </ul>
 </dd>
 <dt class="field-even">Returns<span class="colon">:</span></dt>

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
