Skip to content

Commit 4cbf195

Browse files
Update docs
1 parent ce76595 commit 4cbf195

File tree

163 files changed

+2018
-57
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

163 files changed

+2018
-57
lines changed

_sources/autoapi/tilelang/contrib/nvcc/index.rst.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ Functions
3030
tilelang.contrib.nvcc.have_bf16
3131
tilelang.contrib.nvcc.have_fp8
3232
tilelang.contrib.nvcc.have_tma
33+
tilelang.contrib.nvcc.is_hopper
3334
tilelang.contrib.nvcc.get_nvcc_compiler
3435

3536

@@ -176,6 +177,8 @@ Module Contents
176177
:type target: tvm.target.Target
177178

178179

180+
.. py:function:: is_hopper(target)
181+
179182
.. py:function:: get_nvcc_compiler()
180183
181184
Get the path to the nvcc compiler

_sources/autoapi/tilelang/language/allocate/index.rst.txt

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ Functions
3232
tilelang.language.allocate.alloc_fragment
3333
tilelang.language.allocate.alloc_var
3434
tilelang.language.allocate.alloc_barrier
35+
tilelang.language.allocate.alloc_tmem
3536
tilelang.language.allocate.alloc_reducer
3637

3738

@@ -107,6 +108,32 @@ Module Contents
107108
:rtype: T.Buffer
108109

109110

111+
.. py:function:: alloc_tmem(shape, dtype)
112+
113+
Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., UMMA).
114+
115+
TMEM is a dedicated on-chip memory introduced in Hopper GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together.
116+
117+
Key properties and requirements:
118+
- The number of columns allocated must be a power of 2 and at least 32.
119+
- TMEM allocations are dynamic and must be explicitly deallocated.
120+
- Both allocation and deallocation must be performed by the same warp.
121+
- The base address of the TMEM allocation is stored in shared memory and used as the offset for UMMA accumulator tensors.
122+
- Only UMMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.
123+
- The number of columns allocated should not increase between any two allocations in the execution order within the CTA.
124+
125+
:param num_cols: Number of columns to allocate in TMEM. Must be a power of 2 and >= 32 but less than or equal to 512.
126+
:type num_cols: int
127+
128+
:returns: A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in UMMA operations.
129+
:rtype: T.Buffer
130+
131+
.. note::
132+
133+
- TMEM is only available on supported architectures (e.g., Hopper and later).
134+
- The buffer returned should be used according to TMEM access restrictions and deallocated appropriately.
135+
136+
110137
.. py:function:: alloc_reducer(shape, dtype, op='sum', replication=None)
111138
112139
Allocate a reducer buffer.
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
tilelang.language.fastmath
2+
==========================
3+
4+
.. py:module:: tilelang.language.fastmath
5+
6+

_sources/autoapi/tilelang/language/gemm/index.rst.txt

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ Functions
2121
Module Contents
2222
---------------
2323

24-
.. py:function:: gemm(A, B, C, transpose_A = False, transpose_B = False, policy = GemmWarpPolicy.Square, clear_accum = False, k_pack = 1, wg_wait = 0)
24+
.. py:function:: gemm(A, B, C, transpose_A = False, transpose_B = False, policy = GemmWarpPolicy.Square, clear_accum = False, k_pack = 1, wg_wait = 0, mbar = None)
2525
2626
Perform a General Matrix Multiplication (GEMM) operation.
2727

@@ -45,7 +45,11 @@ Module Contents
4545
:param k_pack: Number of k dimensions packed into a single warp. Defaults to 1.
4646
:type k_pack: int, optional
4747
:param wg_wait: Warp group wait count. Defaults to 0.
48+
On hopper it is equivalent to `wgmma.wait_group.sync.aligned <wg_wait>` if wg_wait is not -1
49+
On sm100 (datacenter blackwell), `wg_wait` can only be 0 or -1. `mbarrier_wait(UTCMMA barrier)` will be appended if wg_wait is 0.
4850
:type wg_wait: int, optional
51+
:param mbar: mbarrier for UTCMMA synchronization
52+
:type mbar: tir.Buffer, optional
4953

5054
:returns: A handle to the GEMM operation
5155
:rtype: tir.Call

_sources/autoapi/tilelang/language/index.rst.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,11 +21,13 @@ Submodules
2121
/autoapi/tilelang/language/copy/index
2222
/autoapi/tilelang/language/customize/index
2323
/autoapi/tilelang/language/experimental/index
24+
/autoapi/tilelang/language/fastmath/index
2425
/autoapi/tilelang/language/fill/index
2526
/autoapi/tilelang/language/frame/index
2627
/autoapi/tilelang/language/gemm/index
2728
/autoapi/tilelang/language/kernel/index
2829
/autoapi/tilelang/language/logical/index
30+
/autoapi/tilelang/language/math_intrinsics/index
2931
/autoapi/tilelang/language/parallel/index
3032
/autoapi/tilelang/language/persistent/index
3133
/autoapi/tilelang/language/pipeline/index
Lines changed: 140 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,140 @@
1+
tilelang.language.math_intrinsics
2+
=================================
3+
4+
.. py:module:: tilelang.language.math_intrinsics
5+
6+
7+
Functions
8+
---------
9+
10+
.. autoapisummary::
11+
12+
tilelang.language.math_intrinsics.ieee_add
13+
tilelang.language.math_intrinsics.ieee_sub
14+
tilelang.language.math_intrinsics.ieee_mul
15+
tilelang.language.math_intrinsics.ieee_fmaf
16+
tilelang.language.math_intrinsics.ieee_frcp
17+
tilelang.language.math_intrinsics.ieee_fsqrt
18+
tilelang.language.math_intrinsics.ieee_frsqrt
19+
tilelang.language.math_intrinsics.ieee_fdiv
20+
21+
22+
Module Contents
23+
---------------
24+
25+
.. py:function:: ieee_add(x, y, rounding_mode='rn')
26+
27+
IEEE-compliant addition with specified rounding mode
28+
29+
:param x: First operand.
30+
:type x: PrimExpr
31+
:param y: Second operand.
32+
:type y: PrimExpr
33+
:param rounding_mode: Rounding mode: 'rn' (round to nearest), 'rz' (round toward zero),
34+
'ru' (round toward positive infinity), 'rd' (round toward negative infinity).
35+
Default is 'rn'.
36+
:type rounding_mode: str, optional
37+
38+
:returns: **result** -- The result.
39+
:rtype: PrimExpr
40+
41+
42+
.. py:function:: ieee_sub(x, y, rounding_mode='rn')
43+
44+
IEEE-compliant subtraction with specified rounding mode
45+
46+
:param x: First operand.
47+
:type x: PrimExpr
48+
:param y: Second operand.
49+
:type y: PrimExpr
50+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
51+
:type rounding_mode: str, optional
52+
53+
:returns: **result** -- The result.
54+
:rtype: PrimExpr
55+
56+
57+
.. py:function:: ieee_mul(x, y, rounding_mode='rn')
58+
59+
IEEE-compliant multiplication with specified rounding mode
60+
61+
:param x: First operand.
62+
:type x: PrimExpr
63+
:param y: Second operand.
64+
:type y: PrimExpr
65+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
66+
:type rounding_mode: str, optional
67+
68+
:returns: **result** -- The result.
69+
:rtype: PrimExpr
70+
71+
72+
.. py:function:: ieee_fmaf(x, y, z, rounding_mode='rn')
73+
74+
IEEE-compliant fused multiply-add with specified rounding mode
75+
76+
:param x: First operand.
77+
:type x: PrimExpr
78+
:param y: Second operand.
79+
:type y: PrimExpr
80+
:param z: Third operand (addend).
81+
:type z: PrimExpr
82+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
83+
:type rounding_mode: str, optional
84+
85+
:returns: **result** -- The result of x * y + z.
86+
:rtype: PrimExpr
87+
88+
89+
.. py:function:: ieee_frcp(x, rounding_mode='rn')
90+
91+
IEEE-compliant reciprocal with specified rounding mode
92+
93+
:param x: Input operand.
94+
:type x: PrimExpr
95+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
96+
:type rounding_mode: str, optional
97+
98+
:returns: **result** -- The result of 1/x.
99+
:rtype: PrimExpr
100+
101+
102+
.. py:function:: ieee_fsqrt(x, rounding_mode='rn')
103+
104+
IEEE-compliant square root with specified rounding mode
105+
106+
:param x: Input operand.
107+
:type x: PrimExpr
108+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
109+
:type rounding_mode: str, optional
110+
111+
:returns: **result** -- The result of sqrt(x).
112+
:rtype: PrimExpr
113+
114+
115+
.. py:function:: ieee_frsqrt(x)
116+
117+
IEEE-compliant reciprocal square root (round to nearest only)
118+
119+
:param x: Input operand.
120+
:type x: PrimExpr
121+
122+
:returns: **result** -- The result of 1/sqrt(x).
123+
:rtype: PrimExpr
124+
125+
126+
.. py:function:: ieee_fdiv(x, y, rounding_mode='rn')
127+
128+
IEEE-compliant division with specified rounding mode
129+
130+
:param x: Dividend.
131+
:type x: PrimExpr
132+
:param y: Divisor.
133+
:type y: PrimExpr
134+
:param rounding_mode: Rounding mode: 'rn', 'rz', 'ru', 'rd'. Default is 'rn'.
135+
:type rounding_mode: str, optional
136+
137+
:returns: **result** -- The result of x/y.
138+
:rtype: PrimExpr
139+
140+

_sources/autoapi/tilelang/transform/index.rst.txt

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ Functions
3131
tilelang.transform.LayoutInference
3232
tilelang.transform.LowerTileOp
3333
tilelang.transform.InjectSoftwarePipeline
34+
tilelang.transform.FrontendLegalize
3435
tilelang.transform.InjectAssumes
3536
tilelang.transform.LowerHopperIntrin
3637
tilelang.transform.WarpSpecializedPipeline
@@ -64,6 +65,7 @@ Functions
6465
tilelang.transform.LowerOpaqueBlock
6566
tilelang.transform.LowerThreadAllreduce
6667
tilelang.transform.LowerDeviceKernelLaunch
68+
tilelang.transform.LowerSharedTmem
6769
tilelang.transform.LayoutReducer
6870

6971

@@ -115,6 +117,14 @@ Package Contents
115117
:rtype: tvm.transform.Pass
116118

117119

120+
.. py:function:: FrontendLegalize()
121+
122+
FrontendLegalize
123+
124+
:returns: **fpass** -- The result pass
125+
:rtype: tvm.transform.Pass
126+
127+
118128
.. py:function:: InjectAssumes()
119129
120130
Inject Assumes
@@ -384,6 +394,12 @@ Package Contents
384394
:rtype: tvm.transform.Pass
385395

386396

397+
.. py:function:: LowerSharedTmem()
398+
399+
LowerSharedTmem
400+
401+
402+
387403
.. py:function:: LayoutReducer()
388404
389405
Return a TVM transform pass that performs layout reduction/normalization.

_sources/autoapi/tilelang/transform/pass_config/index.rst.txt

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -123,6 +123,24 @@ Module Contents
123123
:type: Disable safe memory access optimization. Default
124124

125125

126+
.. py:attribute:: TL_DISABLE_VECTORIZE_256
127+
:value: 'tl.disable_vectorize_256'
128+
129+
130+
False
131+
132+
:type: Disable usage of LDG/STG 256. Default
133+
134+
135+
.. py:attribute:: TL_DISABLE_WGMMA
136+
:value: 'tl.disable_wgmma'
137+
138+
139+
False
140+
141+
:type: Disable usage of Hopper WGMMA. Default
142+
143+
126144
.. py:attribute:: TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS
127145
:value: 'tl.debug_merge_shared_memory_allocations'
128146

autoapi/index.html

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -340,11 +340,13 @@
340340
<li class="toctree-l4"><a class="reference internal" href="tilelang/language/experimental/gemm_sp/index.html">tilelang.language.experimental.gemm_sp</a></li>
341341
</ul>
342342
</li>
343+
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/fastmath/index.html">tilelang.language.fastmath</a></li>
343344
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/fill/index.html">tilelang.language.fill</a></li>
344345
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/frame/index.html">tilelang.language.frame</a></li>
345346
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/gemm/index.html">tilelang.language.gemm</a></li>
346347
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/kernel/index.html">tilelang.language.kernel</a></li>
347348
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/logical/index.html">tilelang.language.logical</a></li>
349+
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/math_intrinsics/index.html">tilelang.language.math_intrinsics</a></li>
348350
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/parallel/index.html">tilelang.language.parallel</a></li>
349351
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/persistent/index.html">tilelang.language.persistent</a></li>
350352
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/pipeline/index.html">tilelang.language.pipeline</a></li>
@@ -588,11 +590,13 @@ <h1>API Reference<a class="headerlink" href="#api-reference" title="Link to this
588590
<li class="toctree-l4"><a class="reference internal" href="tilelang/language/experimental/gemm_sp/index.html">tilelang.language.experimental.gemm_sp</a></li>
589591
</ul>
590592
</li>
593+
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/fastmath/index.html">tilelang.language.fastmath</a></li>
591594
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/fill/index.html">tilelang.language.fill</a></li>
592595
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/frame/index.html">tilelang.language.frame</a></li>
593596
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/gemm/index.html">tilelang.language.gemm</a></li>
594597
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/kernel/index.html">tilelang.language.kernel</a></li>
595598
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/logical/index.html">tilelang.language.logical</a></li>
599+
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/math_intrinsics/index.html">tilelang.language.math_intrinsics</a></li>
596600
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/parallel/index.html">tilelang.language.parallel</a></li>
597601
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/persistent/index.html">tilelang.language.persistent</a></li>
598602
<li class="toctree-l3"><a class="reference internal" href="tilelang/language/pipeline/index.html">tilelang.language.pipeline</a></li>

autoapi/tilelang/autotuner/capture/index.html

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -340,11 +340,13 @@
340340
<li class="toctree-l4"><a class="reference internal" href="../../language/experimental/gemm_sp/index.html">tilelang.language.experimental.gemm_sp</a></li>
341341
</ul>
342342
</li>
343+
<li class="toctree-l3"><a class="reference internal" href="../../language/fastmath/index.html">tilelang.language.fastmath</a></li>
343344
<li class="toctree-l3"><a class="reference internal" href="../../language/fill/index.html">tilelang.language.fill</a></li>
344345
<li class="toctree-l3"><a class="reference internal" href="../../language/frame/index.html">tilelang.language.frame</a></li>
345346
<li class="toctree-l3"><a class="reference internal" href="../../language/gemm/index.html">tilelang.language.gemm</a></li>
346347
<li class="toctree-l3"><a class="reference internal" href="../../language/kernel/index.html">tilelang.language.kernel</a></li>
347348
<li class="toctree-l3"><a class="reference internal" href="../../language/logical/index.html">tilelang.language.logical</a></li>
349+
<li class="toctree-l3"><a class="reference internal" href="../../language/math_intrinsics/index.html">tilelang.language.math_intrinsics</a></li>
348350
<li class="toctree-l3"><a class="reference internal" href="../../language/parallel/index.html">tilelang.language.parallel</a></li>
349351
<li class="toctree-l3"><a class="reference internal" href="../../language/persistent/index.html">tilelang.language.persistent</a></li>
350352
<li class="toctree-l3"><a class="reference internal" href="../../language/pipeline/index.html">tilelang.language.pipeline</a></li>

0 commit comments

Comments
 (0)