
Commit 51263d0

Update docs
1 parent ec350f6 commit 51263d0

File tree

163 files changed: +5952 −648 lines


_sources/autoapi/tilelang/index.rst.txt

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Submodules
    /autoapi/tilelang/profiler/index
    /autoapi/tilelang/quantize/index
    /autoapi/tilelang/testing/index
+   /autoapi/tilelang/tileop/index
    /autoapi/tilelang/tools/index
    /autoapi/tilelang/transform/index
    /autoapi/tilelang/utils/index

_sources/autoapi/tilelang/intrinsics/mma_layout/index.rst.txt

Lines changed: 84 additions & 18 deletions
@@ -9,27 +9,49 @@ Attributes

 .. autoapisummary::

-   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout
-   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_32x4_layout_sr_a
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_32x4_layout_sr_b
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_32x4_layout_rs_a
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_32x4_layout_rs_b
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_sr_a
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_sr_b
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_rs_a
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_rs_b
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_layout_sr_a
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_layout_sr_b
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_layout_rs_a
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_layout_rs_b


 Functions
 ---------

 .. autoapisummary::

+   tilelang.intrinsics.mma_layout.ldmatrix_32x4_to_shared_16x8_layout_a
+   tilelang.intrinsics.mma_layout.ldmatrix_32x4_to_shared_16x8_layout_b
    tilelang.intrinsics.mma_layout.ldmatrix_32x8_to_shared_16x16_layout
    tilelang.intrinsics.mma_layout.ldmatrix_trans_32x8_to_shared_16x16_layout
-   tilelang.intrinsics.mma_layout.ldmatrix_16x32_to_shared_16x32_layout_a
-   tilelang.intrinsics.mma_layout.ldmatrix_16x32_to_shared_16x32_layout_b
    tilelang.intrinsics.mma_layout.ldmatrix_32x16_to_shared_16x32_layout_a
    tilelang.intrinsics.mma_layout.ldmatrix_32x16_to_shared_16x32_layout_b
    tilelang.intrinsics.mma_layout.mma_store_32x8_to_shared_16x16_layout
-   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_sr
-   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_layout_rs
-   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_layout
-   tilelang.intrinsics.mma_layout.shared_32x16_to_mma_32x16_layout
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_a_32x4_layout
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_a_32x4_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_b_32x4_layout
+   tilelang.intrinsics.mma_layout.shared_16x8_to_mma_b_32x4_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_a_32x8_layout
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_a_32x8_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_b_32x8_layout
+   tilelang.intrinsics.mma_layout.shared_16x16_to_mma_b_32x8_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_a_32x16_layout
+   tilelang.intrinsics.mma_layout.shared_32x16_to_mma_a_32x16_layout_trans
+   tilelang.intrinsics.mma_layout.shared_16x32_to_mma_b_32x16_layout
+   tilelang.intrinsics.mma_layout.shared_32x16_to_mma_b_32x16_layout_trans
    tilelang.intrinsics.mma_layout.mma_32x8_to_shared_16x16_layout
+   tilelang.intrinsics.mma_layout.mma_load_a_32x4_to_shared_16x8_layout
+   tilelang.intrinsics.mma_layout.mma_load_b_32x4_to_shared_16x8_layout
+   tilelang.intrinsics.mma_layout.mma_load_a_32x16_to_shared_16x32_layout
+   tilelang.intrinsics.mma_layout.mma_load_b_32x16_to_shared_16x32_layout
    tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_smoothlayout
    tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_smoothlayout
    tilelang.intrinsics.mma_layout.shared_32x16_to_mma_32x16_smoothlayout
@@ -40,34 +62,78 @@ Functions
 Module Contents
 ---------------

-.. py:function:: ldmatrix_32x8_to_shared_16x16_layout(thread_id, local_id)
+.. py:function:: ldmatrix_32x4_to_shared_16x8_layout_a(thread_id, local_id)

-.. py:function:: ldmatrix_trans_32x8_to_shared_16x16_layout(thread_id, local_id)
+.. py:function:: ldmatrix_32x4_to_shared_16x8_layout_b(thread_id, local_id)

-.. py:function:: ldmatrix_16x32_to_shared_16x32_layout_a(thread_id, local_id)
+.. py:function:: ldmatrix_32x8_to_shared_16x16_layout(thread_id, local_id)

-.. py:function:: ldmatrix_16x32_to_shared_16x32_layout_b(thread_id, local_id)
+.. py:function:: ldmatrix_trans_32x8_to_shared_16x16_layout(thread_id, local_id)

 .. py:function:: ldmatrix_32x16_to_shared_16x32_layout_a(thread_id, local_id)

 .. py:function:: ldmatrix_32x16_to_shared_16x32_layout_b(thread_id, local_id)

 .. py:function:: mma_store_32x8_to_shared_16x16_layout(thread_id, local_id)

-.. py:function:: shared_16x16_to_mma_32x8_layout_sr(i, j)
+.. py:function:: shared_16x8_to_mma_a_32x4_layout(i, j)
+
+.. py:function:: shared_16x8_to_mma_a_32x4_layout_trans(i, j)
+
+.. py:function:: shared_16x8_to_mma_b_32x4_layout(i, j)
+
+.. py:function:: shared_16x8_to_mma_b_32x4_layout_trans(i, j)
+
+.. py:data:: shared_16x8_to_mma_32x4_layout_sr_a
+
+.. py:data:: shared_16x8_to_mma_32x4_layout_sr_b
+
+.. py:data:: shared_16x8_to_mma_32x4_layout_rs_a
+
+.. py:data:: shared_16x8_to_mma_32x4_layout_rs_b
+
+.. py:function:: shared_16x16_to_mma_a_32x8_layout(i, j)
+
+.. py:function:: shared_16x16_to_mma_a_32x8_layout_trans(i, j)

-.. py:function:: shared_16x16_to_mma_32x8_layout_rs(i, j)
+.. py:function:: shared_16x16_to_mma_b_32x8_layout(i, j)

-.. py:data:: shared_16x16_to_mma_32x8_layout
+.. py:function:: shared_16x16_to_mma_b_32x8_layout_trans(i, j)

-.. py:data:: shared_16x16_to_mma_32x8_layout_trans
+.. py:data:: shared_16x16_to_mma_32x8_layout_sr_a

-.. py:function:: shared_16x32_to_mma_32x16_layout(i, j)
+.. py:data:: shared_16x16_to_mma_32x8_layout_sr_b

-.. py:function:: shared_32x16_to_mma_32x16_layout(i, j)
+.. py:data:: shared_16x16_to_mma_32x8_layout_rs_a
+
+.. py:data:: shared_16x16_to_mma_32x8_layout_rs_b
+
+.. py:function:: shared_16x32_to_mma_a_32x16_layout(i, j)
+
+.. py:function:: shared_32x16_to_mma_a_32x16_layout_trans(i, j)
+
+.. py:function:: shared_16x32_to_mma_b_32x16_layout(i, j)
+
+.. py:function:: shared_32x16_to_mma_b_32x16_layout_trans(i, j)
+
+.. py:data:: shared_16x32_to_mma_32x16_layout_sr_a
+
+.. py:data:: shared_16x32_to_mma_32x16_layout_sr_b
+
+.. py:data:: shared_16x32_to_mma_32x16_layout_rs_a
+
+.. py:data:: shared_16x32_to_mma_32x16_layout_rs_b

 .. py:function:: mma_32x8_to_shared_16x16_layout(thread_id, local_id)

+.. py:function:: mma_load_a_32x4_to_shared_16x8_layout(thread_id, local_id)
+
+.. py:function:: mma_load_b_32x4_to_shared_16x8_layout(thread_id, local_id)
+
+.. py:function:: mma_load_a_32x16_to_shared_16x32_layout(thread_id, local_id)
+
+.. py:function:: mma_load_b_32x16_to_shared_16x32_layout(thread_id, local_id)

 .. py:function:: shared_16x16_to_mma_32x8_smoothlayout(i, j)

 .. py:function:: shared_16x32_to_mma_32x16_smoothlayout(i, j)

_sources/autoapi/tilelang/intrinsics/mma_macro_generator/index.rst.txt

Lines changed: 11 additions & 13 deletions
@@ -28,7 +28,7 @@ Module Contents

 .. py:data:: lift

-.. py:class:: TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False)
+.. py:class:: TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False, thread_var = None)

    Bases: :py:obj:`object`

@@ -41,7 +41,7 @@ Module Contents

-   .. py:attribute:: N_DIM
+   .. py:attribute:: n_dim
       :value: 16

@@ -109,16 +109,6 @@ Module Contents

-   .. py:attribute:: warp_rows
-      :value: 0
-
-   .. py:attribute:: warp_cols
-      :value: 0
-
    .. py:attribute:: reduce_k
       :value: 1

@@ -134,6 +124,14 @@ Module Contents

+   .. py:attribute:: thread_var
+      :value: None
+
+   .. py:method:: get_thread_binding()
+
    .. py:method:: get_store_index_map(inverse = False)

@@ -209,7 +207,7 @@ Module Contents
    .. py:method:: mma(A_local_buf, B_local_buf, C_local_buf)

-.. py:class:: INT4TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False)
+.. py:class:: INT4TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False, thread_var = None)

    Bases: :py:obj:`TensorCoreIntrinEmitter`
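The user-visible change here is the new `thread_var` constructor argument and the `get_thread_binding()` accessor on `TensorCoreIntrinEmitter` (and its INT4 subclass), which let callers hand the emitter an explicit thread-index variable instead of relying on an implicit one. A hypothetical construction sketch, with argument values chosen only for illustration:

```python
# Hypothetical usage sketch based on the signature shown in the diff; the
# fallback behaviour when thread_var is None is an assumption, not something
# verified against the implementation.
from tilelang.intrinsics.mma_macro_generator import TensorCoreIntrinEmitter

emitter = TensorCoreIntrinEmitter(
    a_dtype="float16",
    b_dtype="float16",
    accum_dtype="float32",
    a_transposed=False,
    b_transposed=True,
    block_row_warps=2,
    block_col_warps=2,
    warp_row_tiles=8,
    warp_col_tiles=8,
    chunk=16,
    thread_var=None,  # new parameter: pass an explicit tir.Var bound to threadIdx.x,
                      # or leave None and let the emitter resolve its own binding
)
tx = emitter.get_thread_binding()  # new accessor: the thread binding the emitter uses
```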

_sources/autoapi/tilelang/ir/index.rst.txt

Lines changed: 15 additions & 0 deletions
@@ -52,6 +52,21 @@ Module Contents
    Bases: :py:obj:`tvm.ir.base.Node`, :py:obj:`tvm.runtime.Scriptable`

+   .. py:attribute:: policy_type
+      :type: int
+
+   .. py:attribute:: m_warp
+      :type: int
+
+   .. py:attribute:: n_warp
+      :type: int
+
+   .. py:method:: compute_warp_partition(M, N, block_size, target, is_wgmma)
+
 .. py:class:: Gemm

    Bases: :py:obj:`tvm.ir.base.Node`, :py:obj:`tvm.runtime.Scriptable`
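The attributes added here (`policy_type`, `m_warp`, `n_warp`) together with `compute_warp_partition(M, N, block_size, target, is_wgmma)` expose, on the Python side, how a block's warps are split between the M and N dimensions of a GEMM tile. The sketch below is a hypothetical stand-in that only illustrates the idea behind a "square" partition; it is not the tilelang implementation.

```python
# Hypothetical helper illustrating a "square" warp partition: pick m_warp x
# n_warp so the per-warp tile is as close to square as possible. This is the
# intuition behind compute_warp_partition, not the tilelang algorithm itself.
def square_warp_partition(M: int, N: int, block_size: int, warp_size: int = 32):
    num_warps = block_size // warp_size
    best_skew, m_warp, n_warp = None, 1, num_warps
    for m in range(1, num_warps + 1):
        if num_warps % m:
            continue
        n = num_warps // m
        skew = abs(M / m - N / n)  # how far the per-warp tile is from square
        if best_skew is None or skew < best_skew:
            best_skew, m_warp, n_warp = skew, m, n
    return m_warp, n_warp


print(square_warp_partition(128, 256, 256))  # 8 warps -> (2, 4) for a 128x256 tile
```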

_sources/autoapi/tilelang/language/gemm/index.rst.txt

Lines changed: 33 additions & 0 deletions
@@ -15,6 +15,7 @@ Functions
 .. autoapisummary::

    tilelang.language.gemm.gemm
+   tilelang.language.gemm.gemm_v2


 Module Contents
@@ -52,3 +53,35 @@ Module Contents
    :raises AssertionError: If the K dimensions of matrices A and B don't match


+.. py:function:: gemm_v2(A, B, C, transpose_A = False, transpose_B = False, policy = GemmWarpPolicy.Square, clear_accum = False, k_pack = 1, wg_wait = 0)
+
+   Perform a General Matrix Multiplication (GEMM) operation.
+
+   This function computes C = A @ B where A and B can optionally be transposed.
+   The operation supports various warp policies and accumulation modes.
+
+   :param A: First input matrix
+   :type A: Union[tir.Buffer, tir.Var]
+   :param B: Second input matrix
+   :type B: Union[tir.Buffer, tir.Var]
+   :param C: Output matrix for results
+   :type C: Union[tir.Buffer, tir.Var]
+   :param transpose_A: Whether to transpose matrix A. Defaults to False.
+   :type transpose_A: bool, optional
+   :param transpose_B: Whether to transpose matrix B. Defaults to False.
+   :type transpose_B: bool, optional
+   :param policy: Warp execution policy. Defaults to GemmWarpPolicy.Square.
+   :type policy: GemmWarpPolicy, optional
+   :param clear_accum: Whether to clear accumulator before computation. Defaults to False.
+   :type clear_accum: bool, optional
+   :param k_pack: Number of k dimensions packed into a single warp. Defaults to 1.
+   :type k_pack: int, optional
+   :param wg_wait: Warp group wait count. Defaults to 0.
+   :type wg_wait: int, optional
+
+   :returns: A handle to the GEMM operation
+   :rtype: tir.Call
+
+   :raises AssertionError: If the K dimensions of matrices A and B don't match
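Since `gemm_v2` shares its docstring and signature with `gemm`, it is presumably a drop-in replacement at the call site. The kernel fragment below is a minimal sketch, assuming `gemm_v2` is re-exported through `tilelang.language` (as `T.gemm_v2`) and follows the same shared-memory-operand convention as `T.gemm`; tile sizes and dtypes are illustrative.

```python
# Minimal sketch under the assumptions stated above; not taken from the commit.
import tilelang.language as T


@T.prim_func
def matmul(A: T.Tensor((1024, 1024), "float16"),
           B: T.Tensor((1024, 1024), "float16"),
           C: T.Tensor((1024, 1024), "float32")):
    with T.Kernel(T.ceildiv(1024, 128), T.ceildiv(1024, 128), threads=128) as (bx, by):
        A_shared = T.alloc_shared((128, 32), "float16")
        B_shared = T.alloc_shared((32, 128), "float16")
        C_local = T.alloc_fragment((128, 128), "float32")
        T.clear(C_local)
        for k in T.Pipelined(T.ceildiv(1024, 32), num_stages=2):
            T.copy(A[by * 128, k * 32], A_shared)
            T.copy(B[k * 32, bx * 128], B_shared)
            # same operands and keywords as documented for gemm above
            T.gemm_v2(A_shared, B_shared, C_local, clear_accum=False)
        T.copy(C_local, C[by * 128, bx * 128])
```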
