Commit 10509af — Update docs
1 parent fe9ba15, commit 10509af

207 files changed: +8974 / −612 lines
_sources/autoapi/tilelang/intrinsics/index.rst.txt
Lines changed: 2 additions & 0 deletions

@@ -16,6 +16,8 @@ Submodules
    /autoapi/tilelang/intrinsics/mma_macro_generator/index
    /autoapi/tilelang/intrinsics/mma_sm70_layout/index
    /autoapi/tilelang/intrinsics/mma_sm70_macro_generator/index
+   /autoapi/tilelang/intrinsics/mma_sp_layout/index
+   /autoapi/tilelang/intrinsics/mma_sp_macro_generator/index
    /autoapi/tilelang/intrinsics/tcgen05_macro_generator/index
    /autoapi/tilelang/intrinsics/utils/index
    /autoapi/tilelang/intrinsics/wgmma_macro_generator/index

_sources/autoapi/tilelang/intrinsics/mma_layout/index.rst.txt
Lines changed: 25 additions & 0 deletions

@@ -52,7 +52,9 @@ Functions
    tilelang.intrinsics.mma_layout.mma_load_a_32x4_to_shared_16x8_layout
    tilelang.intrinsics.mma_layout.mma_load_b_32x4_to_shared_16x8_layout
    tilelang.intrinsics.mma_layout.mma_load_a_32x16_to_shared_16x32_layout
+   tilelang.intrinsics.mma_layout.mma_load_a_32x8_to_shared_16x16_layout
    tilelang.intrinsics.mma_layout.mma_load_b_32x16_to_shared_16x32_layout
+   tilelang.intrinsics.mma_layout.mma_load_b_32x8_to_shared_16x16_layout
    tilelang.intrinsics.mma_layout.shared_16x16_to_mma_32x8_smoothlayout
    tilelang.intrinsics.mma_layout.shared_16x32_to_mma_32x16_smoothlayout
    tilelang.intrinsics.mma_layout.shared_32x16_to_mma_32x16_smoothlayout

@@ -135,8 +137,31 @@ Module Contents
 .. py:function:: mma_load_a_32x16_to_shared_16x32_layout(thread_id, local_id)

+.. py:function:: mma_load_a_32x8_to_shared_16x16_layout(thread_id, local_id)
+
+   groupID           = %laneid >> 2
+   threadID_in_group = %laneid % 4
+
+   row = groupID        for ai where 0 <= i < 2 || 4 <= i < 6
+         groupID + 8    otherwise
+
+   col = (threadID_in_group * 2) + (i & 0x1)        for ai where i < 4
+         (threadID_in_group * 2) + (i & 0x1) + 8    for ai where i >= 4
+
 .. py:function:: mma_load_b_32x16_to_shared_16x32_layout(thread_id, local_id)

+.. py:function:: mma_load_b_32x8_to_shared_16x16_layout(thread_id, local_id)
+
+   groupID           = %laneid >> 2
+   threadID_in_group = %laneid % 4
+
+   row = (threadID_in_group * 2) + (i & 0x1)        for bi where i < 2
+         (threadID_in_group * 2) + (i & 0x1) + 8    for bi where i >= 2
+
+   col = groupID
+
 .. py:function:: shared_16x16_to_mma_32x8_smoothlayout(i, j)

 .. py:function:: shared_16x32_to_mma_32x16_smoothlayout(i, j)
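The row/col formulas in the `mma_load_a_32x8_to_shared_16x16_layout` docstring fully determine the register-to-shared mapping. As an illustrative sketch (not the library source), and assuming the function returns the `(row, col)` coordinate in the 16x16 shared tile for a given `(thread_id, local_id)` pair, the mapping can be written and sanity-checked in plain Python:

```python
# Sketch of the documented A-side mapping: each of the 32 lanes owns
# 8 values (local_id 0..7) of a 16x16 shared-memory tile.
def mma_load_a_32x8_to_shared_16x16(thread_id, local_id):
    group_id = thread_id >> 2         # groupID = %laneid >> 2
    tid_in_group = thread_id % 4      # threadID_in_group = %laneid % 4
    i = local_id
    # row = groupID for 0 <= i < 2 or 4 <= i < 6, else groupID + 8
    row = group_id if (0 <= i < 2 or 4 <= i < 6) else group_id + 8
    # col = threadID_in_group * 2 + (i & 1), shifted by 8 for i >= 4
    col = tid_in_group * 2 + (i & 0x1) + (8 if i >= 4 else 0)
    return row, col

# Sanity check: the 32x8 register layout should tile 16x16 exactly once.
coords = {mma_load_a_32x8_to_shared_16x16(t, i)
          for t in range(32) for i in range(8)}
assert coords == {(r, c) for r in range(16) for c in range(16)}
```

The coverage assertion confirms the two formulas describe a bijection between the 32x8 register view and the 16x16 tile.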
Lines changed: 94 additions & 0 deletions (new file)

@@ -0,0 +1,94 @@
tilelang.intrinsics.mma_sp_layout
=================================

.. py:module:: tilelang.intrinsics.mma_sp_layout


Functions
---------

.. autoapisummary::

   tilelang.intrinsics.mma_sp_layout.shared_16x16_to_mma_sp_layout_sr_a
   tilelang.intrinsics.mma_sp_layout.shared_16x16_to_mma_sp_layout_sr_b
   tilelang.intrinsics.mma_sp_layout.shared_16x32_to_mma_sp_layout_sr_a
   tilelang.intrinsics.mma_sp_layout.shared_16x32_to_mma_sp_layout_sr_b
   tilelang.intrinsics.mma_sp_layout.shared_16x64_to_mma_sp_layout_sr_a
   tilelang.intrinsics.mma_sp_layout.shared_16x64_to_mma_sp_layout_sr_b
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_a_32x4_to_shared_16x16_layout
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_a_32x8_to_shared_16x32_layout
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_a_32x16_to_shared_16x64_layout
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_b_32x8_to_shared_16x16_layout
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_b_32x16_to_shared_16x32_layout
   tilelang.intrinsics.mma_sp_layout.mma_sp_load_b_32x32_to_shared_16x64_layout
   tilelang.intrinsics.mma_sp_layout.get_logical_id_32bit
   tilelang.intrinsics.mma_sp_layout.metadata_8bit_load_32x4_to_shared_16x4_layout_32bit
   tilelang.intrinsics.mma_sp_layout.metadata_16bit_load_32x2_to_shared_16x2_layout_32bit
   tilelang.intrinsics.mma_sp_layout.metadata_8bit_load_32x4_to_shared_16x4_layout_16bit
   tilelang.intrinsics.mma_sp_layout.metadata_16bit_load_32x2_to_shared_16x2_layout_16bit
   tilelang.intrinsics.mma_sp_layout.get_logical_id_8bit
   tilelang.intrinsics.mma_sp_layout.metadata_8bit_load_32x4_to_shared_16x4_layout_8bit
   tilelang.intrinsics.mma_sp_layout.metadata_16bit_load_32x2_to_shared_16x4_layout_8bit
   tilelang.intrinsics.mma_sp_layout.metadata_32bit_load_32x1_to_shared_16x2_layout_8bit
   tilelang.intrinsics.mma_sp_layout.ldmatrix_trans_32x8_to_shared_16x16_layout
   tilelang.intrinsics.mma_sp_layout.ldmatrix_32x16_to_shared_32x16_layout
   tilelang.intrinsics.mma_sp_layout.ldmatrix_trans_32x16_to_shared_16x32_layout
   tilelang.intrinsics.mma_sp_layout.ldmatrix_trans_32x32_to_shared_shared_16x64_layout
   tilelang.intrinsics.mma_sp_layout.get_ldmatrix_offset_b


Module Contents
---------------

.. py:function:: shared_16x16_to_mma_sp_layout_sr_a(i, j)

.. py:function:: shared_16x16_to_mma_sp_layout_sr_b(i, j)

.. py:function:: shared_16x32_to_mma_sp_layout_sr_a(i, j)

.. py:function:: shared_16x32_to_mma_sp_layout_sr_b(i, j)

.. py:function:: shared_16x64_to_mma_sp_layout_sr_a(i, j)

.. py:function:: shared_16x64_to_mma_sp_layout_sr_b(i, j)

.. py:function:: mma_sp_load_a_32x4_to_shared_16x16_layout(thread_id, local_id)

.. py:function:: mma_sp_load_a_32x8_to_shared_16x32_layout(thread_id, local_id)

.. py:function:: mma_sp_load_a_32x16_to_shared_16x64_layout(thread_id, local_id)

.. py:function:: mma_sp_load_b_32x8_to_shared_16x16_layout(thread_id, local_id)

.. py:function:: mma_sp_load_b_32x16_to_shared_16x32_layout(thread_id, local_id)

.. py:function:: mma_sp_load_b_32x32_to_shared_16x64_layout(thread_id, local_id)

.. py:function:: get_logical_id_32bit(thread_id)

.. py:function:: metadata_8bit_load_32x4_to_shared_16x4_layout_32bit(thread_id, local_id)

.. py:function:: metadata_16bit_load_32x2_to_shared_16x2_layout_32bit(thread_id, local_id)

.. py:function:: metadata_8bit_load_32x4_to_shared_16x4_layout_16bit(thread_id, local_id)

.. py:function:: metadata_16bit_load_32x2_to_shared_16x2_layout_16bit(thread_id, local_id)

.. py:function:: get_logical_id_8bit(thread_id)

.. py:function:: metadata_8bit_load_32x4_to_shared_16x4_layout_8bit(thread_id, local_id)

.. py:function:: metadata_16bit_load_32x2_to_shared_16x4_layout_8bit(thread_id, local_id)

.. py:function:: metadata_32bit_load_32x1_to_shared_16x2_layout_8bit(thread_id, local_id)

.. py:function:: ldmatrix_trans_32x8_to_shared_16x16_layout(thread_id, local_id)

.. py:function:: ldmatrix_32x16_to_shared_32x16_layout(thread_id, local_id)

.. py:function:: ldmatrix_trans_32x16_to_shared_16x32_layout(thread_id, local_id)

.. py:function:: ldmatrix_trans_32x32_to_shared_shared_16x64_layout(thread_id, local_id)

.. py:function:: get_ldmatrix_offset_b(matrix, row_idx, col_idx, stride, dtype = 'float16', transposed = False)
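For context on what these `mma_sp_load_a_*` and `metadata_*` layouts serve: sparse tensor-core MMA consumes an A operand compressed with 2:4 structured sparsity (two nonzeros kept per group of four K-elements) plus per-group metadata indices. The following is a hypothetical, illustrative sketch of that compression in plain Python; `compress_2to4` is not part of this module, and the actual metadata packing is hardware-defined and handled by the `metadata_*` helpers above:

```python
def compress_2to4(dense):
    """Compress rows with 2:4 structured sparsity along K.

    For every group of 4 elements, keep the 2 largest-magnitude values
    and record their 2-bit in-group indices as metadata.
    Returns (values: M x K/2, meta: M x K/4 lists of 2 indices).
    """
    values, meta = [], []
    for row in dense:
        assert len(row) % 4 == 0, "K must be a multiple of 4"
        vrow, mrow = [], []
        for g in range(0, len(row), 4):
            grp = row[g:g + 4]
            # indices of the two largest magnitudes, ascending
            keep = sorted(sorted(range(4), key=lambda j: abs(grp[j]))[-2:])
            vrow += [grp[j] for j in keep]
            mrow.append(keep)
        values.append(vrow)
        meta.append(mrow)
    return values, meta

# One row, one group: keep the 3 and the 1, drop the zeros.
vals, meta = compress_2to4([[0, 3, 0, 1]])
```

The compressed values are what the `mma_sp_load_a_*` layouts distribute across lanes, and the index lists correspond (conceptually) to the metadata operand the `metadata_*` layouts place.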
Lines changed: 219 additions & 0 deletions (new file)

@@ -0,0 +1,219 @@
tilelang.intrinsics.mma_sp_macro_generator
==========================================

.. py:module:: tilelang.intrinsics.mma_sp_macro_generator


Attributes
----------

.. autoapisummary::

   tilelang.intrinsics.mma_sp_macro_generator.lift


Classes
-------

.. autoapisummary::

   tilelang.intrinsics.mma_sp_macro_generator.SparseTensorCoreIntrinEmitter


Module Contents
---------------

.. py:data:: lift

.. py:class:: SparseTensorCoreIntrinEmitter(a_dtype = 'float16', e_dtype = 'uint8', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, e_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, warp_k = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False, thread_var = None)

   To eliminate Python syntax within TIR Macro.
   .. py:attribute:: M_DIM
      :value: 16

   .. py:attribute:: SPARSE_FACTOR
      :value: 2

   .. py:attribute:: SPARSE_SELECTOR
      :value: 0

   .. py:attribute:: n_dim
      :value: 16

   .. py:attribute:: WARP_SIZE
      :value: 32

   .. py:attribute:: dtype_abbrv

   .. py:attribute:: E_FACTOR_MAP

   .. py:attribute:: E_REPLICATE_FACTOR

   .. py:attribute:: is_m_first
      :value: False

   .. py:attribute:: a_dtype
      :value: 'float16'

   .. py:attribute:: e_dtype
      :value: 'uint8'

   .. py:attribute:: b_dtype
      :value: 'float16'

   .. py:attribute:: accum_dtype
      :value: 'float16'

   .. py:attribute:: a_transposed
      :value: False

   .. py:attribute:: b_transposed
      :value: False

   .. py:attribute:: e_transposed
      :value: False

   .. py:attribute:: block_row_warps
      :value: 2

   .. py:attribute:: block_col_warps
      :value: 2

   .. py:attribute:: warp_row_tiles
      :value: 8

   .. py:attribute:: warp_col_tiles
      :value: 8

   .. py:attribute:: warp_k
      :value: 16

   .. py:attribute:: e_factor
      :value: 8

   .. py:attribute:: reduce_k
      :value: 1

   .. py:attribute:: threads
      :value: 128

   .. py:attribute:: num_elems_per_byte
      :value: 1

   .. py:attribute:: thread_var
      :value: None
   .. py:method:: get_thread_binding()

   .. py:method:: get_store_index_map(inverse = False)

   .. py:method:: extract_thread_binding(thread_id, is_m_first = None)

      is_m_first: True if the thread binding is in the form of (tx, warp_n, warp_m),
      which represents [warp_size, block_row_warps (split n), block_col_warps (split m)].
      Otherwise, it is in the form of [warp_size, block_col_warps (split m), block_row_warps (split n)].

   .. py:method:: ldmatrix_a(A_local_buf, A_shared_buf, ki, rk = 0)

   .. py:method:: ldmatrix_e(E_local_buf, E_shared_buf, ki, rk = 0)

   .. py:method:: ldmatrix_b(B_local_buf, B_shared_buf, ki, rk = 0)

   .. py:method:: mma_sp(A_local_buf, E_local_buf, B_local_buf, C_local_buf, k_inner = 0)

   .. py:method:: stmatrix(C_local_buf, C_buf, pid_m=None, pid_n=None)

   .. py:method:: make_mma_load_layout(local_buf, matrix = 'A')

      Create a layout function for loading an MMA operand fragment from a buffer.
      This layout maps fragment indices to threads and local indices.

      :param local_buf: The local buffer representing a fragment of a matrix.
      :type local_buf: tir.Buffer

      :returns: A fragment object that describes how threads and indices
                in `local_buf` are laid out.
      :rtype: T.Fragment

      :raises AssertionError: If `local_buf` is not detected to be a fragment buffer.

   .. py:method:: make_mma_store_layout(local_buf)

      Create a layout function for storing MMA results into a fragment buffer.
      This layout is used in conjunction with `inverse_mma_store_layout` to
      map fragment indices to threads and local indices.

      :param local_buf: The local buffer representing a fragment of a matrix.
      :type local_buf: tir.Buffer

      :returns: A fragment object that describes how threads and indices
                in `local_buf` are laid out.
      :rtype: T.Fragment

      :raises AssertionError: If `local_buf` is not detected to be a fragment buffer.
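Based only on the `extract_thread_binding` docstring above, the flat-thread-id decomposition can be sketched as follows. This is an illustrative approximation, not the emitter's actual TIR code; in particular, the return-tuple order for the non-m-first case is an assumption:

```python
WARP_SIZE = 32  # matches the WARP_SIZE attribute above

def extract_thread_binding_sketch(thread_id, block_row_warps=2,
                                  block_col_warps=2, is_m_first=False):
    """Decompose a flat thread id into (tx, warp_n, warp_m).

    is_m_first=True  -> layout [warp_size, block_row_warps (split n),
                                block_col_warps (split m)]
    is_m_first=False -> layout [warp_size, block_col_warps (split m),
                                block_row_warps (split n)]
    """
    tx = thread_id % WARP_SIZE  # lane within the warp
    if is_m_first:
        warp_n = (thread_id // WARP_SIZE) % block_row_warps
        warp_m = thread_id // (WARP_SIZE * block_row_warps)
    else:
        warp_m = (thread_id // WARP_SIZE) % block_col_warps
        warp_n = thread_id // (WARP_SIZE * block_col_warps)
    return tx, warp_n, warp_m

# With the default 2x2 warp grid, 2 * 2 * 32 == 128 threads,
# matching the class's `threads` attribute above.
```

For example, thread 32 is lane 0 of the second warp, so with the defaults it maps to `warp_m == 1` (m varies fastest when `is_m_first` is False).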
