Commit 5301e07

Update docs
1 parent 2fa872a commit 5301e07

190 files changed

+2814
-142
lines changed

_sources/autoapi/tilelang/carver/arch/cdna/index.rst.txt

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ Module Contents
3030
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
3131

3232

33+
Represents the architecture of a computing device, capturing various hardware specifications.
34+
35+
3336
.. py:attribute:: target
3437
3538

_sources/autoapi/tilelang/carver/arch/cuda/index.rst.txt

Lines changed: 3 additions & 0 deletions
@@ -48,6 +48,9 @@ Module Contents
4848
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
4949

5050

51+
Represents the architecture of a computing device, capturing various hardware specifications.
52+
53+
5154
.. py:attribute:: target
5255
5356

_sources/autoapi/tilelang/carver/arch/metal/index.rst.txt

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ Module Contents
3030
Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
3131

3232

33+
Represents the architecture of a computing device, capturing various hardware specifications.
34+
35+
3336
.. py:attribute:: target
3437
3538

_sources/autoapi/tilelang/carver/roller/policy/tensorcore/index.rst.txt

Lines changed: 56 additions & 1 deletion
@@ -30,11 +30,15 @@ Module Contents
3030

3131
.. py:data:: logger
3232
33-
.. py:class:: TensorCorePolicy
33+
.. py:class:: TensorCorePolicy(arch, tags = None)
3434
3535
Bases: :py:obj:`tilelang.carver.roller.policy.default.DefaultPolicy`
3636

3737

38+
Default Policy for fastdlight, a heuristic plan that tries to
39+
minimize memory traffic and maximize parallelism for the BitBLAS schedule.
40+
41+
3842
.. py:attribute:: wmma_k
3943
:type: int
4044
:value: 16
@@ -61,16 +65,67 @@ Module Contents
6165

6266
.. py:method:: infer_node_smem_usage(td, node)
6367
68+
Infers the shared memory usage of a node given a TileDict configuration.
69+
70+
:param td: The TileDict object containing the tile configuration.
71+
:type td: TileDict
72+
:param node: The node for which to infer the shared memory usage.
73+
:type node: PrimFuncNode
74+
75+
:returns: The estimated amount of shared memory used by the node.
76+
:rtype: int
77+
78+
6479

6580
.. py:method:: get_node_reduce_step_candidates(node)
6681
82+
Calculates reduction step candidates for each reduction axis in a PrimFuncNode. General idea: use factors first, since they do not require an extra boundary check. For a large prime extent, which is a rare case, use powers of 2.
83+
84+
:param node: The node for which to calculate reduction step candidates. It contains reduction axes (raxis)
85+
with their domains (dom.extent).
86+
:type node: PrimFuncNode
87+
88+
:returns: A dictionary mapping axis variable names to lists of step candidates. For each axis in the node,
89+
this function calculates possible step sizes. For axes with a large prime domain, it uses powers of 2
90+
as step candidates; for others, it uses all factors of the domain.
91+
:rtype: Dict[str, List[int]]
92+
93+
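The factor-first heuristic described for ``get_node_reduce_step_candidates`` can be sketched in plain Python. This is a minimal illustration, not the tilelang implementation: the helper names ``all_factors`` and ``reduce_step_candidates`` and the ``prime_threshold`` cutoff for what counts as a "large" prime are assumptions made for the example.

```python
from math import isqrt
from typing import List


def all_factors(n: int) -> List[int]:
    """Return every positive divisor of n in ascending order."""
    small = [i for i in range(1, isqrt(n) + 1) if n % i == 0]
    # Pair each small divisor with its cofactor, skipping a repeated square root.
    large = [n // i for i in reversed(small) if i * i != n]
    return small + large


def reduce_step_candidates(extent: int, prime_threshold: int = 64) -> List[int]:
    """Hypothetical helper: candidate step sizes for one reduction axis.

    Factors divide the extent evenly, so they need no boundary check.
    A large prime extent has only the trivial factors 1 and itself, so
    powers of 2 below the extent are offered instead.
    prime_threshold is an assumed cutoff, not a value taken from tilelang.
    """
    factors = all_factors(extent)
    is_large_prime = len(factors) == 2 and extent > prime_threshold
    if is_large_prime:
        steps, power = [], 1
        while power < extent:
            steps.append(power)
            power *= 2
        return steps
    return factors
```

Under these assumptions, an extent of 12 yields its six divisors, while a prime extent such as 127 yields the powers of 2 from 1 to 64. The documented method returns one such list per reduction axis, keyed by the axis variable name.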
6794

6895
.. py:method:: check_tile_shape_isvalid(td)
6996
97+
Checks if the tile shapes in the TileDict are valid for the nodes in this context.
98+
99+
Parameters:
100+
- td (TileDict): The TileDict object containing tile shapes and other configurations.
101+
102+
Returns:
103+
- bool: True if all tile shapes are valid, False otherwise.
104+
105+
70106

71107
.. py:method:: compute_node_stride_map(node, td)
72108
109+
Computes the stride map for a given node based on the TileDict configuration.
110+
111+
:param node: The node for which to compute the stride map.
112+
:type node: PrimFuncNode
113+
:param td: The TileDict object containing the tile configuration.
114+
:type td: TileDict
115+
116+
:returns: A tuple of dictionaries containing the output strides and tensor strides.
117+
:rtype: Tuple[Dict, Dict]
118+
119+
73120

74121
.. py:method:: plan_rasterization(td)
75122
123+
Plans the rasterization for the given TileDict. This function is not implemented yet.
124+
125+
:param td: The TileDict object to plan rasterization for.
126+
:type td: TileDict
127+
128+
:raises RasterRationPlan: This function is not implemented yet.
129+
130+
76131

_sources/autoapi/tilelang/carver/template/flashattention/index.rst.txt

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,11 @@ Module Contents
2020
Bases: :py:obj:`tilelang.carver.template.base.BaseTemplate`
2121

2222

23+
Base class template for hardware-aware configurations.
24+
This serves as an abstract base class (ABC) that defines the structure
25+
for subclasses implementing hardware-specific optimizations.
26+
27+
2328
.. py:attribute:: batch_size
2429
:type: int
2530
:value: 1

_sources/autoapi/tilelang/intrinsics/index.rst.txt

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Submodules
1414
/autoapi/tilelang/intrinsics/mfma_macro_generator/index
1515
/autoapi/tilelang/intrinsics/mma_layout/index
1616
/autoapi/tilelang/intrinsics/mma_macro_generator/index
17+
/autoapi/tilelang/intrinsics/tcgen05_macro_generator/index
1718
/autoapi/tilelang/intrinsics/utils/index
1819
/autoapi/tilelang/intrinsics/wgmma_macro_generator/index
1920

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
1+
tilelang.intrinsics.tcgen05_macro_generator
2+
===========================================
3+
4+
.. py:module:: tilelang.intrinsics.tcgen05_macro_generator
5+
6+
7+
Attributes
8+
----------
9+
10+
.. autoapisummary::
11+
12+
tilelang.intrinsics.tcgen05_macro_generator.lift
13+
14+
15+
Classes
16+
-------
17+
18+
.. autoapisummary::
19+
20+
tilelang.intrinsics.tcgen05_macro_generator.SwizzleMode
21+
tilelang.intrinsics.tcgen05_macro_generator.TensorCoreIntrinEmitter
22+
23+
24+
Module Contents
25+
---------------
26+
27+
.. py:data:: lift
28+
29+
.. py:class:: SwizzleMode
30+
31+
Bases: :py:obj:`enum.IntEnum`
32+
33+
34+
Enum where members are also (and must be) ints
35+
36+
37+
.. py:attribute:: NONE
38+
:value: 0
39+
40+
41+
42+
.. py:attribute:: SWIZZLE_128B
43+
:value: 2
44+
45+
46+
47+
.. py:attribute:: SWIZZLE_64B
48+
:value: 4
49+
50+
51+
52+
.. py:attribute:: SWIZZLE_32B
53+
:value: 6
54+
55+
56+
57+
.. py:method:: is_none()
58+
59+
60+
.. py:method:: is_swizzle_32b()
61+
62+
63+
.. py:method:: is_swizzle_64b()
64+
65+
66+
.. py:method:: is_swizzle_128b()
67+
68+
69+
.. py:method:: swizzle_byte_size()
70+
71+
72+
.. py:method:: swizzle_atom_size()
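The shape of ``SwizzleMode`` can be illustrated with a small ``enum.IntEnum`` sketch. The member values are copied from the attributes listed above; the class name ``SwizzleModeSketch`` and the bodies of the predicate methods are assumptions, since this page documents only their names. ``swizzle_byte_size`` and ``swizzle_atom_size`` are left out because their return values are not stated here.

```python
from enum import IntEnum


class SwizzleModeSketch(IntEnum):
    """Illustrative stand-in for SwizzleMode; values match the listing above."""

    NONE = 0
    SWIZZLE_128B = 2
    SWIZZLE_64B = 4
    SWIZZLE_32B = 6

    # Assumed predicate bodies: each compares against one member.
    def is_none(self) -> bool:
        return self is SwizzleModeSketch.NONE

    def is_swizzle_32b(self) -> bool:
        return self is SwizzleModeSketch.SWIZZLE_32B

    def is_swizzle_64b(self) -> bool:
        return self is SwizzleModeSketch.SWIZZLE_64B

    def is_swizzle_128b(self) -> bool:
        return self is SwizzleModeSketch.SWIZZLE_128B


# IntEnum members behave as ints, so a raw value round-trips to a member.
mode = SwizzleModeSketch(2)
```

Because the base is ``IntEnum``, ``mode`` compares equal to the integer ``2`` and can be passed wherever a plain integer encoding is expected.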
73+
74+
75+
.. py:class:: TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False, thread_var = None)
76+
77+
Bases: :py:obj:`tilelang.intrinsics.mma_macro_generator.TensorCoreIntrinEmitter`
78+
79+
80+
To eliminate Python syntax within TIR Macro.
81+
82+
83+
.. py:attribute:: tcgen05_prefix
84+
:type: str
85+
86+
87+
.. py:attribute:: a_shared_layout
88+
:type: tilelang.layout.Layout
89+
:value: None
90+
91+
92+
93+
.. py:attribute:: b_shared_layout
94+
:type: tilelang.layout.Layout
95+
:value: None
96+
97+
98+
99+
.. py:method:: tcgen05mma(A_buf, B_buf, C_local_buf, mbar, clear_accum = False)
100+
101+
102+
.. py:method:: make_mma_load_layout(local_buf, matrix = 'A')
103+
:abstractmethod:
104+
105+
106+
Create a layout function for storing MMA results into a fragment buffer.
107+
This layout is used in conjunction with `inverse_mma_store_layout` to
108+
map fragment indices to threads and local indices.
109+
110+
:param local_buf: The local buffer representing a fragment of a matrix.
111+
:type local_buf: tir.Buffer
112+
113+
:returns: A fragment object that describes how threads and indices
114+
in `local_buf` are laid out.
115+
:rtype: T.Fragment
116+
117+
:raises AssertionError: If `local_buf` is not detected to be a fragment buffer.
118+
119+
120+
121+
.. py:method:: make_mma_store_layout(tmem_buf)
122+
123+
Create the TCGEN5 tensor-memory layout used to store MMA accumulators.
124+
125+
:param tmem_buf: The local buffer representing the tensor memory that holds an MMA's output.
126+
:type tmem_buf: tir.Buffer
127+
128+
:returns: Layout object describing how logical (i, j) coordinates map to the
129+
swizzled tensor-memory offsets required by TCGEN5MMA.
130+
:rtype: Layout
131+
132+
:raises AssertionError: If `tmem_buf` is not detected to be a tensor-memory buffer.
133+
134+
135+
136+
.. py:method:: get_tcgen5_mma_meta(m, n, k)
137+
138+
139+
.. py:method:: get_tcgen5_instr_desc(atom_m, atom_n, atom_k, a_is_k_major, b_is_k_major, scale_in_a, scale_in_b)
140+
141+

_sources/autoapi/tilelang/jit/adapter/dlpack/index.rst.txt

Lines changed: 5 additions & 1 deletion
@@ -20,8 +20,12 @@ Classes
2020
Module Contents
2121
---------------
2222

23-
.. py:class:: TorchDLPackKernelAdapter
23+
.. py:class:: TorchDLPackKernelAdapter(mod, params, result_idx)
2424
2525
Bases: :py:obj:`tilelang.jit.adapter.base.BaseKernelAdapter`
2626

2727

28+
Helper class that provides a standard way to create an ABC using
29+
inheritance.
30+
31+

_sources/autoapi/tilelang/jit/kernel/index.rst.txt

Lines changed: 2 additions & 0 deletions
@@ -116,6 +116,8 @@ Module Contents
116116

117117

118118
.. py:attribute:: target
119+
:value: 'auto'
120+
119121

120122

121123
.. py:method:: from_database(func, kernel_global_source, kernel_lib_path, params, target, target_host, out_idx, execution_backend, pass_configs = None, compile_flags = None)

_sources/autoapi/tilelang/language/allocate/index.rst.txt

Lines changed: 26 additions & 2 deletions
@@ -22,6 +22,14 @@ tilelang.language.allocate
2222

2323

2424

25+
Attributes
26+
----------
27+
28+
.. autoapisummary::
29+
30+
tilelang.language.allocate.DescKind
31+
32+
2533
Functions
2634
---------
2735

@@ -35,6 +43,10 @@ Functions
3543
tilelang.language.allocate.alloc_tmem
3644
tilelang.language.allocate.alloc_reducer
3745
tilelang.language.allocate.alloc_descriptor
46+
tilelang.language.allocate.alloc_wgmma_desc
47+
tilelang.language.allocate.alloc_tcgen05_smem_desc
48+
tilelang.language.allocate.alloc_tcgen05_instruction_desc
49+
tilelang.language.allocate.alloc_tcgen05_instr_desc
3850

3951

4052
Module Contents
@@ -181,11 +193,23 @@ Module Contents
181193
:rtype: T.Buffer
182194

183195

184-
.. py:function:: alloc_descriptor(dtype='uint64', scope='local.descriptor')
196+
.. py:data:: DescKind
197+
198+
.. py:function:: alloc_descriptor(kind = 'wgmma', dtype = 'uint64')
185199
186-
Allocate a descriptor buffer for wgmma and utcmma.
200+
Allocate a descriptor buffer for WGMMA and TCGEN5.MMA.
201+
202+
:param kind: The descriptor kind, one of "wgmma", "tcgen05" ("utcmma" as alias).
187203

188204
:returns: A TVM buffer object allocated as a descriptor
189205
:rtype: T.Buffer
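The ``kind`` parameter accepts "wgmma" and "tcgen05", with "utcmma" as an alias. One way such alias handling could work is sketched below. The function ``normalize_desc_kind`` and the two module-level tables are hypothetical names for illustration, and lowercasing the input is an assumption; the sketch does not call tilelang and does not show how ``alloc_descriptor`` is actually implemented.

```python
# Hypothetical tables: the kinds and the alias come from the docstring above.
_VALID_KINDS = ("wgmma", "tcgen05")
_KIND_ALIASES = {"utcmma": "tcgen05"}


def normalize_desc_kind(kind: str = "wgmma") -> str:
    """Map a descriptor kind (or its alias) to its canonical name.

    Raises ValueError for anything that is neither a known kind nor an alias.
    """
    lowered = kind.lower()
    canonical = _KIND_ALIASES.get(lowered, lowered)
    if canonical not in _VALID_KINDS:
        raise ValueError(
            f"Unknown descriptor kind {kind!r}; expected one of "
            f"{_VALID_KINDS} or an alias in {tuple(_KIND_ALIASES)}"
        )
    return canonical
```

Resolving the alias once at the entry point means later code only has to branch on the two canonical names.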
190206

191207

208+
.. py:function:: alloc_wgmma_desc(dtype = 'uint64')
209+
210+
.. py:function:: alloc_tcgen05_smem_desc(dtype = 'uint64')
211+
212+
.. py:function:: alloc_tcgen05_instruction_desc(dtype = 'uint32')
213+
214+
.. py:function:: alloc_tcgen05_instr_desc(dtype = 'uint32')
215+
