tile-ai
diff --git a/_sources/autoapi/tilelang/carver/arch/cdna/index.rst.txt b/_sources/autoapi/tilelang/carver/arch/cdna/index.rst.txt
Lines changed: 3 additions & 0 deletions
diff --git a/_sources/autoapi/tilelang/carver/arch/cpu/index.rst.txt b/_sources/autoapi/tilelang/carver/arch/cpu/index.rst.txt
Lines changed: 3 additions & 0 deletions
diff --git a/_sources/autoapi/tilelang/carver/arch/cuda/index.rst.txt b/_sources/autoapi/tilelang/carver/arch/cuda/index.rst.txt
Lines changed: 3 additions & 0 deletions
diff --git a/_sources/autoapi/tilelang/carver/roller/hint/index.rst.txt b/_sources/autoapi/tilelang/carver/roller/hint/index.rst.txt
Lines changed: 2 additions & 2 deletions
diff --git a/_sources/autoapi/tilelang/carver/roller/policy/tensorcore/index.rst.txt b/_sources/autoapi/tilelang/carver/roller/policy/tensorcore/index.rst.txt
Lines changed: 56 additions & 1 deletion
diff --git a/_sources/autoapi/tilelang/carver/template/flashattention/index.rst.txt b/_sources/autoapi/tilelang/carver/template/flashattention/index.rst.txt
Lines changed: 0 additions & 5 deletions
diff --git a/_sources/autoapi/tilelang/carver/template/general_reduce/index.rst.txt b/_sources/autoapi/tilelang/carver/template/general_reduce/index.rst.txt
Lines changed: 0 additions & 18 deletions
diff --git a/_sources/autoapi/tilelang/intrinsics/utils/index.rst.txt b/_sources/autoapi/tilelang/intrinsics/utils/index.rst.txt
Lines changed: 13 additions & 0 deletions
diff --git a/_sources/autoapi/tilelang/language/index.rst.txt b/_sources/autoapi/tilelang/language/index.rst.txt
Lines changed: 11 additions & 0 deletions
diff --git a/_sources/autoapi/tilelang/language/utils/index.rst.txt b/_sources/autoapi/tilelang/language/utils/index.rst.txt
Lines changed: 28 additions & 26 deletions
@@ -30,6 +30,9 @@ Module Contents
    Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
 
 
+   Represents the architecture of a computing device, capturing various hardware specifications.
+
+
    .. py:attribute:: target
 
 
 
@@ -30,6 +30,9 @@ Module Contents
    Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
 
 
+   Represents the architecture of a computing device, capturing various hardware specifications.
+
+
    .. py:attribute:: target
 
 
 
@@ -92,6 +92,9 @@ Module Contents
    Bases: :py:obj:`tilelang.carver.arch.arch_base.TileDevice`
 
 
+   Represents the architecture of a computing device, capturing various hardware specifications.
+
+
    .. py:attribute:: target
 
 
 
@@ -307,12 +307,12 @@ Module Contents
 
 
    .. py:property:: raxis_order
-      :type: List[int]
+      :type: tilelang.carver.roller.rasterization.List[int]
 
 
 
    .. py:property:: step
-      :type: List[int]
+      :type: tilelang.carver.roller.rasterization.List[int]
 
 
 
 
@@ -30,11 +30,15 @@ Module Contents
 
 .. py:data:: logger
 
-.. py:class:: TensorCorePolicy
+.. py:class:: TensorCorePolicy(arch, tags = None)
 
    Bases: :py:obj:`tilelang.carver.roller.policy.default.DefaultPolicy`
 
 
+   Default Policy for fastdlight, a heuristic plan that tries to
+   minimize memory traffic and maximize parallelism for BitBLAS Schedule.
+
+
    .. py:attribute:: wmma_k
       :type:  int
       :value: 16
@@ -61,16 +65,67 @@ Module Contents
 
    .. py:method:: infer_node_smem_usage(td, node)
 
+      Infers the shared memory usage of a node given a TileDict configuration.
+
+      :param td: The TileDict object containing the tile configuration.
+      :type td: TileDict
+      :param node: The node for which to infer the shared memory usage.
+      :type node: PrimFuncNode
+
+      :returns: The estimated amount of shared memory used by the node.
+      :rtype: int
+
+
 
    .. py:method:: get_node_reduce_step_candidates(node)
 
+      Calculates reduction step candidates for each reduction axis in a PrimFuncNode. General idea: use factors first, since they do not require extra boundary checks; for a large prime extent, which is a rare case, use powers of 2.
+
+      :param node: The node for which to calculate reduction step candidates. It contains reduction axes (raxis)
+                   with their domains (dom.extent).
+      :type node: PrimFuncNode
+
+      :returns: A dictionary mapping axis variable names to lists of step candidates. For each axis in the node,
+                this function calculates possible step sizes. For axes with a large prime domain, it uses powers of 2
+                as step candidates; for others, it uses all factors of the domain.
+      :rtype: Dict[str, List[int]]
+
+
 
    .. py:method:: check_tile_shape_isvalid(td)
 
+      Checks if the tile shapes in the TileDict are valid for the nodes in this context.
+
+      :param td: The TileDict object containing tile shapes and other configurations.
+      :type td: TileDict
+
+      :returns: True if all tile shapes are valid, False otherwise.
+      :rtype: bool
+
+
 
    .. py:method:: compute_node_stride_map(node, td)
 
+      Computes the stride map for a given node based on the TileDict configuration.
+
+      :param node: The node for which to compute the stride map.
+      :type node: PrimFuncNode
+      :param td: The TileDict object containing the tile configuration.
+      :type td: TileDict
+
+      :returns: A tuple of dictionaries containing the output strides and tensor strides.
+      :rtype: Tuple[Dict, Dict]
+
+
 
    .. py:method:: plan_rasterization(td)
 
+      Plans the rasterization for the given TileDict. This function is not implemented yet.
+
+      :param td: The TileDict object to plan rasterization for.
+      :type td: TileDict
+
+      :raises RasterRationPlan: This function is not implemented yet.
+
+
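The general idea in `get_node_reduce_step_candidates` (factors first, powers of 2 for a large prime extent) can be sketched in plain Python. This is a minimal sketch, not the `TensorCorePolicy` implementation: it takes a hypothetical `{axis_name: extent}` dict instead of a `PrimFuncNode`, and the cutoff of 64 for a "large" prime is an assumption.

```python
from typing import Dict, List

# Assumption: an extent above this threshold counts as a "large" prime.
LARGE_PRIME_THRESHOLD = 64


def all_factors(n: int) -> List[int]:
    """Return every positive divisor of n in ascending order."""
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:
                large.append(n // i)
        i += 1
    return small + large[::-1]


def reduce_step_candidates(raxis_extents: Dict[str, int]) -> Dict[str, List[int]]:
    """Hypothetical helper: map each reduction-axis name to candidate step sizes."""
    results = {}
    for name, extent in raxis_extents.items():
        # Divisors tile the axis exactly, so no boundary check is needed.
        factors = all_factors(extent)
        # A prime has only the divisors 1 and itself; for a large prime,
        # fall back to powers of 2 below the extent (tails need a check).
        if len(factors) == 2 and extent > LARGE_PRIME_THRESHOLD:
            factors = [1]
            while factors[-1] * 2 < extent:
                factors.append(factors[-1] * 2)
        results[name] = factors
    return results
```

For example, `reduce_step_candidates({"k": 12, "r": 67})` maps `k` to its divisors `[1, 2, 3, 4, 6, 12]` and the prime `r` to `[1, 2, 4, 8, 16, 32, 64]`.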
 
@@ -20,11 +20,6 @@ Module Contents
    Bases: :py:obj:`tilelang.carver.template.base.BaseTemplate`
 
 
-   Base class template for hardware-aware configurations.
-   This serves as an abstract base class (ABC) that defines the structure
-   for subclasses implementing hardware-specific optimizations.
-
-
    .. py:attribute:: batch_size
       :type:  int
       :value: 1
 
@@ -20,11 +20,6 @@ Module Contents
    Bases: :py:obj:`tilelang.carver.template.base.BaseTemplate`
 
 
-   Base class template for hardware-aware configurations.
-   This serves as an abstract base class (ABC) that defines the structure
-   for subclasses implementing hardware-specific optimizations.
-
-
    .. py:attribute:: structure
       :type:  Union[str, List[str]]
       :value: None
@@ -45,19 +40,6 @@ Module Contents
 
    .. py:method:: get_hardware_aware_configs(arch = None, topk = 10)
 
-      Abstract method that must be implemented by subclasses.
-      It should return a list of hardware-aware configurations (hints)
-      based on the specified architecture.
-
-      :param arch: The target architecture. Defaults to None.
-      :type arch: TileDevice, optional
-      :param topk: Number of top configurations to return. Defaults to 10.
-      :type topk: int, optional
-
-      :returns: A list of recommended hardware-aware configurations.
-      :rtype: List[Hint]
-
-
 
    .. py:method:: initialize_function()
 
 
@@ -35,3 +35,16 @@ Module Contents
 
 .. py:function:: get_mma_micro_size(dtype)
 
+   Return the MMA (Tensor Core) micro-tile dimensions for a given data type.
+
+   This function returns the micro tile sizes (x, y, k) used by MMA/Tensor Core operations.
+   - x: tile width in the output/result dimension
+   - y: tile height in the output/result dimension
+   - k: tile depth in the reduction/K dimension
+
+   Accepted dtype strings include "float16", "int8" and some FP8 identifiers ("float8_e4m3", "float8_e5m2"). For FP8 and int8 types the reduction depth (`k`) is 32; for float16 it is 16.
+
+   :returns: (micro_size_x, micro_size_y, micro_size_k)
+   :rtype: tuple[int, int, int]
+
+
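A minimal sketch of the dtype-to-micro-tile mapping described for `get_mma_micro_size`, in plain Python. The docstring only fixes `k`, so the 16 x 16 output micro-tile (x = y = 16) is an assumption; the name `mma_micro_size_sketch` is hypothetical, and the fallback of `k = 16` for unlisted dtypes is also assumed.

```python
from typing import Tuple

# dtypes whose reduction depth (k) is 32, as listed in the docstring.
K32_DTYPES = frozenset({"float8_e4m3", "float8_e5m2", "int8"})


def mma_micro_size_sketch(dtype: str) -> Tuple[int, int, int]:
    """Hypothetical stand-in for get_mma_micro_size: returns (x, y, k)."""
    micro_size_x = micro_size_y = 16  # assumption: 16 x 16 output micro-tile
    # FP8 and int8 use a reduction depth of 32; other dtypes fall back to 16.
    micro_size_k = 32 if dtype in K32_DTYPES else 16
    return micro_size_x, micro_size_y, micro_size_k


print(mma_micro_size_sketch("float16"))  # (16, 16, 16)
print(mma_micro_size_sketch("int8"))     # (16, 16, 32)
```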
@@ -55,6 +55,17 @@ Package Contents
 
 .. py:function:: symbolic(name, dtype = 'int32')
 
+   Create a TIR symbolic variable.
+
+   :param name: Identifier for the variable in generated TIR.
+   :type name: str
+   :param dtype: Data type string for the variable (e.g., "int32"). Defaults to "int32".
+   :type dtype: str
+
+   :returns: A TIR variable with the given name and dtype for use in TIR/TensorIR kernels.
+   :rtype: tir.Var
+
+
 .. py:function:: use_swizzle(panel_size, order = 'row', enable = True)
 
 .. py:function:: annotate_layout(layout_map)
 
@@ -18,41 +18,43 @@ Module Contents
 
 .. py:function:: index_to_coordinates(index, shape)
 
-   Convert a flat (linear) index to multi-dimensional coordinates for a given shape.
+   Convert a flat (linear) index into multi-dimensional coordinates for a given shape.
 
-   .. rubric:: Example
-
-   shape = (4, 5, 6)
-   index = 53
-   index_to_coordinates(53, (4, 5, 6)) -> [1, 3, 5]
-   # Explanation:
-   # 53 // (5*6) = 1  (1st coordinate)
-   # 53 % (5*6) = 23
-   # 23 // 6 = 3      (2nd coordinate)
-   # 23 % 6 = 5       (3rd coordinate)
+   Given a linear index and a shape (sequence of dimension extents), returns a list of coordinates (one per dimension) such that converting those coordinates back to a linear index using the usual row-major / C-order formula yields the original index. The computation iterates from the last dimension to the first using modulo and integer division, then reverses the collected coordinates.
 
    :param index: The flat index to convert.
-   :type index: int
-   :param shape: The shape of the multi-dimensional array.
-   :type shape: tuple or list of int
+   :type index: int or PrimExpr
+   :param shape: The extents of each dimension (length >= 1).
+   :type shape: Sequence[int]
 
-   :returns: A list of coordinates corresponding to each dimension.
-   :rtype: list
+   :returns: Coordinates for each dimension in the same order as `shape`.
+   :rtype: list[PrimExpr]
 
 
 .. py:function:: linear_index(*args)
 
-   Convert a list of coordinates to a flat (linear) index using strides.
+   Compute a flat (linear) index from multi-dimensional coordinates and strides.
+
+   The function accepts a sequence of PrimExpr arguments: the leading arguments are the coordinates
+   and the trailing arguments are the corresponding strides. The number of strides must equal
+   (number of coordinates - 1). The linear index is computed as::
+
+       linear = coords[0]
+       for each (coord, stride) in zip(coords[1:], strides):
+           linear = linear * stride + coord
+
+   .. rubric:: Examples
+
+   - linear_index(i) -> i
+   - linear_index(i, j) -> raises ValueError (two arguments cannot be split into c coordinates and c - 1 strides)
+   - linear_index(i, j, stride_j) -> i * stride_j + j
+   - linear_index(i, j, k, stride_j, stride_k) -> i*stride_j*stride_k + j*stride_k + k
+   - linear_index(i, tx, v, threads, local_size) -> i*threads*local_size + tx*local_size + v
 
-   Usage examples:
-       linear_index(i)                         -> i
-       linear_index(i, j)                      -> i * stride + j
-       linear_index(i, j, stride_j)            -> i * stride_j + j
-       linear_index(i, j, k, stride_j, stride_k)
-                                               -> i * stride_j * stride_k + j * stride_k + k
+   :raises ValueError: If called with no arguments, or if the number of strides is not one less than
+       the number of coordinates.
 
-       Example for index = i * threads * local_size + tx * local_size + v:
-       Suppose you have i, tx, v as coordinates, and threads, local_size as strides:
-       linear_index(i, tx, v, threads, local_size) == i * threads * local_size + tx * local_size + v
+   :returns: The computed linear index expression.
+   :rtype: PrimExpr
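Under row-major order, `index_to_coordinates` and `linear_index` are inverses when the stride arguments are the extents of the inner dimensions (`shape[1:]`). The sketch below re-implements both on plain Python ints rather than PrimExpr, following the loop and formula described in the docstrings; the explicit even-argument check is an assumption derived from the stride-count rule, not necessarily how the library validates its input.

```python
from typing import List, Sequence


def index_to_coordinates(index: int, shape: Sequence[int]) -> List[int]:
    """Row-major (C-order) unflattening: the last dimension varies fastest."""
    coords = []
    for extent in reversed(shape):
        coords.append(index % extent)  # position within this dimension
        index //= extent               # carry the remainder to outer dims
    coords.reverse()
    return coords


def linear_index(*args: int) -> int:
    """Fold coordinates back into a flat index; args = coords + strides."""
    n = len(args)
    if n == 0:
        raise ValueError("linear_index needs at least one coordinate")
    # c coordinates plus (c - 1) strides gives n == 2 * c - 1, always odd.
    if n % 2 == 0:
        raise ValueError("number of strides must be one less than coordinates")
    num_coords = (n + 1) // 2
    coords, strides = args[:num_coords], args[num_coords:]
    linear = coords[0]
    for coord, stride in zip(coords[1:], strides):
        linear = linear * stride + coord  # Horner-style accumulation
    return linear


coords = index_to_coordinates(53, (4, 5, 6))
print(coords)                        # [1, 3, 5]
print(linear_index(*coords, 5, 6))   # 53
```

The Horner-style loop explains why the trailing arguments behave like inner-dimension extents: each step multiplies the running index by the size of the next dimension before adding that dimension's coordinate.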