tile-ai
diff --git a/CNAME b/CNAME
Lines changed: 1 addition & 0 deletions
diff --git a/_images/LayoutInference.png b/_images/LayoutInference.png
821 KB
diff --git a/_images/MatmulExample.png b/_images/MatmulExample.png
807 KB
diff --git a/_images/overview.png b/_images/overview.png
142 KB
diff --git a/_sources/get_started/Installation.rst.txt b/_sources/get_started/Installation.rst.txt
Lines changed: 179 additions & 0 deletions
diff --git a/_sources/get_started/overview.rst.txt b/_sources/get_started/overview.rst.txt
Lines changed: 133 additions & 0 deletions
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
Lines changed: 33 additions & 0 deletions
diff --git a/_sources/language_ref/ast.rst.txt b/_sources/language_ref/ast.rst.txt
Lines changed: 2 additions & 0 deletions
diff --git a/_sources/language_ref/primitives.rst.txt b/_sources/language_ref/primitives.rst.txt
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1 @@
+tilelang.tile-ai.cn
@@ -0,0 +1,179 @@
+Installation Guide
+==================
+
+Installing with pip
+-------------------
+
+**Prerequisites for installation via wheel or PyPI:**
+
+- **Operating System**: Ubuntu 20.04 or later
+
+- **Python Version**: >= 3.8
+
+- **CUDA Version**: >= 11.0
+
+The easiest way to install TileLang is directly from PyPI using pip. To install the latest version, run the following command in your terminal:
+
+.. code:: bash
+
+   pip install tilelang
+
+Alternatively, you may choose to install TileLang using prebuilt packages available on the Release Page:
+
+.. code:: bash
+
+   pip install tilelang-0.0.0.dev0+ubuntu.20.4.cu120-py3-none-any.whl
+
+To install the latest version of TileLang from the GitHub repository, you can run the following command:
+
+.. code:: bash
+
+   pip install git+https://github.com/tile-ai/tilelang.git
+
+After installing TileLang, you can verify the installation by running:
+
+.. code:: bash
+
+   python -c "import tilelang; print(tilelang.__version__)"
+
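+The one-line check above raises an ``ImportError`` when the package is missing. A slightly more defensive variant is sketched below, using only the Python standard library. The helper name ``describe_install`` is hypothetical, and the fallback for a missing ``__version__`` attribute is an assumption added for robustness; it is not part of TileLang.
+

```python
import importlib
import importlib.util


def describe_install(module_name="tilelang"):
    """Return a one-line status string for an installed (or missing) module.

    Hypothetical helper for illustration; not part of TileLang.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not installed"
    # The import itself can still fail (e.g. missing shared libraries);
    # that error is deliberately left to propagate so it stays visible.
    module = importlib.import_module(module_name)
    # Not every package defines __version__, so fall back to a placeholder.
    version = getattr(module, "__version__", "unknown")
    return f"{module_name}: version {version}"


if __name__ == "__main__":
    print(describe_install())
```

+
+Calling ``describe_install()`` with no arguments checks ``tilelang``; passing another module name checks that module instead.
+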
+Building from Source
+--------------------
+
+**Prerequisites for building from source:**
+
+- **Operating System**: Linux
+
+- **Python Version**: >= 3.7
+
+- **CUDA Version**: >= 10.0
+
+We recommend using a Docker container with the necessary dependencies to build TileLang from source. You can use the following command to run a Docker container with the required dependencies:
+
+.. code:: bash
+
+   docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.01-py3
+
+To build and install TileLang directly from source, follow these steps. This process requires certain prerequisites from Apache TVM, which can be installed on Ubuntu/Debian-based systems using the following commands:
+
+.. code:: bash
+
+   sudo apt-get update
+   sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
+
+After installing the prerequisites, you can clone the TileLang repository and install it using pip:
+
+.. code:: bash
+
+   git clone --recursive https://github.com/tile-ai/tilelang.git
+   cd tilelang
+   pip install .  # Please be patient, this may take some time.
+
+If you want to install TileLang in development mode, you can run the following command:
+
+.. code:: bash
+
+   pip install -e .
+
+We currently provide three methods to install **TileLang**:
+
+1. `Install from Source (using your own TVM installation)`_
+2. `Install from Source (using the bundled TVM submodule)`_
+3. `Install Using the Provided Script`_
+
+.. _Install from Source (using your own TVM installation): #method-1-install-from-source-using-your-own-tvm-installation
+.. _Install from Source (using the bundled TVM submodule): #method-2-install-from-source-using-the-bundled-tvm-submodule
+.. _Install Using the Provided Script: #method-3-install-using-the-provided-script
+
+
+Method 1: Install from Source (Using Your Own TVM Installation)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you already have a compatible TVM installation, follow these steps:
+
+1. **Clone the Repository**:
+
+   .. code:: bash
+
+      git clone --recursive https://github.com/tile-ai/tilelang
+      cd tilelang
+
+   **Note**: Use the ``--recursive`` flag to include necessary submodules.
+
+2. **Configure Build Options**:
+
+   Create a build directory and specify your existing TVM path:
+
+   .. code:: bash
+
+      mkdir build
+      cd build
+      cmake .. -DTVM_PREBUILD_PATH=/your/path/to/tvm/build  # e.g., /workspace/tvm/build
+      make -j 16
+
+3. **Set Environment Variables**:
+
+   Update ``PYTHONPATH`` to include the ``tile-lang`` Python module:
+
+   .. code:: bash
+
+      export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH
+      # TVM_IMPORT_PYTHON_PATH is used by 3rd-party frameworks to import TVM
+      export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python
+
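+If the import still fails after exporting the variables, it can help to confirm that the directory really ended up on ``PYTHONPATH``. The sketch below does this with the standard library only. The helper names ``pythonpath_entries`` and ``is_on_pythonpath`` are hypothetical, and the example path is a placeholder, not a location TileLang requires.
+

```python
import os


def pythonpath_entries(env_value):
    """Split a PYTHONPATH-style string into normalized directory entries."""
    # Empty segments (e.g. from a trailing separator) are dropped.
    return [os.path.normpath(p) for p in env_value.split(os.pathsep) if p]


def is_on_pythonpath(directory, env_value=None):
    """Check whether `directory` is listed in PYTHONPATH.

    Hypothetical helper for illustration. When `env_value` is None, the
    current process environment is inspected.
    """
    if env_value is None:
        env_value = os.environ.get("PYTHONPATH", "")
    # normpath removes trailing slashes so "/a/b/" and "/a/b" compare equal.
    return os.path.normpath(directory) in pythonpath_entries(env_value)


if __name__ == "__main__":
    # Placeholder path; substitute the directory of your own checkout.
    print(is_on_pythonpath("/your/path/to/tilelang/"))
```

+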
+Method 2: Install from Source (Using the Bundled TVM Submodule)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you prefer to use the built-in TVM version, follow these instructions:
+
+1. **Clone the Repository**:
+
+   .. code:: bash
+
+      git clone --recursive https://github.com/tile-ai/tilelang
+      cd tilelang
+
+   **Note**: Ensure the ``--recursive`` flag is included to fetch submodules.
+
+2. **Configure Build Options**:
+
+   Copy the configuration file and enable the desired backends (e.g., LLVM and CUDA):
+
+   .. code:: bash
+
+      mkdir build
+      cp 3rdparty/tvm/cmake/config.cmake build
+      cd build
+      echo "set(USE_LLVM ON)" >> config.cmake
+      echo "set(USE_CUDA ON)" >> config.cmake 
+      # or echo "set(USE_ROCM ON)" >> config.cmake to enable ROCm runtime
+      cmake ..
+      make -j 16
+
+   The build outputs (e.g., ``libtilelang.so``, ``libtvm.so``, ``libtvm_runtime.so``) will be generated in the ``build`` directory.
+
+3. **Set Environment Variables**:
+
+   Ensure the ``tile-lang`` Python package is in your ``PYTHONPATH``:
+
+   .. code:: bash
+
+      export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH
+
+Method 3: Install Using the Provided Script
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a simplified installation, use the provided script:
+
+1. **Clone the Repository**:
+
+   .. code:: bash
+
+      git clone --recursive https://github.com/tile-ai/tilelang
+      cd tilelang
+
+2. **Run the Installation Script**:
+
+   .. code:: bash
+
+      bash install_cuda.sh
+      # or: bash install_amd.sh  (to enable the ROCm runtime)
@@ -0,0 +1,133 @@
+The Tile Language: A Brief Introduction
+=======================================
+
+.. _sec-overview:
+
+Programming Interface
+---------------------
+
+The figure below depicts how **TileLang** programs are progressively lowered from a high-level description to hardware-specific executables. We provide three different programming interfaces—targeted at **Beginner**, **Developer**, and **Expert** users—that each reside at different levels in this lowering pipeline. The **Tile Language** also allows mixing these interfaces within the same kernel, enabling users to work at whichever level of abstraction best suits their needs.
+
+.. _fig-overview:
+
+.. figure:: ../_static/img/overview.png
+   :align: center
+   :width: 50%
+   :alt: Overview
+
+   High-level overview of the TileLang compilation flow.
+
+Programming Interfaces
+----------------------
+
+1. **Beginner Level (Hardware-Unaware)**
+
+   - Intended for users who need to write code that is independent of specific hardware details.
+   - The goal is to let developers focus on the basic logic without worrying about memory hierarchies or hardware-specific optimizations.
+   - *Note:* This interface is not yet fully implemented.
+
+2. **Developer Level (Hardware-Aware with Tile Library)**
+
+   - Designed for developers who have a basic understanding of GPU memory hierarchies and performance considerations.
+   - Provides a **Tile Library**, containing predefined operations and patterns optimized for various hardware architectures.
+   - Users at this level can leverage these ready-made primitives without diving into low-level threading details.
+
+3. **Expert Level (Hardware-Aware with Thread Primitives)**
+
+   - For highly experienced users who have an in-depth understanding of low-level hardware characteristics (e.g., threading models, memory coalescing).
+   - Offers direct access to **thread primitives** and other low-level constructs, allowing for fine-grained control of performance-critical kernels.
+   - This level grants maximum flexibility for specialized optimizations tailored to specific GPU or multi-core architectures.
+
+Compilation Flow
+----------------
+
+1. **Tile Program**
+
+   A high-level specification of the computation. Depending on the user’s expertise, they may write a purely hardware-unaware tile program or incorporate constructs from the Tile Library or thread primitives.
+
+2. **Tile Program with Tile Library**
+
+   When developers choose from the Tile Library, the original Tile Program is expanded with specialized library calls. These calls encapsulate efficient implementation patterns for different operations.
+
+3. **Tile Program with Thread Primitives**
+
+   Expert-level developers can explicitly use low-level threading constructs to hand-optimize data layout, synchronization, and memory usage.
+
+4. **IRModule**
+
+   After the program is composed with libraries or thread primitives, it is lowered to an intermediate representation (IR) that captures the necessary hardware details.
+
+5. **Source Code Generation (C/CUDA/HIP/LLVM/…)**
+
+   From the IR, the system generates target-specific source code. This source code is tuned for the desired backends or GPU architectures (e.g., NVIDIA, AMD).
+
+6. **Hardware-Specific Executable/Runtime**
+
+   Finally, the generated source is compiled into hardware-specific executables, ready to run on the corresponding devices. The pipeline supports multiple GPU backends and can be extended to additional architectures.
+
+
+.. _sec-tile_based_programming_model:
+
+Tile-based Programming Model
+----------------------------
+
+Figure :ref:`fig-matmul_example` provides a concise matrix multiplication (GEMM) example in ``TileLang``, 
+illustrating how developers can employ high-level constructs such as tiles, memory placement, pipelining, 
+and operator calls to manage data movement and computation with fine-grained control.
+In particular, this snippet (Figure :ref:`fig-matmul_example` (a)) demonstrates how multi-level tiling 
+leverages different memory hierarchies (global, shared, and registers) to optimize bandwidth utilization 
+and reduce latency.
+Overall, Figure :ref:`fig-matmul_example` (b) showcases how the Python-like syntax of ``TileLang`` 
+allows developers to reason about performance-critical optimizations within a user-friendly programming model.
+
+.. _fig-matmul_example:
+
+.. figure:: ../_static/img/MatmulExample.png
+   :align: center
+   :width: 100%
+   :alt: GEMM with Multi-Level Tiling on GPUs
+
+   Optimizing GEMM with Multi-Level Tiling on GPUs via ``TileLang``.
+
+Tile declarations
+~~~~~~~~~~~~~~~~~
+
+At the heart of our approach is the notion of *tiles* as first-class objects in the programming model.
+A tile represents a shaped portion of data, which can be owned and manipulated by a warp, thread block, 
+or equivalent parallel unit.
+In the ``Matmul`` example, the ``A`` and ``B`` buffers are read in tiled chunks (determined by ``block_M``, 
+``block_N``, ``block_K``) inside the kernel loop.
+With ``T.Kernel``, ``TileLang`` defines the execution context, which includes the thread block indices (``bx``
+and ``by``) and the number of threads.
+This context is used to compute the tile indices for each thread block and makes it easier for ``TileLang``
+to automatically infer and optimize memory access and computation.
+It also allows users to manually control the behavior of individual threads within
+a thread block.
+
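+To make the role of ``bx`` and ``by`` concrete, the plain-Python sketch below (not TileLang code) enumerates which output tile each thread block would own for a given problem size. It assumes, for illustration only, that ``bx`` indexes tiles along the ``N`` dimension and ``by`` along ``M``; the helper names ``ceildiv`` and ``block_tiles`` are hypothetical.
+

```python
def ceildiv(a, b):
    """Integer division rounded up, for positive b."""
    return -(-a // b)


def block_tiles(M, N, block_M, block_N):
    """Map each thread-block index (bx, by) to the output tile it owns.

    Assumption for illustration: bx walks tiles along N (columns) and
    by walks tiles along M (rows). Ranges are half-open (start, stop).
    """
    tiles = {}
    for by in range(ceildiv(M, block_M)):
        for bx in range(ceildiv(N, block_N)):
            # Clamp the tile end so edge tiles do not run past the matrix.
            rows = (by * block_M, min((by + 1) * block_M, M))
            cols = (bx * block_N, min((bx + 1) * block_N, N))
            tiles[(bx, by)] = (rows, cols)
    return tiles


if __name__ == "__main__":
    for index, (rows, cols) in sorted(block_tiles(256, 384, 128, 128).items()):
        print(index, "rows", rows, "cols", cols)
```

+
+With ``M = 256``, ``N = 384`` and 128 x 128 tiles, this yields a 3 x 2 grid of thread blocks, each responsible for one tile of the output.
+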
+Explicit Hardware Memory Allocation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A hallmark of ``TileLang`` is the ability to explicitly place these tile buffers in the hardware memory hierarchy.
+Rather than leaving it to a compiler's opaque optimization passes, ``TileLang`` exposes user-facing intrinsics 
+that map directly to physical memory spaces or accelerator-specific constructs.
+In particular:
+
+- ``T.alloc_shared``: Allocates memory in a fast, on-chip storage space, which corresponds to shared memory on NVIDIA GPUs.
+  Shared memory is ideal for caching intermediate data during computations, as it is significantly faster than global memory
+  and allows for efficient data sharing between threads in the same thread block.
+  For example, in matrix multiplication, tiles of matrices can be loaded into shared memory
+  to reduce global memory bandwidth demands and improve performance.
+
+- ``T.alloc_fragment``: Allocates accumulators in fragment memory, which corresponds to register files on NVIDIA GPUs.
+  By keeping inputs and partial sums in registers or hardware-level caches, latency is further minimized.
+  Note that in this tile program, the fragment buffers are allocated with the same tile shapes as the shared-memory buffers,
+  which might seem counterintuitive, as shared memory is generally slower but more abundant,
+  whereas register file space is limited.
+  This is because the allocation here refers to the register files for an entire thread block.
+  ``TileLang`` uses a Layout Inference Pass during compilation to derive a Layout object ``T.Fragment``,
+  which determines how to allocate the corresponding register files for each thread.
+  This process will be discussed in detail in subsequent sections.
+
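+The arithmetic behind the block-level fragment allocation can be sketched as follows. The example assumes the simplest possible case, an even split of the fragment across all threads in the block; the layout that ``TileLang`` actually infers may differ, and the helper name ``elements_per_thread`` is hypothetical.
+

```python
def elements_per_thread(shape, threads):
    """Number of fragment elements each thread holds under an even split.

    Illustrative assumption: the block-level fragment of the given shape
    is divided equally among `threads` threads. Real layouts are chosen
    by the compiler and need not be this simple.
    """
    total = 1
    for dim in shape:
        total *= dim
    if total % threads != 0:
        raise ValueError("fragment size is not divisible by the thread count")
    return total // threads


if __name__ == "__main__":
    # A 128 x 128 accumulator shared by 128 threads.
    print(elements_per_thread((128, 128), 128))
```

+
+Under this assumption, a 128 x 128 accumulator in a block of 128 threads costs each thread 128 register-resident elements, which is why a block-sized fragment can still fit in the register file.
+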
+Data transfer between global memory and hardware-specific memory can be managed using ``T.copy``.
+Furthermore, hardware-specific buffers can be initialized using ``T.clear`` or ``T.fill``.
+For data assignments, operations can also be performed in parallel using ``T.Parallel``,
+as demonstrated in the discussion of the Layout Inference Pass in the following sections.
+
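+The data flow described in this section can be imitated in plain Python. The sketch below is not TileLang code: nested lists stand in for the buffers, and the names ``A_shared``, ``B_shared`` and ``C_local`` only mirror, by analogy, buffers that would be created with ``T.alloc_shared`` and ``T.alloc_fragment``. The function name ``tiled_matmul`` and the default tile sizes are assumptions made for the example.
+

```python
def tiled_matmul(A, B, block_M=2, block_N=2, block_K=2):
    """Compute C = A @ B tile by tile, with nested lists as matrices.

    Plain-Python analogy of the tiled data flow; no GPU is involved.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for m0 in range(0, M, block_M):          # one "thread block" per output tile
        for n0 in range(0, N, block_N):
            m1, n1 = min(m0 + block_M, M), min(n0 + block_N, N)
            # Stand-in for a fragment accumulator, cleared before use.
            C_local = [[0.0] * (n1 - n0) for _ in range(m1 - m0)]
            for k0 in range(0, K, block_K):
                k1 = min(k0 + block_K, K)
                # Stand-ins for shared-memory tiles filled by a copy
                # from "global memory" (the full matrices A and B).
                A_shared = [row[k0:k1] for row in A[m0:m1]]
                B_shared = [row[n0:n1] for row in B[k0:k1]]
                # Tile-level multiply-accumulate into the local accumulator.
                for i in range(m1 - m0):
                    for j in range(n1 - n0):
                        for k in range(k1 - k0):
                            C_local[i][j] += A_shared[i][k] * B_shared[k][j]
            # Write the finished tile back to "global memory".
            for i in range(m1 - m0):
                C[m0 + i][n0:n1] = C_local[i]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0], [0, 1], [1, 1]]
    print(tiled_matmul(A, B))
```

+
+Each pass of the ``k0`` loop touches only one small tile of ``A`` and ``B``, which is the property that lets a real kernel keep those tiles in fast on-chip memory while the accumulator stays in registers.
+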
+
+.. _fig-layout_inference:
+
+.. figure:: ../_static/img/LayoutInference.png
+   :align: center
+   :width: 100%
+   :alt: Layout Inference
@@ -0,0 +1,33 @@
+👋 Welcome to Tile Language
+===========================
+
+`GitHub <https://github.com/tile-ai/tilelang>`_
+
+Tile Language (tile-lang) is a concise domain-specific language designed to streamline 
+the development of high-performance GPU/CPU kernels (e.g., GEMM, Dequant GEMM, FlashAttention, LinearAttention). 
+By employing a Pythonic syntax with an underlying compiler infrastructure on top of TVM, 
+tile-lang allows developers to focus on productivity without sacrificing the 
+low-level optimizations necessary for state-of-the-art performance.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: GET STARTED
+
+   get_started/Installation.rst
+   get_started/overview.rst
+
+.. toctree::
+   :maxdepth: 2
+   :caption: LANGUAGE REFERENCE
+
+   language_ref/ast.rst
+   language_ref/primitives.rst
+   language_ref/tilelibrary.rst
+   
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Privacy
+
+   privacy.rst
@@ -0,0 +1,2 @@
+Tile Language AST
+==================
@@ -0,0 +1,2 @@
+Tile Language: Primitives
+=========================