Commit 470d594

Update docs

Author: GitHub Actions
1 parent 8d629ec commit 470d594

Some content is hidden: large commits have some content hidden by default.

54 files changed: 19232 additions, 0 deletions.

CNAME

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
tilelang.tile-ai.cn

_images/LayoutInference.png (821 KB)

_images/MatmulExample.png (807 KB)

_images/overview.png (142 KB)

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
Installation Guide
==================

Installing with pip
-------------------

**Prerequisites for installation via wheel or PyPI:**

- **Operating System**: Ubuntu 20.04 or later
- **Python Version**: >= 3.8
- **CUDA Version**: >= 11.0

The easiest way to install TileLang is directly from PyPI using pip. To install the latest version, run the following command in your terminal:

.. code:: bash

   pip install tilelang

Alternatively, you may choose to install TileLang using prebuilt packages available on the Release Page:

.. code:: bash

   pip install tilelang-0.0.0.dev0+ubuntu.20.4.cu120-py3-none-any.whl

To install the latest version of TileLang from the GitHub repository, you can run the following command:

.. code:: bash

   pip install git+https://github.com/tile-ai/tilelang.git

After installing TileLang, you can verify the installation by running:

.. code:: bash

   python -c "import tilelang; print(tilelang.__version__)"

Building from Source
--------------------

**Prerequisites for building from source:**

- **Operating System**: Linux
- **Python Version**: >= 3.7
- **CUDA Version**: >= 10.0

We recommend building TileLang from source inside a Docker container that already provides the necessary dependencies. You can start such a container with:

.. code:: bash

   docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.01-py3

To build and install TileLang directly from source, follow these steps. This process requires certain prerequisites from Apache TVM, which can be installed on Ubuntu/Debian-based systems using the following commands:

.. code:: bash

   sudo apt-get update
   sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev

After installing the prerequisites, you can clone the TileLang repository and install it using pip:

.. code:: bash

   git clone --recursive https://github.com/tile-ai/tilelang.git
   cd tilelang
   pip install .  # Please be patient, this may take some time.

If you want to install TileLang in development mode, you can run the following command:

.. code:: bash

   pip install -e .

We currently provide three methods to install **TileLang**:

1. `Install from Source (using your own TVM installation)`_
2. `Install from Source (using the bundled TVM submodule)`_
3. `Install Using the Provided Script`_

.. _Install from Source (using your own TVM installation): #method-1-install-from-source-using-your-own-tvm-installation
.. _Install from Source (using the bundled TVM submodule): #method-2-install-from-source-using-the-bundled-tvm-submodule
.. _Install Using the Provided Script: #method-3-install-using-the-provided-script

Method 1: Install from Source (Using Your Own TVM Installation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you already have a compatible TVM installation, follow these steps:

1. **Clone the Repository**:

   .. code:: bash

      git clone --recursive https://github.com/tile-ai/tilelang
      cd tilelang

   **Note**: Use the ``--recursive`` flag to include necessary submodules.

2. **Configure Build Options**:

   Create a build directory and specify your existing TVM path:

   .. code:: bash

      mkdir build
      cd build
      cmake .. -DTVM_PREBUILD_PATH=/your/path/to/tvm/build  # e.g., /workspace/tvm/build
      make -j 16

3. **Set Environment Variables**:

   Update ``PYTHONPATH`` to include the ``tile-lang`` Python module:

   .. code:: bash

      export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH
      # TVM_IMPORT_PYTHON_PATH is used by 3rd-party frameworks to import TVM
      export TVM_IMPORT_PYTHON_PATH=/your/path/to/tvm/python

Method 2: Install from Source (Using the Bundled TVM Submodule)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you prefer to use the built-in TVM version, follow these instructions:

1. **Clone the Repository**:

   .. code:: bash

      git clone --recursive https://github.com/tile-ai/tilelang
      cd tilelang

   **Note**: Ensure the ``--recursive`` flag is included to fetch submodules.

2. **Configure Build Options**:

   Copy the configuration file and enable the desired backends (e.g., LLVM and CUDA):

   .. code:: bash

      mkdir build
      cp 3rdparty/tvm/cmake/config.cmake build
      cd build
      echo "set(USE_LLVM ON)" >> config.cmake
      echo "set(USE_CUDA ON)" >> config.cmake
      # or echo "set(USE_ROCM ON)" >> config.cmake to enable the ROCm runtime
      cmake ..
      make -j 16

   The build outputs (e.g., ``libtilelang.so``, ``libtvm.so``, ``libtvm_runtime.so``) will be generated in the ``build`` directory.

3. **Set Environment Variables**:

   Ensure the ``tile-lang`` Python package is in your ``PYTHONPATH``:

   .. code:: bash

      export PYTHONPATH=/your/path/to/tilelang/:$PYTHONPATH

Method 3: Install Using the Provided Script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a simplified installation, use the provided script:

1. **Clone the Repository**:

   .. code:: bash

      git clone --recursive https://github.com/tile-ai/tilelang
      cd tilelang

2. **Run the Installation Script**:

   .. code:: bash

      bash install_cuda.sh
      # or bash install_amd.sh if you want to enable the ROCm runtime

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
The Tile Language: A Brief Introduction
========================================

.. _sec-overview:

Programming Interface
---------------------

The figure below depicts how **TileLang** programs are progressively lowered from a high-level description to hardware-specific executables. We provide three different programming interfaces, targeted at **Beginner**, **Developer**, and **Expert** users, that each reside at a different level in this lowering pipeline. The **Tile Language** also allows mixing these interfaces within the same kernel, enabling users to work at whichever level of abstraction best suits their needs.

.. _fig-overview:

.. figure:: ../_static/img/overview.png
   :align: center
   :width: 50%
   :alt: Overview

   High-level overview of the TileLang compilation flow.

Programming Interfaces
----------------------

1. **Beginner Level (Hardware-Unaware)**

   - Intended for users who need to write code that is independent of specific hardware details.
   - The goal is to let developers focus on the basic logic without worrying about memory hierarchies or hardware-specific optimizations.
   - *Note:* This interface is not yet fully implemented.

2. **Developer Level (Hardware-Aware with Tile Library)**

   - Designed for developers who have a basic understanding of GPU memory hierarchies and performance considerations.
   - Provides a **Tile Library**, containing predefined operations and patterns optimized for various hardware architectures.
   - Users at this level can leverage these ready-made primitives without diving into low-level threading details.

3. **Expert Level (Hardware-Aware with Thread Primitives)**

   - For highly experienced users who have an in-depth understanding of low-level hardware characteristics (e.g., threading models, memory coalescing).
   - Offers direct access to **thread primitives** and other low-level constructs, allowing for fine-grained control of performance-critical kernels.
   - This level grants maximum flexibility for specialized optimizations tailored to specific GPU or multi-core architectures.

Compilation Flow
----------------

1. **Tile Program**

   A high-level specification of the computation. Depending on the user's expertise, they may write a purely hardware-unaware tile program or incorporate constructs from the Tile Library or thread primitives.

2. **Tile Program with Tile Library**

   When developers choose from the Tile Library, the original Tile Program is expanded with specialized library calls. These calls encapsulate efficient implementation patterns for different operations.

3. **Tile Program with Thread Primitives**

   Expert-level developers can explicitly use low-level threading constructs to hand-optimize data layout, synchronization, and memory usage.

4. **IRModule**

   After the program is composed with libraries or thread primitives, it is lowered to an intermediate representation (IR) that captures the necessary hardware details.

5. **Source Code Generation (C/CUDA/HIP/LLVM/…)**

   From the IR, the system generates target-specific source code. This source code is tuned for the desired backends or GPU architectures (e.g., NVIDIA, AMD).

6. **Hardware-Specific Executable/Runtime**

   Finally, the generated source is compiled into hardware-specific executables, ready to run on the corresponding devices. The pipeline supports multiple GPU backends and can be extended to additional architectures. A minimal end-to-end sketch of this flow appears after this list.

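To make these stages concrete, the sketch below (not part of the original page) walks a trivial element-wise kernel through the flow, from a hardware-unaware tile program to a compiled kernel. The compilation entry point is an assumption: recent TileLang versions expose ``tilelang.compile``, while older releases use ``tilelang.lower``, so treat the last lines as illustrative rather than canonical.

.. code:: python

   import tilelang
   import tilelang.language as T


   def vector_add(N, block_N, dtype="float32"):
       @T.prim_func
       def main(
           A: T.Buffer((N,), dtype),
           B: T.Buffer((N,), dtype),
           C: T.Buffer((N,), dtype),
       ):
           # Tile Program: one thread block per tile of block_N elements.
           with T.Kernel(T.ceildiv(N, block_N), threads=128) as bx:
               # Hardware-unaware body: a parallel element-wise loop over one tile.
               for i in T.Parallel(block_N):
                   C[bx * block_N + i] = A[bx * block_N + i] + B[bx * block_N + i]

       return main


   func = vector_add(1024, 256)
   # Lowering: Tile Program -> IRModule -> target source -> executable kernel.
   # NOTE: assumed entry point; older versions use tilelang.lower(func) instead.
   kernel = tilelang.compile(func, out_idx=[2])
   # The compiled kernel is then callable on CUDA tensors, e.g. c = kernel(a, b).

In this sketch ``out_idx=[2]`` marks ``C`` as the output buffer returned by the runtime; the same tile program could instead be composed with Tile Library calls or thread primitives before lowering.
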
.. _sec-tile_based_programming_model:

Tile-based Programming Model
----------------------------

Figure :ref:`fig-matmul_example` provides a concise matrix multiplication (GEMM) example in ``TileLang``,
illustrating how developers can employ high-level constructs such as tiles, memory placement, pipelining,
and operator calls to manage data movement and computation with fine-grained control.
In particular, this snippet (Figure :ref:`fig-matmul_example` (a)) demonstrates how multi-level tiling
leverages different memory hierarchies (global, shared, and registers) to optimize bandwidth utilization
and reduce latency.
Overall, Figure :ref:`fig-matmul_example` (b) showcases how the Python-like syntax of ``TileLang``
allows developers to reason about performance-critical optimizations within a user-friendly programming model.

.. _fig-matmul_example:

.. figure:: ../_static/img/MatmulExample.png
   :align: center
   :width: 100%
   :alt: GEMM with Multi-Level Tiling on GPUs

   Optimizing GEMM with Multi-Level Tiling on GPUs via ``TileLang``.

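Because the figure above is an image, the following sketch is offered as a textual companion written in the spirit of that GEMM example. It is not the exact snippet from Figure (a): the function name, the tile-size parameters, and the use of ``T.Pipelined`` and ``T.gemm`` from the Tile Library are illustrative assumptions.

.. code:: python

   import tilelang.language as T


   def matmul(M, N, K, block_M, block_N, block_K, dtype="float16", accum_dtype="float"):
       @T.prim_func
       def main(
           A: T.Buffer((M, K), dtype),
           B: T.Buffer((K, N), dtype),
           C: T.Buffer((M, N), dtype),
       ):
           # Execution context: a 2-D grid of thread blocks, 128 threads each.
           with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M), threads=128) as (bx, by):
               # Tiles staged in fast on-chip shared memory.
               A_shared = T.alloc_shared((block_M, block_K), dtype)
               B_shared = T.alloc_shared((block_K, block_N), dtype)
               # Block-level accumulator kept in fragment (register) storage.
               C_local = T.alloc_fragment((block_M, block_N), accum_dtype)

               T.clear(C_local)
               # Software-pipelined loop over the K dimension.
               for k in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                   T.copy(A[by * block_M, k * block_K], A_shared)  # global -> shared
                   T.copy(B[k * block_K, bx * block_N], B_shared)  # global -> shared
                   T.gemm(A_shared, B_shared, C_local)             # tile-level GEMM
               # Write the accumulated tile back to global memory.
               T.copy(C_local, C[by * block_M, bx * block_N])

       return main

Each ``T.copy`` moves one tile between memory levels, while the accumulator never leaves registers until the final write-back, mirroring the multi-level tiling pattern shown in the figure.
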
Tile declarations
~~~~~~~~~~~~~~~~~

At the heart of our approach is the notion of *tiles* as first-class objects in the programming model.
A tile represents a shaped portion of data, which can be owned and manipulated by a warp, thread block,
or equivalent parallel unit.
In the ``Matmul`` example, the ``A`` and ``B`` buffers are read in tiled chunks (determined by ``block_M``,
``block_N``, and ``block_K``) inside the kernel loop.
With ``T.Kernel``, ``TileLang`` defines the execution context, which includes the thread block indices (``bx``
and ``by``) and the number of threads.
These contexts help compute the index for each thread block and make it easier for ``TileLang``
to automatically infer and optimize memory access and computation.
Additionally, these contexts allow users to manually control the behavior of each independent thread within
a thread block.

Explicit Hardware Memory Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A hallmark of ``TileLang`` is the ability to explicitly place these tile buffers in the hardware memory hierarchy.
Rather than leaving it to a compiler's opaque optimization passes, ``TileLang`` exposes user-facing intrinsics
that map directly to physical memory spaces or accelerator-specific constructs.
In particular:

- ``T.alloc_shared``: Allocates memory in a fast, on-chip storage space, which corresponds to shared memory on NVIDIA GPUs.
  Shared memory is ideal for caching intermediate data during computations, as it is significantly faster than global memory
  and allows for efficient data sharing between threads in the same thread block.
  For example, in matrix multiplication, tiles of matrices can be loaded into shared memory
  to reduce global memory bandwidth demands and improve performance.

- ``T.alloc_fragment``: Allocates accumulators in fragment memory, which corresponds to register files on NVIDIA GPUs.
  By keeping inputs and partial sums in registers or hardware-level caches, latency is further minimized.
  Note that in this tile program, the fragment buffer is allocated at the granularity of an entire block tile,
  just like the shared-memory buffers, which might seem counterintuitive: register files are faster than shared memory
  but far more limited in capacity.
  This is because the allocation here refers to the register files for an entire thread block.
  ``TileLang`` uses a Layout Inference Pass during compilation to derive a Layout object ``T.Fragment``,
  which determines how to allocate the corresponding register files for each thread.
  This process will be discussed in detail in subsequent sections.

Data transfer between global memory and hardware-specific memory can be managed using ``T.copy``.
Furthermore, hardware-specific buffers can be initialized using ``T.clear`` or ``T.fill``.
For data assignments, operations can also be performed in parallel using ``T.Parallel``,
as demonstrated by the Layout Inference Pass in the following sections.

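As a hedged illustration of how these intrinsics compose (the kernel, buffer names, and the scale-by-two operation are made up for this sketch), consider a tiled element-wise kernel that stages data through shared memory and registers before writing it back:

.. code:: python

   import tilelang.language as T


   def scale_by_two(M, N, block_M, block_N, dtype="float32"):
       @T.prim_func
       def main(A: T.Buffer((M, N), dtype), B: T.Buffer((M, N), dtype)):
           with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M), threads=128) as (bx, by):
               # Explicit memory placement: an on-chip staging tile and a
               # block-level register (fragment) tile.
               A_shared = T.alloc_shared((block_M, block_N), dtype)
               B_local = T.alloc_fragment((block_M, block_N), dtype)

               T.copy(A[by * block_M, bx * block_N], A_shared)  # global -> shared
               T.clear(B_local)                                 # initialize registers
               # Parallel element-wise assignment over the tile; the Layout
               # Inference Pass maps these iterations onto individual threads.
               for i, j in T.Parallel(block_M, block_N):
                   B_local[i, j] = A_shared[i, j] * 2.0

               T.copy(B_local, B[by * block_M, bx * block_N])   # registers -> global

       return main

Here ``T.clear`` initializes the fragment buffer (``T.fill`` would do the same with an explicit value), and the ``T.Parallel`` loop expresses the element-wise assignment that the Layout Inference Pass later distributes across threads.
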
.. _fig-layout_inference:

.. figure:: ../_static/img/LayoutInference.png
   :align: center
   :width: 100%
   :alt: Layout Inference

_sources/index.rst.txt

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
👋 Welcome to Tile Language
===========================

`GitHub <https://github.com/tile-ai/tilelang>`_

Tile Language (tile-lang) is a concise domain-specific language designed to streamline
the development of high-performance GPU/CPU kernels (e.g., GEMM, Dequant GEMM, FlashAttention, LinearAttention).
By employing a Pythonic syntax with an underlying compiler infrastructure on top of TVM,
tile-lang allows developers to focus on productivity without sacrificing the
low-level optimizations necessary for state-of-the-art performance.

.. toctree::
   :maxdepth: 2
   :caption: GET STARTED

   get_started/Installation.rst
   get_started/overview.rst

.. toctree::
   :maxdepth: 2
   :caption: LANGUAGE REFERENCE

   language_ref/ast.rst
   language_ref/primitives.rst
   language_ref/tilelibrary.rst

.. toctree::
   :maxdepth: 1
   :caption: Privacy

   privacy.rst

_sources/language_ref/ast.rst.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Tile Language AST
==================
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Tile Language: Primitives
=========================
