A Python-based Domain-Specific Language (DSL) for authoring high-performance custom kernels on Tenstorrent hardware. This project is under active development — see the functionality matrix for current simulator and compiler support.
TT-Lang joins the Tenstorrent software ecosystem as an expressive yet ergonomic middle ground between TT-NN and TT-Metalium, aiming to provide a unified entrypoint with integrated simulation, performance analysis, and AI-assisted development tooling.
The language is designed to support generative AI workflows and a robust tooling ecosystem: Python as the host language enables AI tools to translate GPU DSL kernels (Triton, CUDA, cuTile, TileLang) to Tenstorrent hardware more reliably than direct TT-Metalium translation, while tight integration with functional simulation will allow AI agents to propose kernel implementations, validate correctness, and iterate on configurations autonomously. Developers should be able to catch errors and performance issues in their IDE rather than on hardware, with a functional simulator to surface bugs early. Line-by-line performance metrics and data flow graphs can guide both programmers and AI agents to quickly spot bottlenecks and optimization opportunities.
Tenstorrent developers today face a choice between TT-NN, which provides high-level operations that are straightforward to use but lack the expressivity needed for custom kernels, and TT-Metalium, which provides full hardware control through explicit low-level management of memory and compute. This is not a shortcoming of TT-Metalium; it is designed to be low-level and expressive, providing direct access to hardware primitives without abstraction overhead, and it serves its purpose well for developers who need that level of control. The problem is that there is no middle ground where the compiler handles what it does best (resource management, validation, optimization) while maintaining high expressivity for application-level concerns.
TT-Lang bridges this gap through progressive disclosure: simple kernels require minimal specification where the compiler infers compute API operations, NOC addressing, DST register allocation and more from high-level abstractions, while complex kernels allow developers to open the hood and craft pipelining and synchronization details directly. The primary use case is kernel fusion for model deployment. Engineers porting models through TT-NN quickly encounter operations that need to be fused for performance or patterns that TT-NN cannot express, and today this requires rewriting in TT-Metalium which takes weeks and demands undivided attention and hardware debugging expertise. TT-Lang makes this transition fast and correct: a developer can take a sequence of TT-NN operations, express the fused equivalent with explicit control over intermediate results and memory layout, validate correctness through simulation, and integrate the result as a drop-in replacement in their TT-NN graph.
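As a language-agnostic analogy (plain Python, not tt-lang code, with illustrative operation names), the fusion workflow amounts to replacing a sequence of operations with one fused function and checking the two against each other, which is the role the functional simulator plays for real kernels:

```python
def add_then_relu_unfused(a, b):
    # Two separate ops, as a framework graph would issue them:
    # op 1 materializes an intermediate result, op 2 consumes it.
    t = [x + y for x, y in zip(a, b)]
    return [max(x, 0.0) for x in t]

def add_relu_fused(a, b):
    # One fused op: the intermediate never leaves the kernel.
    return [max(x + y, 0.0) for x, y in zip(a, b)]

a = [0.5, -1.0, 2.0, -0.25]
b = [0.25, 0.5, -3.0, -0.25]

# Correctness check before swapping the fused version into the graph.
assert add_then_relu_unfused(a, b) == add_relu_fused(a, b)
```

In tt-lang the same pattern applies at the kernel level: write the fused equivalent, validate it against the unfused TT-NN sequence in simulation, then drop it into the graph.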
The fastest way to try tt-lang is with the functional simulator, which runs kernels as pure Python — no hardware, no compiler build required:
```shell
git clone https://github.com/tenstorrent/tt-lang.git
cd tt-lang
cmake -G Ninja -B build -DTTLANG_SIM_ONLY=ON
source build/env/activate
ttlang-sim examples/eltwise_add.py
```

To compile and run kernels on Tenstorrent hardware, use a pre-built Docker image. Two images are available:
| Image | Purpose | Can run tt-lang programs? | Can clone/build tt-lang? |
|---|---|---|---|
| dist | Run tt-lang programs | Yes | No |
| ird | Develop and build tt-lang from source | Yes | Yes |
Both images can be used with `ird reserve` (see container build docs for details).
Image: `ghcr.io/tenstorrent/tt-lang/tt-lang-dist-ubuntu-22-04:latest` (all versions)
The dist image contains a single, fully built tt-lang installation in /opt/ttlang-toolchain. Use it to compile and run any tt-lang program without building any of the prerequisites.
⚠️ Important: Do not attempt to build tt-lang inside a dist container — it has no build toolchain. To clone and build tt-lang yourself, use the ird image instead.
Create the container (one-time):
```shell
docker run -d --name $USER-dist \
  --device=/dev/tenstorrent/0:/dev/tenstorrent/0 \
  -v /dev/hugepages:/dev/hugepages \
  -v /dev/hugepages-1G:/dev/hugepages-1G \
  -v $HOME:$HOME \
  ghcr.io/tenstorrent/tt-lang/tt-lang-dist-ubuntu-22-04:latest \
  sleep infinity
```

Open a shell:

```shell
docker exec -it $USER-dist /bin/bash
```

The environment activates automatically on login. Run an example immediately:

```shell
python /opt/ttlang-toolchain/examples/tutorial/multicore_grid_auto.py
```

To learn more, work through the tutorial, explore the programming guide for compiler options, debugging, and performance tools, or use Claude Code with the built-in slash commands to translate kernels, profile, and optimize.
Image: `ghcr.io/tenstorrent/tt-lang/tt-lang-ird-ubuntu-22-04:latest` (all versions)
The ird image has the pre-built toolchain (LLVM, tt-metal, Python venv) but does not include tt-lang itself. Clone the repository and build against the toolchain. You can maintain multiple clones or branches side by side, each with its own build directory.
To use the image directly with Docker on your local Linux machine, first create a container (one-time):
```shell
docker run -d --name $USER-ird \
  --device=/dev/tenstorrent/0:/dev/tenstorrent/0 \
  -v /dev/hugepages:/dev/hugepages \
  -v /dev/hugepages-1G:/dev/hugepages-1G \
  -v $HOME:$HOME \
  -v $SSH_AUTH_SOCK:/ssh-agent -e SSH_AUTH_SOCK=/ssh-agent \
  ghcr.io/tenstorrent/tt-lang/tt-lang-ird-ubuntu-22-04:latest \
  sleep infinity
```

Open a shell:

```shell
docker exec -it $USER-ird /bin/bash
```

Inside the container, clone and build:

```shell
git clone https://github.com/tenstorrent/tt-lang.git
cd tt-lang
cmake -G Ninja -B build -DTTLANG_USE_TOOLCHAIN=ON
source build/env/activate
cmake --build build
```

Verify the build:

```shell
ninja -C build check-ttlang-all
```

Run an example:

```shell
python examples/tutorial/multicore_grid_auto.py
```

The `-DTTLANG_USE_TOOLCHAIN=ON` flag tells CMake to use the pre-built LLVM and tt-metal from `/opt/ttlang-toolchain` instead of building them from source, which saves significant build time.
Performance tracing (Tracy) is enabled by default. To disable it, add `-DTTLANG_ENABLE_PERF_TRACE=OFF` to the cmake configure command. See the programming guide for profiling usage.
To build tt-lang directly on a host machine without Docker, see the build system documentation. It covers prerequisites, all supported build modes (from submodules, reusable toolchain, pre-built toolchain), and version compatibility.
To map a different TT device, change the `--device` argument (e.g., `--device=/dev/tenstorrent/1:/dev/tenstorrent/0`).
tt-lang includes a functional simulator that runs kernels as pure Python, without requiring Tenstorrent hardware or the full compiler stack. Use it to validate kernel logic and debug with any Python debugger:
```shell
ttlang-sim examples/eltwise_add.py
```

The simulator typically supports more language features than the compiler at any given point — see the functionality matrix for current coverage. See the programming guide for debugger setup and details.
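Because a simulated kernel is ordinary Python, you can validate it against a golden reference with plain assertions and step through it in any debugger. A minimal sketch of such a check (pure Python; the helper names here are illustrative, not part of the tt-lang API):

```python
def reference_add(a, b):
    # Golden reference computed in plain Python.
    return [x + y for x, y in zip(a, b)]

def check(kernel_out, ref_out, tol=1e-6):
    # Elementwise comparison with an absolute tolerance, reporting
    # the first mismatching index to speed up debugging.
    for i, (got, want) in enumerate(zip(kernel_out, ref_out)):
        if abs(got - want) > tol:
            raise AssertionError(f"mismatch at {i}: got {got}, want {want}")

a, b = [1.0, 2.0], [3.0, 4.5]
check(reference_add(a, b), [4.0, 6.5])  # passes silently
```

The same check works unchanged whether the kernel output comes from the simulator or, later, from hardware.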
Full documentation is built with Sphinx. The source lives in docs/sphinx/ and covers:
- Tutorial — step-by-step examples from single-tile to multinode kernels
- Programming Guide — compiler options, print debugging, performance tools
- Functional Simulator — run kernels without hardware, debugging setup
- Claude Skills — AI-assisted kernel translation, profiling, and optimization via Claude Code
- Build System — build configuration, toolchain modes, and version compatibility
- Testing — how to write and run tests
- Contributor Guide — workflow, validation, adding new ops
To build and view the Sphinx docs locally:
```shell
cmake -G Ninja -B build -DTTLANG_ENABLE_DOCS=ON
cmake --build build --target ttlang-docs
python -m http.server 8000 -d build/docs/sphinx/_build/html
```

We welcome contributions. Please see CONTRIBUTING.md for guidelines.
See the Sphinx contributor guide and code style guidelines for coding standards, dialect design patterns, and testing practices.
tt-mlir defines the compatible versions of LLVM and tt-metal. When updating tt-mlir, the other submodules should be updated to match.
Update tt-mlir (and read the versions it expects):
```shell
cd third-party/tt-mlir && git fetch && git checkout <commit> && cd ../..
# Read the LLVM and tt-metal commits that this tt-mlir version expects:
grep LLVM_PROJECT_VERSION third-party/tt-mlir/env/CMakeLists.txt
grep TT_METAL_VERSION third-party/tt-mlir/third_party/CMakeLists.txt
```

Update LLVM to the compatible version:

```shell
cd third-party/llvm-project && git fetch && git checkout <llvm-sha> && cd ../..
```

Update tt-metal to the compatible version:

```shell
cd third-party/tt-metal && git fetch && git checkout <tt-metal-sha> && cd ../..
```

Commit all submodule updates together:

```shell
git add third-party/tt-mlir third-party/llvm-project third-party/tt-metal
git commit -m "Update submodules to tt-mlir <commit>"
```

The build system verifies SHA compatibility during configure. If submodule versions are intentionally mismatched, pass `-DTTLANG_ACCEPT_LLVM_MISMATCH=ON` or `-DTTLANG_ACCEPT_TTMETAL_MISMATCH=ON` to suppress the check.
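The two `grep` commands above can also be scripted. A hedged sketch (pure Python; the variable names `LLVM_PROJECT_VERSION` and `TT_METAL_VERSION` come from the files above, but the exact `set(...)` layout and the sample SHA are assumptions for illustration):

```python
import re

def extract_cmake_version(cmake_text, var):
    # Pull the value of `set(<var> <sha>)` out of a CMakeLists.txt body,
    # tolerating optional quotes around the SHA.
    m = re.search(rf'set\(\s*{re.escape(var)}\s+"?([0-9a-fA-F]+)"?', cmake_text)
    return m.group(1) if m else None

# Illustrative file content; the real files live under third-party/tt-mlir/.
sample = 'set(LLVM_PROJECT_VERSION "0123abcd")\n'
assert extract_cmake_version(sample, "LLVM_PROJECT_VERSION") == "0123abcd"
```

A small wrapper like this could compare the expected SHAs against `git rev-parse HEAD` in each submodule before committing.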
tt-lang uses pre-commit to format code and enforce style guidelines before commits.
Install and activate:
```shell
pip install pre-commit
cd /path/to/tt-lang
pre-commit install
```

Pre-commit runs automatically on `git commit`. It formats Python code with Black, C++ code with clang-format (LLVM style), removes trailing whitespace, and checks YAML/TOML syntax.

If pre-commit modifies files, the commit is stopped. Stage the changes and commit again:

```shell
git add -u
git commit -m "Your commit message"
```

To run manually on all files:

```shell
pre-commit run --all-files
```
This project adheres to a Code of Conduct. By participating, you are expected to uphold this code and treat all community members with respect.
- GitHub Issues — report bugs or request features
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
Third-party dependencies and their licenses are listed in the NOTICE file.
