Setup & Run
Park Woorak edited this page Jan 27, 2026
```shell
git submodule update --init --recursive
cd 3rdparty/mlc-llm/3rdparty/tvm
git apply ../../../../tvm_fix.patch
cd -
```

Follow the official documentation below to build TVM & MLC-LLM.
(Cloning the repositories is already done through the `git submodule` command above.)
-
You will perform tasks similar to this (in `3rdparty/mlc-llm/3rdparty/tvm`):

```shell
mkdir build && cp cmake/config.cmake build/ && cd build
# Now, edit build/config.cmake referring to the documentation
cmake .. && cmake --build . --parallel $(nproc)
```
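For the Metal target on macOS that this project is mainly tested with, the edits to `build/config.cmake` might look like the following. This is only a sketch of two real TVM build options; the exact set of options you need depends on your toolchain, so consult the official document:

```cmake
set(USE_METAL ON)  # enable the Metal GPU backend (macOS / Apple Silicon)
set(USE_LLVM ON)   # enable LLVM for CPU code generation
```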
-
You will perform tasks similar to this (in `3rdparty/mlc-llm`):

```shell
mkdir build && cd build
python ../cmake/gen_cmake_config.py  # Answer this script's prompts to generate the configuration
export CMAKE_POLICY_VERSION_MINIMUM=3.5  # Recommended to avoid the CMake error on `tokenizer-cpp`
cmake .. && cmake --build . --parallel $(nproc)
```
```shell
# Install from source (recommended)
cd 3rdparty/mlc-llm/3rdparty/tvm/3rdparty/tvm-ffi
pip install -e .
cd -

# ... or just use the PyPI version
# (quoted so the shell does not treat `<=` as a redirection)
pip install "apache-tvm-ffi<=0.1.7"
```

```shell
cd 3rdparty/mlc-llm/3rdparty/tvm/python
pip install -e .
cd -
```
```shell
# Extra dependencies for tvm
# c.f. https://tvm.apache.org/docs/install/from_source.html#step-5-extra-python-dependencies
pip install psutil
```

```shell
cd 3rdparty/mlc-llm/python
# Make sure that `flashinfer-python` is handled. (c.f. 'Important' admonition below)
pip install -e .
cd -
```

Important
Handling `flashinfer-python` in `3rdparty/mlc-llm/python/requirements.txt`:
- exclude installation on unsupported platforms (e.g. macOS)
- use the version constraint `>=0.5.0` for better dependency resolution
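Both points above can be expressed on a single line with a standard pip environment marker (PEP 508). A sketch of what the edited `requirements.txt` entry might look like:

```
flashinfer-python>=0.5.0; sys_platform != "darwin"
```

With this marker, pip skips the package entirely on macOS and applies the version constraint everywhere else.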
```shell
pip install -r requirements.txt
```

Note
While TVM supports multiple hardware backends, this project has been mainly tested with the metal target on macOS. As the model uses the original mxfp4 and bfloat16 weights without further quantization, an Apple Silicon Mac with 24 GB or more of unified memory is recommended.
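To check whether your Mac meets the memory recommendation, a quick sketch (macOS only; `sysctl -n hw.memsize` reports physical memory in bytes, and the fallback to `0` just keeps the check from erroring on other platforms):

```shell
# Sketch: warn if this machine has less than the recommended 24 GB of unified memory.
required=$((24 * 1024 * 1024 * 1024))              # 24 GiB in bytes
mem=$(sysctl -n hw.memsize 2>/dev/null || echo 0)  # macOS physical memory; 0 if unavailable
if [ "$mem" -ge "$required" ]; then
  echo "OK: $mem bytes of unified memory"
else
  echo "Warning: less than 24 GB detected ($mem bytes)"
fi
```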
```shell
pip install huggingface_hub  # to use the `hf` command
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
```

Run the simplest gpt-oss test:

```shell
python run_gpt_oss.py
```

Run a simple multi-turn chat example:
```shell
python chat.py
```