Setup & Run

Park Woorak edited this page Jan 27, 2026 · 3 revisions

Install dependencies

TVM & MLC-LLM

Update submodule

git submodule update --init --recursive

Apply patches

cd 3rdparty/mlc-llm/3rdparty/tvm
git apply ../../../../tvm_fix.patch
cd -

Build from source

Follow the official documentation below to build TVM & MLC-LLM.
(Cloning the repositories is already done by the git submodule command above.)

  • [TVM] Install from source

    You will perform tasks similar to the following (in 3rdparty/mlc-llm/3rdparty/tvm):

    mkdir build && cp cmake/config.cmake build/ && cd build
    # Now edit build/config.cmake as described in the document
    cmake .. && cmake --build . --parallel $(nproc)
  • [MLC-LLM] Build from source

    You will perform tasks similar to the following (in 3rdparty/mlc-llm):

    mkdir build && cd build
    python ../cmake/gen_cmake_config.py      # Answer the script's prompts to generate the configuration
    export CMAKE_POLICY_VERSION_MINIMUM=3.5  # Recommended to avoid a CMake error in `tokenizer-cpp`
    cmake .. && cmake --build . --parallel $(nproc)
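
Once both builds finish, a quick way to confirm that the shared libraries were produced (a sketch; the paths and the macOS `.dylib` suffix are assumptions, Linux builds emit `.so`):

```python
# Check that the TVM and MLC-LLM builds produced their shared libraries.
# Paths and the .dylib suffix are assumptions (use .so on Linux).
from pathlib import Path

for lib in [
    "3rdparty/mlc-llm/3rdparty/tvm/build/libtvm.dylib",
    "3rdparty/mlc-llm/build/libmlc_llm.dylib",
]:
    status = "found" if Path(lib).is_file() else "missing"
    print(f"{status}: {lib}")
```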

Install Python bindings

TVM FFI

# Install from source (recommended)
cd 3rdparty/mlc-llm/3rdparty/tvm/3rdparty/tvm-ffi
pip install -e .
cd -

# ...or install the version from PyPI
pip install "apache-tvm-ffi<=0.1.7"

TVM

cd 3rdparty/mlc-llm/3rdparty/tvm/python
pip install -e .
cd -

# Extra dependencies for tvm
# c.f. https://tvm.apache.org/docs/install/from_source.html#step-5-extra-python-dependencies
pip install psutil

MLC LLM

cd 3rdparty/mlc-llm/python
# Make sure that `flashinfer-python` is handled. (c.f. 'Important' admonition below)
pip install -e .
cd -

Important

Handle `flashinfer-python` in 3rdparty/mlc-llm/python/requirements.txt before installing:

  • exclude it from installation on unsupported platforms (e.g. macOS)
  • use the version constraint >=0.5.0 for better dependency resolution
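
Concretely, both adjustments can be expressed on a single requirements line with a PEP 508 environment marker (a sketch; the exact marker is an assumption, adjust the platform test as needed):

```text
flashinfer-python >= 0.5.0 ; sys_platform != "darwin"
```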

Python dependencies

pip install -r requirements.txt

Download model

Download the files for the gpt-oss reference torch implementation.

Note

While TVM supports multiple hardware backends, this project has been mainly tested with the metal target on macOS. As the model uses the original mxfp4 and bfloat16 weights without further quantization, an Apple Silicon Mac with 24 GB or more of unified memory is recommended.
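
The 24 GB recommendation can be sanity-checked with a rough back-of-envelope estimate. The ~20B parameter count and the 4.25 bits/parameter figure for mxfp4 (4-bit values plus one shared 8-bit scale per 32-element block) are approximations; the bfloat16 tensors, KV cache, and activations come on top:

```python
# Rough weight-memory estimate for gpt-oss-20b (approximate figures).
params = 20e9       # ~20B parameters
mxfp4_bits = 4.25   # 4-bit values + one 8-bit scale per 32-element block
weights_gib = params * mxfp4_bits / 8 / 2**30
print(f"~{weights_gib:.1f} GiB for the mxfp4 weights alone")
```

Even under this optimistic estimate, roughly 10 GiB goes to weights, so with the bfloat16 tensors and runtime buffers on top, 24 GB of unified memory leaves comfortable headroom.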

pip install huggingface_hub  # to use `hf` command
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/

Compile & Run

Basic single-turn test

Run the simplest gpt-oss test

python run_gpt_oss.py

Multi-turn chat

Run a simple multi-turn chat example

python chat.py

Getting Started

1. Architectural Implementations

2. Low-Level Optimization

Clone this wiki locally