This project aims to compile the OpenAI gpt-oss model using Apache TVM and run it on a target device.
Visit the Wiki Home or Design Philosophy page to read more about the project's goals and objectives!
To support gpt-oss correctly, TVM and MLC LLM need to be built with a few patches.
Please refer to our Wiki - Setup & Run page for setup instructions.
> **Note**
> While TVM supports multiple hardware backends, this project has been tested mainly with the `metal` target on macOS. Because the model uses the original mxfp4 and bfloat16 weights without further quantization, an Apple Silicon Mac with 24 GB or more of unified memory is recommended.
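As a rough sanity check on that memory recommendation, here is a back-of-envelope estimate (an illustration under stated assumptions, not an official figure): mxfp4 packs 4-bit values with one shared 8-bit scale per 32-element block, i.e. roughly 4.25 bits per weight.

```python
def mxfp4_gib(n_params: float) -> float:
    """Approximate weight storage in GiB for mxfp4-quantized parameters."""
    # 4 bits per value plus one 8-bit scale shared across a 32-element block.
    bits_per_weight = 4 + 8 / 32
    return n_params * bits_per_weight / 8 / 2**30

# ~21e9 parameters (gpt-oss-20b) lands around 10-11 GiB for the quantized
# weights alone, before activations, the KV cache, and any bf16 tensors --
# which is why 24 GB of unified memory is a comfortable floor.
print(f"{mxfp4_gib(21e9):.1f} GiB")
```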
```shell
pip install huggingface_hub  # to use the `hf` command
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
```

> **Important**
> To ensure equivalence with gpt-oss, please confirm that your TVM was built with the patches applied.
> You can install the required TVM & MLC LLM by referring to the Wiki Setup page.
```shell
python run_gpt_oss.py
python chat.py
```

The target device can be changed by modifying the following line in the scripts:

```diff
- engine = Engine(model_path, target="metal")
+ engine = Engine(model_path, target="<YOUR DEVICE TYPE>")
```

Supported device types are determined by TVM's target support.
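If you prefer not to hardcode the target, a minimal sketch of per-OS selection is shown below (the helper name and the OS-to-target mapping are assumptions; adjust them for your hardware):

```python
import platform

def default_target() -> str:
    """Guess a reasonable TVM target string from the host OS (heuristic only)."""
    system = platform.system()
    if system == "Darwin":
        return "metal"   # Apple Silicon / macOS GPUs
    if system == "Linux":
        return "cuda"    # assumes an NVIDIA GPU; try "vulkan" or "rocm" otherwise
    return "llvm"        # CPU fallback

# Usage in the scripts would then read:
# engine = Engine(model_path, target=default_target())
```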
This project is licensed under the Apache License 2.0, in line with the licenses of gpt-oss and TVM.
- @Liberatedwinner
- @grf53
- @jhlee525
- @khj809