This repository contains the source code for our winning entry (1st Place) in the ASPLOS/EuroSys 2025 Programming Competition.
Our work significantly enhances the performance of the baseline Llama inference system provided for the competition, originally based on the AWS nki-llama codebase.
Contest Website: https://github.com/asplos-contest/2025/blob/main/OPTNKI.md
The ASPLOS/EuroSys 2025 competition presented a unique challenge: optimizing large language model inference on specialized AI hardware using low-level programming interfaces. Specifically, teams were tasked with implementing the Llama3.2 1B model targeting a single AWS Trainium1 (trn1) chip.
We introduced several key changes to the original nki-llama codebase, including:
- GEMM/GEMV Tiling
- Instruction Fusion
- Kernel Fusion
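
To illustrate the idea behind the first item, here is a minimal sketch of loop tiling for GEMM and GEMV. It is written in plain Python for readability and is not the NKI kernel from this repository. The function names `tiled_gemm` and `tiled_gemv` and the tile sizes `tile_m`, `tile_n`, and `tile_k` are hypothetical, chosen only for this example; a real kernel would pick tile shapes to match the on-chip memory and matrix engine of the target hardware.

```python
def tiled_gemm(a, b, tile_m=2, tile_n=2, tile_k=2):
    """Compute a @ b one tile at a time.

    a is an M x K matrix and b is a K x N matrix, both as lists of lists.
    Tile sizes are illustrative assumptions, not values used in this repo.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"

    c = [[0.0] * n for _ in range(m)]

    # Outer loops walk over tiles of the output and of the reduction axis.
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for k0 in range(0, k, tile_k):
                # Inner loops stay within one tile, so the working set is
                # at most tile_m*tile_k + tile_k*tile_n + tile_m*tile_n values.
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def tiled_gemv(a, x, **tile_sizes):
    """GEMV as the special case of GEMM with a single output column."""
    column = [[value] for value in x]
    return [row[0] for row in tiled_gemm(a, column, **tile_sizes)]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(a, b))
    print(tiled_gemv(a, [1, 1, 1]))
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size, so the result matches an untiled matrix multiply for any shape.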
See the contest website for our results.
- See requirements.txt for details.
pip install -r requirements.txt
python3 main.py --mode evaluate_all --enable-nki --seq-len 640

We thank the competition organizers and AWS for their generous sponsorship of computational resources, which enabled us to perform optimization on the NKI framework.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you find our work useful, please cite us:
@misc{nki-llama-contest,
author = {Shiwei Gao and Ruwen Fan and Shaoxun Zeng and Haodi Jiang and Huajun Bai and Yitian Yang and Hao Guo and Qing Wang and Jiwu Shu and Youyou Lu},
title = {{Optimized NKI-Llama}},
url = {https://github.com/thustorage/nki-llama-contest},
year = {2025}
}