thustorage/nki-llama-contest

Optimized NKI-Llama - 1st Place Winner, ASPLOS/EuroSys 2025 OPTNKI Competition

This repository contains the source code for our winning entry (1st Place) in the ASPLOS/EuroSys 2025 Programming Competition.

Our work significantly enhances the performance of the baseline Llama inference system provided for the competition, originally based on the AWS nki-llama codebase.

Contest Website: https://github.com/asplos-contest/2025/blob/main/OPTNKI.md

Overview

The ASPLOS/EuroSys 2025 competition presented a unique challenge: optimizing large language model inference on specialized AI hardware using low-level programming interfaces. Specifically, teams were tasked with implementing the Llama 3.2 1B model on a single AWS Trainium1 (trn1) chip.

Our Modifications

We introduced several key changes to the original nki-llama codebase including:

  • GEMM/GEMV Tiling
  • Instruction Fusion
  • Kernel Fusion
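
As a rough illustration of what tiling and fusion mean here, the sketch below computes a matrix product one output tile at a time and optionally applies an elementwise function to each tile as soon as it is finished. It is plain Python, not NKI code, and it is not taken from the kernels in this repository. The function names, the tile sizes, and the `epilogue` parameter are all assumptions made for the example.

```python
# Illustrative only: plain Python, not NKI code and not the contest kernels.
# Function names, tile sizes, and `epilogue` are hypothetical.

def matmul_reference(a, b):
    """Naive triple loop, used only to check the tiled version."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile_m=2, tile_n=2, tile_k=2, epilogue=None):
    """Compute a @ b one output tile at a time.

    If `epilogue` is given, it is applied to every element of an output
    tile right after that tile is complete, which mimics fusing a following
    elementwise kernel into the matmul instead of making a second pass.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            i1, j1 = min(i0 + tile_m, m), min(j0 + tile_n, n)
            # Accumulate partial products over K, one K-tile at a time.
            for p0 in range(0, k, tile_k):
                p1 = min(p0 + tile_k, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        acc = out[i][j]
                        for p in range(p0, p1):
                            acc += a[i][p] * b[p][j]
                        out[i][j] = acc
            # Fused epilogue: runs only after the tile's K loop has finished.
            if epilogue is not None:
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        out[i][j] = epilogue(out[i][j])
    return out
```

A GEMV is the same computation with a single-column `b`, for example `matmul_tiled(a, [[1], [2]])` for a two-column `a`. On real hardware the tile sizes would be chosen to match on-chip memory and compute-unit shapes; the defaults here are arbitrary.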

Results

See the contest website linked above for our results.

Getting Started

Prerequisites

  • Install the dependencies listed in requirements.txt: pip install -r requirements.txt

Running the Code

python3 main.py --mode evaluate_all --enable-nki --seq-len 640

Acknowledgments

We thank the competition organizers and AWS for their generous sponsorship of computational resources, which enabled us to perform optimization on the NKI framework.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Cite us

If you find our work useful, please cite us:

@misc{nki-llama-contest,
   author = {Shiwei Gao and Ruwen Fan and Shaoxun Zeng and Haodi Jiang and Huajun Bai and Yitian Yang and Hao Guo and Qing Wang and Jiwu Shu and Youyou Lu},
   title = {{Optimized NKI-Llama}},
   url = {https://github.com/thustorage/nki-llama-contest},
   year = {2025}
}
