This repository contains the code for the paper “GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration”. The project is built on the open-source repository "mLoRA-0.3.2". GMoE is a new MoE architecture with a graph-based router, a Poisson distribution-based distinction strategy, and a Normal distribution-based balance strategy.
- `config`: configurations for training and evaluation.
- `gmoe/backends`: backend tools for GMoE.
- `gmoe/common`: the implementation of the Transformer architecture.
- `gmoe/models`: implementations of several families of Transformer-based models.
- `gmoe/tasks`: the implementation of the datasets.
- `GMoE.py`: the entry point of this project.
- python=3.11, pytorch>=2.1.2, pyg
- Other dependencies: see `requirements.txt`.
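A possible environment-setup sequence (a sketch, assuming a conda/pip workflow; the environment name `gmoe` is illustrative, and you may need CUDA-specific PyTorch/PyG wheels for your system):

```bash
# Create and activate an isolated environment (name "gmoe" is illustrative).
conda create -n gmoe python=3.11 -y
conda activate gmoe

# Install PyTorch and PyTorch Geometric; pick wheels matching your CUDA setup.
pip install "torch>=2.1.2"
pip install torch_geometric

# Install the remaining dependencies listed by the repository.
pip install -r requirements.txt
```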
Configure the settings in the `config` folder. We have already provided configs for GMoE and four other baseline models: LoRAMoE, MingMoE, MoLA, and MixLoRA.
Replace [base model] and [train/evaluate config] below with the path to the base model and the configuration file in the `config` folder.
```bash
python GMoE.py --base_model [base model] --config [train config] --seed 42 --log_file GMoE.log --bf16 --overwrite
```
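For instance, a concrete invocation might look like the following (the base-model path and the config filename `config/gmoe.json` are illustrative placeholders, not guaranteed names in this repository):

```bash
# Illustrative example: fine-tune a locally downloaded LLaMA-2 7B base model
# with a GMoE training config. Paths and filenames are placeholders.
python GMoE.py \
  --base_model /path/to/llama-2-7b-hf \
  --config config/gmoe.json \
  --seed 42 \
  --log_file GMoE.log \
  --bf16 \
  --overwrite
```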
After the training process, we can run the evaluation step with the command below:
```bash
python GMoE.py --base_model [base model] --config [train config] --seed 42 --log_file GMoE.log --bf16 --overwrite --evaluate
```
Note: Do not change the train config after the training step, or the evaluation will not find the correct adapter.
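Continuing the illustrative example above (same placeholder paths; the config must be the unchanged training config so the saved adapter can be located):

```bash
# Illustrative example: evaluate with the same, unchanged config used for training.
python GMoE.py \
  --base_model /path/to/llama-2-7b-hf \
  --config config/gmoe.json \
  --seed 42 \
  --log_file GMoE.log \
  --bf16 \
  --overwrite \
  --evaluate
```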
If you use this project in your research, please cite the following paper:
```bibtex
@misc{bai2025gmoeempoweringllmsfinetuning,
  title={GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration},
  author={Ting Bai and Yue Yu and Le Huang and Zenan Xu and Zhe Zhao and Chuan Shi},
  year={2025},
  eprint={2412.16216},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2412.16216},
}
```