Example project for fine-tuning, running inference, and evaluating LLMs using TorchTune on UW's Tillicum.
Getting Started · Usage · Additional Resources
This project uses TorchTune to demonstrate how to fine-tune LLMs with LoRA on Tillicum's multi-GPU nodes.
## Getting Started

Prerequisites:

- Hugging Face account with Llama 3.2 access.
- SSH access to Tillicum.
- SSH into Tillicum and connect to a GPU instance:

  ```bash
  salloc --gpus=1
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/josecols/ft-llms-tillicum.git ft-llms
  cd ft-llms
  ```

- Set up the project path environment variable. Add the following line to your `~/.bashrc` file (replace the path with your actual project location):

  ```bash
  export FT_LLMS_ROOT=/gpfs/scrubbed/<netid>/projects/ft-llms
  ```

  Then reload your bash configuration:

  ```bash
  source ~/.bashrc
  ```
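Before submitting jobs, it can be worth confirming that the variable resolves in your shell. A minimal check, using an assumed example path (substitute your own location):

```shell
# Assumed example path; replace with your actual project location.
export FT_LLMS_ROOT=/gpfs/scrubbed/example-netid/projects/ft-llms

# Confirm the variable resolves before launching jobs that rely on it.
echo "Project root: ${FT_LLMS_ROOT}"
```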
- Enable the conda module:

  ```bash
  module load conda
  ```

- Create and activate the conda environment:

  ```bash
  conda create -n ft-llms python=3.12
  conda activate ft-llms
  ```

- Install TorchTune dependencies:

  ```bash
  pip install torch torchvision torchao
  ```

- Install TorchTune:

  ```bash
  pip install torchtune
  ```

- Install W&B to track fine-tuning jobs:

  ```bash
  pip install wandb
  ```
  Note: You will need to authenticate your W&B account the first time you use it. Additionally, modify the `metric_logger` section of the configuration file `configs/llama3_2_3B_cuda.yaml` with your W&B project details.

- Install the Hugging Face libraries to run inference tasks:

  ```bash
  pip install transformers peft accelerate
  ```

- Install the ROUGE score package to run evaluation tasks:

  ```bash
  pip install rouge-score
  ```

- Download the NLTK data (required by the ROUGE package):

  ```bash
  python -c "import nltk; nltk.download('punkt_tab')"
  ```
## Usage

### Dataset

This workshop demo uses the PLOS dataset from BioLaySumm 2025 to fine-tune a model for lay summarization of biomedical articles.

To download and prepare the dataset for training, run the following command:
```bash
python scripts/prepare_dataset.py --use-abstracts
```

> **Note**
> You can omit the `--use-abstracts` flag if you prefer to train with the full article texts as input. However, you might need to adjust the training configuration to prevent out-of-memory errors.
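To make the expected data shape concrete, here is a hedged sketch of what one prepared training record might look like. The `input`/`output` field names and the JSONL layout are assumptions for illustration, not the documented output of `scripts/prepare_dataset.py`:

```python
import json

# Hypothetical example of one prepared training record; the actual field
# names and file layout produced by scripts/prepare_dataset.py may differ.
record = {
    "input": "Abstract: Tumor-suppressor gene TP53 regulates the cell cycle ...",
    "output": "Scientists studied a gene that helps stop cells from growing "
              "out of control ...",
}

# Instruct-style fine-tuning datasets are commonly stored as JSONL, one
# prompt/completion record per line.
line = json.dumps(record)
print(line)
```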
### Model

The demo uses the Llama 3.2 3B model, but you can choose a different model if you prefer. To see all the supported model configurations, run `tune ls`.

```bash
tune download meta-llama/Llama-3.2-3B-Instruct --output-dir models/Llama-3.2-3B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <your-token>
```

> **Note**
> Make sure to replace `<your-token>` with your Hugging Face token. You can also set it via the `HF_TOKEN` environment variable.
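If you prefer the environment-variable route, a minimal sketch (the token value shown is a placeholder, not a real credential):

```shell
# Placeholder value; substitute the token from your Hugging Face account settings.
export HF_TOKEN="hf_placeholder_token"

# With HF_TOKEN exported, the --hf-token flag can be omitted from tune download.
echo "HF_TOKEN is set: ${HF_TOKEN:+yes}"
```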
### Fine-tuning

Run single-node fine-tuning on 8 GPUs:

```bash
sbatch tasks/train_8_gpus.slurm
```

There is also a multi-node example script (`tasks/train_16_gpus.slurm`) that you can adapt for various distributed setups.

To check the job's progress, use the `squeue -u <netid>` command.
### Inference and evaluation

Run the following command to generate the summaries with the fine-tuned model:

```bash
sbatch tasks/inference.slurm
```

Run the following command to evaluate the model summaries against the gold-standard references:

```bash
sbatch tasks/eval.slurm
```
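To give a feel for what the evaluation step measures, here is a deliberately simplified ROUGE-1 computation on toy strings. The real `rouge-score` package additionally applies stemming and (for ROUGE-L variants) sentence tokenization, so its numbers will differ:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap, whitespace tokens, no stemming."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each candidate token counts at most as often
    # as it appears in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

gold = "the gene slows tumor growth in mice"
summary = "the gene slows tumor growth"
print(f"ROUGE-1 F1: {rouge1_f1(gold, summary):.3f}")  # -> ROUGE-1 F1: 0.833
```

The candidate here matches 5 of the 7 reference unigrams (recall 5/7) with perfect precision, giving F1 = 10/12 ≈ 0.833.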