SFTTrainer training very slow on GPU. Is this training speed expected? #2378
Unanswered · pledominykas asked this question in Q&A
I am currently trying to do full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40 GB VRAM) on Google Colab. However, training is very slow: ~0.06 it/s.
Here is my code:
```python
from datasets import load_dataset
from trl import SFTTrainer

# model and tokenizer (ai-forever/mGPT) are loaded earlier in the notebook

dataset = load_dataset("allenai/c4", "lt")
train_dataset = dataset["train"].take(10000)
eval_dataset = dataset["validation"].take(1000)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer_stats = trainer.train()
```
And the trainer output:

*(screenshot of the trainer progress bar)*
It says it will take ~10 hours to process the 10k examples from the C4 dataset.
These are the relevant package versions and a screenshot of GPU usage:

| Package | Version |
| --- | --- |
| accelerate | 0.34.2 |
| bitsandbytes | 0.44.1 |
| datasets | 3.1.0 |
| peft | 0.13.2 |
| torch | 2.5.0+cu121 |
| trl | 0.12.0 |

*(screenshot of GPU usage)*
The model does seem to be loaded onto the GPU, but for some reason training is still very slow.
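For reference, this is a quick way to confirm the weights are actually on the GPU (a minimal sketch; `model` is the mGPT model loaded above):

```python
import torch

# Check where the model's weights actually live; expect "cuda:0" on the A100
print(next(model.parameters()).device)
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```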
I tried using `keep_in_memory=True` when loading the dataset, but it did not help.
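That attempt looked like this (a minimal sketch; same dataset and subset sizes as above):

```python
from datasets import load_dataset

# Load the Lithuanian C4 subset fully into RAM instead of memory-mapping from disk
dataset = load_dataset("allenai/c4", "lt", keep_in_memory=True)
```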
I also tried pre-tokenizing the dataset and using `Trainer` instead of `SFTTrainer` (roughly as sketched below), but the performance was similar.
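The pre-tokenized variant was roughly the following (a minimal sketch, not the exact code: the `tokenize` helper and `output_dir` value are placeholders, `TrainingArguments` is otherwise left at its defaults, and `model`/`tokenizer`/`train_dataset`/`eval_dataset` are as above):

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Tokenize once up front so the trainer doesn't have to tokenize on the fly
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized_train = train_dataset.map(tokenize, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(tokenize, batched=True, remove_columns=eval_dataset.column_names)

# Causal-LM collator: pads batches and derives labels from input_ids (mlm=False)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)
trainer.train()
```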
Is this the expected training speed, or is there some issue with my code? And if there is an issue, what would a possible fix be?