SFTTrainer training very slow on GPU. Is this training speed expected? #2378
Unanswered · pledominykas asked this question in Q&A
I am currently trying to do full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40 GB VRAM) on Google Colab. However, training is very slow: ~0.06 it/s.
Here is my code:
```python
from datasets import load_dataset
from trl import SFTTrainer

# model and tokenizer (ai-forever/mGPT) are loaded earlier in the notebook

dataset = load_dataset("allenai/c4", "lt")
train_dataset = dataset["train"].take(10000)
eval_dataset = dataset["validation"].take(1000)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer_stats = trainer.train()
```
And the trainer output:

*(screenshot of the trainer progress bar)*
It says it will take ~10 hours to process the 10k examples from the C4 dataset.
These are the relevant package versions and a screenshot of GPU usage:

| Package | Version |
| --- | --- |
| accelerate | 0.34.2 |
| bitsandbytes | 0.44.1 |
| datasets | 3.1.0 |
| peft | 0.13.2 |
| torch | 2.5.0+cu121 |
| trl | 0.12.0 |

*(screenshot of GPU usage)*
The model does seem to be loaded onto the GPU, but for some reason training is still very slow.
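For reference, this is a quick way to confirm the weights are actually on the GPU (a minimal sketch; `model` is the mGPT model loaded above):

```python
import torch

# Check where the model's weights actually live; expect "cuda:0" on the A100
print(next(model.parameters()).device)
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```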
I tried using `keep_in_memory=True` when loading the dataset, but it did not help.
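That attempt looked like this (a minimal sketch; same dataset and subset sizes as above):

```python
from datasets import load_dataset

# Load the Lithuanian C4 subset fully into RAM instead of memory-mapping from disk
dataset = load_dataset("allenai/c4", "lt", keep_in_memory=True)
```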
I also tried pre-tokenizing the dataset and using `Trainer` instead of `SFTTrainer` (roughly as sketched below), but the performance was similar.
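The pre-tokenized variant was roughly the following (a minimal sketch, not the exact code: the `tokenize` helper and `output_dir` value are placeholders, `TrainingArguments` is otherwise left at its defaults, and `model`/`tokenizer`/`train_dataset`/`eval_dataset` are as above):

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Tokenize once up front so the trainer doesn't have to tokenize on the fly
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized_train = train_dataset.map(tokenize, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(tokenize, batched=True, remove_columns=eval_dataset.column_names)

# Causal-LM collator: pads batches and derives labels from input_ids (mlm=False)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)
trainer.train()
```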
Is this the expected training speed, or is there some issue with my code? And if there is an issue, what would a possible fix be?