First of all, thank you very much for your work.
I am trying to train Gemma-2B with a 32K sequence length and a 2K segment size on a single A6000 Ada (48 GB).
However, even after adjusting the parameters in train.gemma.infini.noclm.sh as shown below, training still fails with a GPU out-of-memory error.
Is this normal?
accelerate launch --mixed_precision='bf16' \
train.gemma.infini.noclm.py \
--model_name_or_path='google/gemma-2b' \
--segment_length=2048 \
--block_size=32768 \
--dataset_name='wikitext' \
--dataset_config_name='wikitext-2-raw-v1' \
--per_device_train_batch_size=1 \
--per_device_eval_batch_size=1 \
--weight_decay=1.0 \
--output_dir='./models/gemma-2b-infini-noclm-wikitext' \
--checkpointing_steps=10 \
--num_train_epochs=1 \
--learning_rate=5e-5 \
--seed=42 \
--low_cpu_mem_usage \
--report_to='wandb' \
--preprocessing_num_workers=64 \
--with_tracking
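For context, here is a rough back-of-envelope estimate of the static training memory (weights, gradients and AdamW states only). It is a sketch under stated assumptions, not a measurement: it assumes about 2.5 billion parameters for google/gemma-2b and assumes the weights are kept in fp32 under `--mixed_precision='bf16'` (autocast), giving 16 bytes per parameter. Activations for a 32,768-token block, the CUDA context and allocator overhead all come on top of this figure.

```shell
#!/usr/bin/env bash
# Rough static-memory estimate for full fine-tuning with AdamW.
# ASSUMPTION: ~2.5 billion parameters for google/gemma-2b (approximate).
PARAMS=2500000000
# ASSUMPTION: fp32 weights (4 B) + fp32 grads (4 B) + AdamW m and v (4 B + 4 B).
BYTES_PER_PARAM=16
TOTAL_BYTES=$((PARAMS * BYTES_PER_PARAM))
# Convert bytes to GiB using integer division (rounds down).
TOTAL_GIB=$((TOTAL_BYTES / 1024 / 1024 / 1024))
echo "static_gib=${TOTAL_GIB}"   # prints: static_gib=37
```

Under these assumptions, roughly 37 GiB of the card's ~44.7 GiB (48 GB) is already used before any activations are stored, which would leave little headroom for a 32K-token block.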