What is the minimum GPU memory required to fine-tune the model?

First of all, thank you very much for your work.

I am trying to train the `Gemma-2B 32K seq len with 2K segment size` configuration on a single A6000 Ada (48 GB).
Even after adjusting the parameters in `train.gemma.infini.noclm.sh` as shown below, training still runs out of GPU memory.
Is this expected?

```
accelerate launch --mixed_precision='bf16' \
    train.gemma.infini.noclm.py \
    --model_name_or_path='google/gemma-2b' \
    --segment_length=2048 \
    --block_size=32768 \
    --dataset_name='wikitext' \
    --dataset_config_name='wikitext-2-raw-v1' \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --weight_decay=1.0 \
    --output_dir='./models/gemma-2b-infini-noclm-wikitext' \
    --checkpointing_steps=10 \
    --num_train_epochs=1 \
    --learning_rate=5e-5 \
    --seed=42 \
    --low_cpu_mem_usage \
    --report_to='wandb' \
    --preprocessing_num_workers=64 \
    --with_tracking
```
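
For context, here is a rough back-of-envelope estimate of the static memory footprint, which may explain why 48 GB is tight. It is only a sketch under assumptions that are not confirmed by the repository: that Gemma-2B has roughly 2.5 billion parameters, that the weights and gradients are held in fp32 (with `bf16` used only for autocast), and that the optimizer is AdamW with two fp32 states per parameter. The helper name `estimate_static_memory_gib` is hypothetical. Activations for a 32K-token block are not included, so the real requirement would be higher.

```python
def estimate_static_memory_gib(
    num_params: float,
    bytes_weights: int = 4,    # assumption: fp32 weights
    bytes_grads: int = 4,      # assumption: fp32 gradients
    bytes_optimizer: int = 8,  # assumption: AdamW, two fp32 states per parameter
) -> float:
    """Rough lower bound for full fine-tuning memory, excluding activations."""
    total_bytes = num_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 2**30


# Assumption: approximate parameter count for Gemma-2B.
GEMMA_2B_PARAMS = 2.5e9

static_gib = estimate_static_memory_gib(GEMMA_2B_PARAMS)
print(f"weights + gradients + optimizer states: ~{static_gib:.1f} GiB")
```

If these assumptions hold, weights, gradients, and optimizer states alone take about 37 GiB, leaving little room for activations on a 48 GB card even with `--per_device_train_batch_size=1`.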
