I use multiple A6000 cards for pretraining; each card has 49140 MiB of memory. I tried to pretrain LLaMA-7B with bf16-mixed and the following settings:
batch_size = 60 # 125
micro_batch_size = 1 # 1 × 4 = 4 for each iteration
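For context, in a lit-llama-style pretraining script these two values roughly determine gradient accumulation as sketched below; the variable names (`devices`, `gradient_accumulation_iters`) are my assumptions, not necessarily the script's exact ones:

```python
# Rough sketch of how the settings above relate (names are assumptions, not the script's).
devices = 4                      # assumption: 4 × A6000, matching the "1 × 4 = 4" comment
batch_size = 60                  # samples per device per optimizer step
micro_batch_size = 1             # samples per device per forward/backward pass

# Gradients are accumulated over this many micro-batches before optimizer.step(),
# trading throughput for a larger effective batch without extra activation memory.
gradient_accumulation_iters = batch_size // micro_batch_size                     # = 60
effective_batch_size = micro_batch_size * devices * gradient_accumulation_iters  # = 240
print(gradient_accumulation_iters, effective_batch_size)
```

Accumulation itself does not raise per-step activation memory; only the micro batch size and the block size do.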
It works well before backpropagation, using about 47 of the 48 GB per card. But it hits OOM when it reaches the 15th step, i.e. when backpropagation runs.
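That timing is consistent with where allocation usually peaks: gradient buffers are only created during the backward pass (and Adam's moment tensors at the first optimizer step), so a run can look fine until then. A back-of-the-envelope budget, assuming plain unsharded AdamW and ignoring activations:

```python
# Rough memory budget for 7B parameters (assumptions: plain AdamW, no sharding,
# activations and buffers excluded). Only meant to show the order of magnitude.
n_params = 7e9
GiB = 2**30

# bf16-mixed: fp32 weights + fp32 grads + fp32 Adam m/v  -> 16 bytes/param
mixed = n_params * (4 + 4 + 4 + 4) / GiB
# bf16-true: bf16 weights + bf16 grads + Adam m/v in the params' dtype -> 8 bytes/param
true_ = n_params * (2 + 2 + 2 + 2) / GiB

print(f"bf16-mixed ≈ {mixed:.0f} GiB, bf16-true ≈ {true_:.0f} GiB")  # ≈ 104 vs ≈ 52
```

Under this accounting, bf16-true roughly halves the weight/gradient/optimizer footprint relative to bf16-mixed, which is why it is one of the two ideas below.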
Is there a way to make this work? I can come up with the following two ideas, both of which could work, but I don't think they are the best choice (see the sketch after the list):
- Change the precision from `bf16-mixed` to `bf16-true`. But as BLOOM reported, bfloat16 mixed-precision training can solve the instability problem.
- Reduce the context length (block size).
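For reference, here is a minimal sketch of what both options look like with Lightning Fabric. It assumes a lit-llama-style setup; the `LLaMA`/`LLaMAConfig` usage and the `block_size` value are illustrative, not the repo's exact pretraining script.

```python
import lightning as L
from lit_llama.model import LLaMA, LLaMAConfig  # assumption: lit-llama-style model classes

# Option 1: true bf16 instead of mixed precision (halves weight/grad memory,
# at the possible cost of the stability that mixed precision provides).
fabric = L.Fabric(devices=4, precision="bf16-true")  # was "bf16-mixed"
fabric.launch()

# Option 2: a smaller context length to cut activation memory.
config = LLaMAConfig.from_name("7B")
config.block_size = 1024  # smaller than the default; value is purely illustrative

with fabric.init_module():
    model = LLaMA(config)
model = fabric.setup_module(model)
```

Neither is free: dropping `bf16-mixed` gives up the stability benefit BLOOM attributes to mixed precision, and shrinking `block_size` reduces the maximum context the model is pretrained with.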