
Is it possible to further reduce the RAM?  #395

@ForcewithMe66

Description


I use multiple A6000 cards for pretraining. Each card has 49140 MiB of GPU memory.

I tried to pretrain LLaMA-7B with bf16-mixed precision:

```python
batch_size = 60  # 125
micro_batch_size = 1  # 1 × 4 devices = 4 samples per iteration
```

It works well before backpropagation: up to that point it uses 47+ GB of the 48 GB on each card. But it goes OOM when it reaches the 15th step (when backpropagation runs).
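
For context, here is a toy sketch of the gradient-accumulation pattern as I understand the pretraining script to use it (not the actual repo code; the model and data below are placeholders). It shows why the 15th micro-step is where the backward pass and optimizer step pile on top of the already-high activation memory:

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the real model/data, only to illustrate the accumulation
# pattern: with batch_size = 60 and micro_batch_size = 1 on 4 devices, each
# device runs 60 / (1 * 4) = 15 micro-steps of backward() before one
# optimizer.step(), so the memory peak shows up during that backward pass.
vocab_size, block_size, micro_bs = 100, 16, 1
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
accumulation_steps = 60 // (micro_bs * 4)  # = 15

for step in range(accumulation_steps * 2):  # two optimizer steps, for illustration
    input_ids = torch.randint(0, vocab_size, (micro_bs, block_size))
    targets = torch.randint(0, vocab_size, (micro_bs, block_size))
    logits = model(input_ids)
    loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
    (loss / accumulation_steps).backward()  # gradients accumulate across micro-steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```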

Is there a way to make this work? I can come up with the following ideas, both of which can work, but I don't think they are the best choice:

  1. Change the precision from bf16-mixed to bf16-true. But as the BLOOM paper noted, bfloat16 mixed-precision training helps avoid training instability.
  2. Reduce the context length (block_size). (Both options are sketched as config changes below.)
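
Concretely, I think both ideas amount to config changes along these lines (just a sketch; the precision strings follow Lightning Fabric's naming, and block_size mirrors the value in the pretraining config):

```python
import lightning as L

# Sketch of the two workarounds as config changes (assumes Lightning 2.x and
# my 4 × A6000 setup; not a recommendation, just what I mean by each option):
precision = "bf16-true"   # option 1: pure bf16 weights instead of "bf16-mixed",
                          #           which keeps the weights in fp32
block_size = 1024         # option 2: shorter context, smaller activation memory

fabric = L.Fabric(accelerator="cuda", devices=4, precision=precision)
```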
