
Is it possible to further reduce the RAM?  #395

@ForcewithMe66

Description


I use multiple A6000 cards for pretraining. Each card has 49140 MiB of GPU memory.

I tried to pretrain LLaMA-7B with bf16-mixed precision:

```python
batch_size = 60  # 125
micro_batch_size = 1  # 1 × 4 devices = 4 samples per iteration
```

It works well before backpropagation: up to that point it uses 47+ GB of the 48 GB on each card. But it goes OOM when it reaches the 15th step (when backpropagation runs).
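
For context, here is a toy sketch of the gradient-accumulation pattern as I understand the pretraining script to use it (not the actual repo code; the model and data below are placeholders). It shows why the 15th micro-step is where the backward pass and optimizer step pile on top of the already-high activation memory:

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the real model/data, only to illustrate the accumulation
# pattern: with batch_size = 60 and micro_batch_size = 1 on 4 devices, each
# device runs 60 / (1 * 4) = 15 micro-steps of backward() before one
# optimizer.step(), so the memory peak shows up during that backward pass.
vocab_size, block_size, micro_bs = 100, 16, 1
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
accumulation_steps = 60 // (micro_bs * 4)  # = 15

for step in range(accumulation_steps * 2):  # two optimizer steps, for illustration
    input_ids = torch.randint(0, vocab_size, (micro_bs, block_size))
    targets = torch.randint(0, vocab_size, (micro_bs, block_size))
    logits = model(input_ids)
    loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
    (loss / accumulation_steps).backward()  # gradients accumulate across micro-steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```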

Is there a way to make this work? I can come up with the following ideas, both of which can work, but I don't think they are the best choice:

  1. Change the precision from bf16-mixed to bf16-true. But as the BLOOM paper noted, bfloat16 mixed-precision training helps avoid training instability.
  2. Reduce the context length (block_size). (Both options are sketched as config changes below.)
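
Concretely, I think both ideas amount to config changes along these lines (just a sketch; the precision strings follow Lightning Fabric's naming, and block_size mirrors the value in the pretraining config):

```python
import lightning as L

# Sketch of the two workarounds as config changes (assumes Lightning 2.x and
# my 4 × A6000 setup; not a recommendation, just what I mean by each option):
precision = "bf16-true"   # option 1: pure bf16 weights instead of "bf16-mixed",
                          #           which keeps the weights in fp32
block_size = 1024         # option 2: shorter context, smaller activation memory

fabric = L.Fabric(accelerator="cuda", devices=4, precision=precision)
```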
