GPU Memory Requirement on Training MELO

Hello. I just read your paper.
The paper mentions that inference needs only about 0.2% extra parameters (0.12M) relative to the original model (T5-small: 60M), but I couldn't find anything about memory usage when training MELO.

Could you give a rough idea of how much GPU memory is required to train MELO?
If I'm misunderstanding the paper, please let me know.

Thanks.
