Hello. I just read your paper.
The paper mentions that the extra parameters needed at inference amount to only ~0.2% (0.12M) of the original model (T5-small: 60M), but I couldn't find anything about memory usage when training MELO.
Could you give a rough idea of how much GPU memory is required to train MELO?
Or if I'm misunderstanding the paper, please let me know.
Thanks.