Distributed transformations of the same tensor in every step - how to translate into Lightning? #8349
Unanswered
tomaszpietruszka-globality
asked this question in DDP / multi-GPU / multi-node
I am struggling to translate a model that's very simple in pure PyTorch into Lightning, and I would really appreciate any advice on what the best approach would be here.
In every training step I need to (among other things) encode the same large set of auxiliary data points with the model's encoder.

It seems quite simple, and it is easy to implement with `DataParallel`. The key thing is that there is a large number of auxiliary data points to encode. They have to be split into parts, each part encoded on a different GPU; otherwise I will definitely run into CUDA out of memory.
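To make that constraint concrete, here is a rough sketch (not from the original post) of what "split into parts, each part encoded on a different GPU" looks like in plain PyTorch. The `nn.Linear` encoder and the tensor sizes are made up, and gradient flow back to the original encoder is deliberately omitted here, since handling that is exactly what `DataParallel` adds on top of this:

```python
import copy
import torch
from torch import nn

# Toy stand-ins for the real encoder and auxiliary set (hypothetical sizes).
encoder = nn.Linear(128, 64)
auxiliary_data = torch.randn(100_000, 128)

# Assumes at least one visible GPU.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]

# One chunk of the auxiliary set per GPU; encoding all items on a single
# device would not fit in memory.
chunks = auxiliary_data.chunk(len(devices))

# One replica of the encoder per GPU. NOTE: deepcopy replicas do not send
# gradients back to the original encoder's parameters; DataParallel's
# replicate/scatter/gather machinery is what makes this trainable.
replicas = [copy.deepcopy(encoder).to(d) for d in devices]

with torch.no_grad():
    parts = [rep(chunk.to(d)) for rep, chunk, d in zip(replicas, chunks, devices)]

# Gather everything back onto the first GPU, like DataParallel's gather step.
encoded = torch.cat([p.to(devices[0]) for p in parts])
```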
Now, if I:

- use `register_buffer` on the auxiliary data and then call `self.encoder(self.auxiliary_data)` (see the sketch below), it looks like every GPU is storing and encoding all of the items (duplication of processing and memory use -> not viable);
- call `self.encoder(auxiliary_data)` directly, I get a device mismatch, whether `auxiliary_data` is on CPU or GPU.
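For concreteness, a minimal version of that first attempt might look like the sketch below. The module name, the `nn.Linear` stand-in for `BertModel`, the tensor sizes, and the placeholder loss are all assumptions, not code from the original post. Under DDP each process runs the same `training_step`, which matches the duplication described above.

```python
import torch
from torch import nn
import pytorch_lightning as pl


class AuxEncodingModule(pl.LightningModule):  # hypothetical name
    def __init__(self, auxiliary_data: torch.Tensor):
        super().__init__()
        # Toy stand-in for the real BertModel(...) encoder.
        self.encoder = nn.Linear(128, 64)
        # Registered as a buffer so it follows the module onto each device.
        self.register_buffer("auxiliary_data", auxiliary_data)

    def training_step(self, batch, batch_idx):
        # Under DDP this runs once per process, so every GPU stores and
        # encodes the entire auxiliary set - the duplication noted above.
        aux_emb = self.encoder(self.auxiliary_data)
        query_emb = self.encoder(batch)
        # Placeholder loss; the real scoring is not described in the question.
        return -(query_emb @ aux_emb.t()).mean()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```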
What would be the Lightning way of handling such a case?

FWIW, the way I implemented it in plain `DataParallel` PyTorch is: I just have `self.encoder = DataParallel(BertModel(...))`. This way, whenever I call `self.encoder(some_input)`, it gets scattered across the devices and then brought back to the first one.
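A self-contained sketch of that setup might look as follows; the wrapper class, the `nn.Linear` stand-in for `BertModel(...)`, and the scoring step in `forward` are assumptions made only to keep the example runnable:

```python
import torch
from torch import nn


class Model(nn.Module):  # hypothetical name, standing in for the full model
    def __init__(self):
        super().__init__()
        # Tiny stand-in for BertModel(...), to avoid the transformers dependency.
        encoder = nn.Linear(128, 64)
        # Wrapping the encoder itself in DataParallel means any call to
        # self.encoder(x) is scattered across all visible GPUs and the
        # outputs are gathered back onto the first device.
        self.encoder = nn.DataParallel(encoder)

    def forward(self, some_input, auxiliary_data):
        query_emb = self.encoder(some_input)    # split over GPUs
        aux_emb = self.encoder(auxiliary_data)  # split over GPUs, gathered back
        return query_emb @ aux_emb.t()          # hypothetical scoring step
```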