Distributed transformations of the same tensor in every step - how to translate into Lightning? #8349
Unanswered
tomaszpietruszka-globality
asked this question in DDP / multi-GPU / multi-node
I am struggling to translate a model that's very simple in pure PyTorch into Lightning, and I would really appreciate any advice on what the best approach would be here.
In every training step I need to (among other things) encode the same large set of auxiliary data points with the model's encoder.

It seems quite simple, and it is easy to implement with `DataParallel`. The key thing is that there is a large number of auxiliary data points to encode. They have to be split into parts, each part encoded on a different GPU; otherwise I will definitely run into CUDA out of memory.
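To make that constraint concrete, here is a rough sketch (not from the original post) of what "split into parts, each part encoded on a different GPU" looks like in plain PyTorch. The `nn.Linear` encoder and the tensor sizes are made up, and gradient flow back to the original encoder is deliberately omitted here, since handling that is exactly what `DataParallel` adds on top of this:

```python
import copy
import torch
from torch import nn

# Toy stand-ins for the real encoder and auxiliary set (hypothetical sizes).
encoder = nn.Linear(128, 64)
auxiliary_data = torch.randn(100_000, 128)

# Assumes at least one visible GPU.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]

# One chunk of the auxiliary set per GPU; encoding all items on a single
# device would not fit in memory.
chunks = auxiliary_data.chunk(len(devices))

# One replica of the encoder per GPU. NOTE: deepcopy replicas do not send
# gradients back to the original encoder's parameters; DataParallel's
# replicate/scatter/gather machinery is what makes this trainable.
replicas = [copy.deepcopy(encoder).to(d) for d in devices]

with torch.no_grad():
    parts = [rep(chunk.to(d)) for rep, chunk, d in zip(replicas, chunks, devices)]

# Gather everything back onto the first GPU, like DataParallel's gather step.
encoded = torch.cat([p.to(devices[0]) for p in parts])
```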
Now, if I:

- use `register_buffer` on the auxiliary data and then call `self.encoder(self.auxiliary_data)` (see the sketch below), it looks like every GPU is storing and encoding all of the items (duplication of processing and memory use -> not viable);
- call `self.encoder(auxiliary_data)` directly, I get a device mismatch, whether `auxiliary_data` is on CPU or GPU.
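For concreteness, a minimal version of that first attempt might look like the sketch below. The module name, the `nn.Linear` stand-in for `BertModel`, the tensor sizes, and the placeholder loss are all assumptions, not code from the original post. Under DDP each process runs the same `training_step`, which matches the duplication described above.

```python
import torch
from torch import nn
import pytorch_lightning as pl


class AuxEncodingModule(pl.LightningModule):  # hypothetical name
    def __init__(self, auxiliary_data: torch.Tensor):
        super().__init__()
        # Toy stand-in for the real BertModel(...) encoder.
        self.encoder = nn.Linear(128, 64)
        # Registered as a buffer so it follows the module onto each device.
        self.register_buffer("auxiliary_data", auxiliary_data)

    def training_step(self, batch, batch_idx):
        # Under DDP this runs once per process, so every GPU stores and
        # encodes the entire auxiliary set - the duplication noted above.
        aux_emb = self.encoder(self.auxiliary_data)
        query_emb = self.encoder(batch)
        # Placeholder loss; the real scoring is not described in the question.
        return -(query_emb @ aux_emb.t()).mean()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```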
What would be the Lightning way of handling such a case?

FWIW, the way I implemented it in plain `DataParallel` PyTorch is: I just have `self.encoder = DataParallel(BertModel(...))`. This way, whenever I call `self.encoder(some_input)`, it gets scattered across the devices and then brought back to the first one.
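A self-contained sketch of that setup might look as follows; the wrapper class, the `nn.Linear` stand-in for `BertModel(...)`, and the scoring step in `forward` are assumptions made only to keep the example runnable:

```python
import torch
from torch import nn


class Model(nn.Module):  # hypothetical name, standing in for the full model
    def __init__(self):
        super().__init__()
        # Tiny stand-in for BertModel(...), to avoid the transformers dependency.
        encoder = nn.Linear(128, 64)
        # Wrapping the encoder itself in DataParallel means any call to
        # self.encoder(x) is scattered across all visible GPUs and the
        # outputs are gathered back onto the first device.
        self.encoder = nn.DataParallel(encoder)

    def forward(self, some_input, auxiliary_data):
        query_emb = self.encoder(some_input)    # split over GPUs
        aux_emb = self.encoder(auxiliary_data)  # split over GPUs, gathered back
        return query_emb @ aux_emb.t()          # hypothetical scoring step
```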