Separation of concerns #13450
Replies: 1 comment
Hi @jacanchaplais, thank you for sharing your concerns here!
Was there any pain point you had in particular?
PyTorch Lightning provides a variety of hooks you can override in For this reason, I think PyTorch Lightning already makes code modular and cohesive. Or, do you mean something else by "keeping code in modular, cohesive chunks"?
Yes, definitely. Related to #8648 (comment).
Hi everyone, I hope this post isn't too general. I'm going to keep it short and won't include code examples off the bat; I just want to see what the developers' and others' thoughts are, and whether I am misunderstanding something.
I used Lightning a while back to make multi-GPU training easy with my graph neural network models. However, one thing I didn't really like was that it felt like monolithic objects with low cohesion were strongly encouraged. By this I mean that one class would define each layer, hold the code blocks for the training, validation, and test loops, contain the metric trackers and update them, etc.
On the one hand, I get that collating all of this functionality is what leads to the API being able to implicitly synchronise everything in parallel across all of the GPUs and workers, but on the other hand I found it much harder to manage my code base as it grew in complexity.
Are there any known best practices for keeping code in modular, cohesive chunks, while also being able to leverage the benefits of this library? Can I simply separate the bulk of the attributes included in Lightning Modules out into separate Python modules, then instantiate them in the main model, while preserving all of the synchronisation (i.e. could I define a torch.nn.Module outside of a LightningModule without needing to specify which device everything is on, as long as I instantiate the actual object within the LightningModule class)?
I am sorry if these questions sound basic or vague. I am fairly junior when it comes to NNs, and I have abandoned Lightning to use PyTorch's multiprocessing module because of these issues, but I'd really like to make use of the great stuff in here.
Let me know if you want a more systematic and concrete breakdown of what I'm asking, or examples, etc.