Training >60 times slower after converting to lightning. Am I doing something wrong? #7743
-
Hi, your timing probably isn't correct. You need to synchronise the GPU before each timing point; otherwise the Python code reaches the timing statement while the GPU has only queued the operation and hasn't actually executed it yet. We benchmark the results ourselves in our CI and we don't observe such a slowdown. There are a few things you could improve (like the number of workers in your loader, or not re-instantiating the loss all the time), but they are only minor and shouldn't impact performance that drastically. Have a look at https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964/2 for how to time correctly :)
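For illustration, here is a minimal sketch of the synchronised-timing pattern, assuming a CUDA device; the `timed` helper and the use of `time.perf_counter` are just one way to do it, not code from the linked post:

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds), flushing queued GPU work around the measurement."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for previously queued kernels before starting the clock
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the work launched by fn before stopping the clock
    return result, time.perf_counter() - start
```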
-
Lightning looks like it would be a good fit for my research, but upon converting a basic linear regression example from PyTorch to Lightning I'm seeing a dramatic reduction in performance.
I'm new to ML so I hope it's something that I'm doing wrong.
Here is the code example:
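As a stand-in for the full script, here is a minimal sketch of the kind of conversion I mean, wrapping a plain linear regression in a `LightningModule`; the class name, layer sizes and learning rate below are placeholders, not my actual code:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitLinearRegression(pl.LightningModule):
    """Placeholder single-feature linear regression trained with MSE."""

    def __init__(self, lr: float = 1e-2):
        super().__init__()
        self.model = nn.Linear(1, 1)
        self.loss_fn = nn.MSELoss()  # created once instead of on every step
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        return self.loss_fn(self(inputs), labels)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)
```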
I thought that a lot of the slowdown could come from the DataLoader, and it does. Un-commenting the line:
# for batch_index, (inputs, labels) in enumerate(train_loader):
changes the time taken to 1.66, more than 30 times slower than the original.
I'm aware that I can add more workers, but that alone doesn't account for the performance gap from the original. Maybe there is something wrong with my original script, or with the way I've converted it to Lightning? Maybe the small dataset I'm working with is a factor?
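For reference, a minimal sketch of the worker change I mean, assuming a dataset named `train_dataset` and an arbitrary batch size:

```python
from torch.utils.data import DataLoader

# train_dataset is whatever Dataset the script already builds
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,            # load batches in separate worker processes
    persistent_workers=True,  # keep workers alive between epochs (requires num_workers > 0)
)
```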
Any guidance would be appreciated.