Proposed refactor
The current import time for the pytorch_lightning package on my machine is several seconds. There are some opportunities to improve this.
Motivation
High import times slow down development and debugging.
Benchmark
I benchmarked the import time in two environments:
- Fresh environment with pytorch lightning installed, no extras.
- My current environment, with many extras installed such as loggers, horovod, etc.
To measure the import time, I created a simple file which only imports pytorch_lightning:
import pytorch_lightning as pl
Then I used Python's -X importtime option to measure the time and create a profile:
python -X importtime simple.py 2> import.log
Finally, I used tuna to visualize the profile:
tuna import.log
For the fresh environment, the total import time is <2 seconds:
(clean-pl-env) > pip freeze | grep torch
pytorch-lightning==1.6.1
torch==1.11.0
torchmetrics==0.7.3
For a full development environment, the total import time is >4 seconds.
The times vary a bit between runs. However, I have observed that the time is consistently higher in an environment with extras installed. Looking at the profiles, a large portion of the time originates in our pytorch_lightning.utilities.imports module, where we evaluate some constants at import time:
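For illustration, the pattern in that module is roughly the following (a simplified sketch; the actual constant names and checks may differ):

import importlib

def _module_available(module_path: str) -> bool:
    # Simplified stand-in: checks availability by actually importing the module.
    try:
        importlib.import_module(module_path)
    except ImportError:
        return False
    return True

# Evaluated eagerly, as a side effect of `import pytorch_lightning`:
_HOROVOD_AVAILABLE = _module_available("horovod.torch")
_OMEGACONF_AVAILABLE = _module_available("omegaconf")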
It looks like if a 3rd-party package is installed and takes a long time to import, this time gets added to our loading time as well, even if the package never ends up being used. This is because our _module_available and _package_available implementations attempt to import the modules to check their availability. This can be very costly.
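For comparison, a check based on importlib.util.find_spec only asks the import machinery whether the module can be found, without executing it, so it is far cheaper (a sketch, not the current implementation; note that a package can be found yet still fail to import):

import importlib.util

def _package_available(package_name: str) -> bool:
    # Looks up the module spec without running the package's code.
    return importlib.util.find_spec(package_name) is not None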
Pitch
Evaluate the import checks lazily.
Convert

_X_AVAILABLE = _module_available("x")

to

from functools import lru_cache

@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")
And investigate other opportunities to improve loading time given the above profile.
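With this pattern, a call site pays the import cost only when the feature is actually used, and lru_cache ensures the check runs at most once. A hypothetical call site:

from functools import lru_cache
from pytorch_lightning.utilities.imports import _module_available

@lru_cache()
def _is_horovod_available() -> bool:
    # The costly import check now runs on first call, not at package import time.
    return _module_available("horovod.torch")

# Hypothetical usage: horovod is only imported if this code path is reached.
if _is_horovod_available():
    ...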
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.