
Reduce the import time of pytorch_lightning #12786

@awaelchli

Description

Proposed refactor

The current import time for the pytorch_lightning package on my machine is several seconds. There are some opportunities to improve this.

Motivation

High import times slow down development and debugging.

Benchmark

I benchmarked the import time in two environments:

  1. Fresh environment with pytorch lightning installed, no extras.
  2. My current environment, with many extras installed, such as loggers, horovod, etc.

To measure the import time, I created a simple file which only imports pytorch_lightning:

import pytorch_lightning as pl

Then I used Python's -X importtime option to measure the time and write a profile:

python -X importtime simple.py 2> import.log

Finally, I used tuna to visualize the profile:

tuna import.log
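As a quick sanity check (not part of the profiling above), one can also time the import directly to get a single wall-clock number; a minimal sketch:

import time

start = time.perf_counter()
import pytorch_lightning  # noqa: F401
print(f"pytorch_lightning import took {time.perf_counter() - start:.2f}s")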

For the fresh environment, the total import time is <2 secs with the following profile:

[tuna profile of the fresh-environment import]

pip freeze | grep torch   # (clean-pl-env)

pytorch-lightning==1.6.1
torch==1.11.0
torchmetrics==0.7.3

For a full development environment, the total import time is >4 seconds:

[tuna profile of the full development environment import]

The times vary a bit between runs, but they are consistently higher in an environment with extras installed. Looking at the profiles, a large share of the time is spent in our pytorch_lightning.utilities.imports module, where we evaluate availability constants at import time:

https://github.com/PyTorchLightning/pytorch-lightning/blob/ae3226ced96e2bc7e62f298d532aaf2290e6ef34/pytorch_lightning/utilities/imports.py#L98-L124

It looks like if a third-party package is installed and is slow to import, that time gets added to our loading time as well, even if the package never ends up being used. This is because our _module_available and _package_available implementations actually import the modules to check their availability, which can be very costly.
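Orthogonal to making the checks lazy, the check itself could be made cheaper by locating the module without executing it, e.g. with importlib.util.find_spec. A sketch (the helper name _module_available_cheap is hypothetical; note that for dotted names, find_spec still imports the parent packages):

import importlib.util

def _module_available_cheap(name: str) -> bool:
    try:
        # find_spec locates the module on sys.path without running its code,
        # so a slow-to-import third-party package does not slow this check
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # raised when a parent package of a dotted name is not installed
        return False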

Pitch

Evaluate the import checks lazily.
Convert

_X_AVAILABLE = _module_available("x")

to

from functools import lru_cache

@lru_cache()
def _is_x_available() -> bool:
    return _module_available("x")
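
If the module-level constant must remain importable for backward compatibility, a module-level __getattr__ (PEP 562, Python 3.7+) could defer the check to first attribute access; a minimal sketch building on the _is_x_available helper above:

def __getattr__(name: str):
    # PEP 562: invoked only when `name` is not found as a regular module
    # attribute, so the availability check runs on first access rather
    # than at import time
    if name == "_X_AVAILABLE":
        return _is_x_available()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")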

And investigate other opportunities to improve loading time given the above profile.

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @rohitgr7 @Borda @akihironitta
