Improve import time by lazy-loading cloud provider modules #544

@eh-steve

Description

Problem

Importing cloudpathlib currently takes ~435ms because it eagerly loads all cloud provider SDKs (google-cloud-storage, boto3, azure-storage-blob) at import time, even if the user only needs one provider.

This is problematic for:

  • CLI tools where startup latency matters
  • Serverless functions where cold start time is critical
  • Applications that only use one cloud provider but pay the import cost for all three

Benchmarks

Measured with Python 3.12 (5 runs, mean ± std):

| Scenario | Current | With lazy loading | Improvement |
| --- | --- | --- | --- |
| import cloudpathlib | 435ms ± 24ms | 22ms ± 1ms | 95% faster |
| from cloudpathlib import S3Path | 410ms ± 3ms | 23ms ± 0ms | 94% faster |
| from cloudpathlib import GSPath | 410ms ± 3ms | 24ms ± 1ms | 94% faster |
| from cloudpathlib import AzureBlobPath | 413ms ± 7ms | 41ms ± 1ms | 90% faster |

Additionally, the current implementation loads all three SDKs even when importing just one path class. With lazy loading, only the SDK for the provider being used gets loaded.
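
For anyone who wants to reproduce numbers of this kind, here is a rough sketch of one way to time an import in a fresh interpreter. It is not necessarily the script behind the table above. `time_import` is a hypothetical helper, and the `import json` statement is only a stand-in so the sketch runs without cloudpathlib installed.

```python
import statistics
import subprocess
import sys
import time


def time_import(statement: str, runs: int = 5) -> tuple[float, float]:
    """Return (mean, stdev) wall-clock milliseconds for running `statement`.

    Each run uses a fresh interpreter so earlier imports cannot warm
    sys.modules. The figure includes interpreter startup, so compare it
    against the "pass" baseline. `runs` must be at least 2 for stdev.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Replace "import json" with "import cloudpathlib" to measure the package.
    for stmt in ("pass", "import json"):
        mean, std = time_import(stmt)
        print(f"{stmt!r}: {mean:.0f}ms ± {std:.0f}ms")
```

Running `python -X importtime -c "import cloudpathlib"` gives a per-module breakdown instead, which helps show which SDK dominates the cost.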

Proposed Solution

Implement lazy loading via __getattr__ in the __init__.py files:

  • cloudpathlib/__init__.py
  • cloudpathlib/s3/__init__.py
  • cloudpathlib/gs/__init__.py
  • cloudpathlib/azure/__init__.py

This defers loading cloud SDKs until they are actually accessed, while preserving static type hints via TYPE_CHECKING blocks.

Example

Before:

import cloudpathlib  # Takes ~435ms, loads all SDKs

After:

import cloudpathlib  # Takes ~22ms, no SDKs loaded
from cloudpathlib import S3Path  # Only loads boto3 when S3Path is accessed

Implementation Notes

  • Use __getattr__ with TYPE_CHECKING blocks for static type hints (IDE support preserved)
  • Fix absolute import in cloudpath.py (change "from cloudpathlib.enums" to "from .enums")
  • Move anypath import to function-local in cloudpath.py to avoid circular imports
  • Add tests to verify lazy loading behavior and prevent regressions
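
As a sketch of what a regression test for the last point might look like, the helper below runs a statement in a fresh interpreter and reports which modules ended up in `sys.modules`. `modules_loaded_by` and the test function are hypothetical names, not part of the existing test suite. The SDK import names (`boto3`, `google.cloud.storage`, `azure.storage.blob`) are assumed from the packages listed in the problem statement.

```python
import json
import subprocess
import sys


def modules_loaded_by(statement: str, candidates: list[str]) -> set[str]:
    """Run `statement` in a fresh interpreter and return the subset of
    `candidates` that is present in sys.modules afterwards."""
    probe = (
        "import sys, json\n"
        f"{statement}\n"
        f"print(json.dumps([m for m in {candidates!r} if m in sys.modules]))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    )
    # Only the last line is ours; the statement itself may print too.
    return set(json.loads(result.stdout.strip().splitlines()[-1]))


SDK_MODULES = ["boto3", "google.cloud.storage", "azure.storage.blob"]


def test_bare_import_loads_no_sdks():
    # Hypothetical pytest-style check; requires cloudpathlib to be installed.
    assert modules_loaded_by("import cloudpathlib", SDK_MODULES) == set()
```

Using a subprocess matters here: within a single test session another test may already have imported an SDK, which would make an in-process `sys.modules` check unreliable.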

I have a working implementation ready to submit as a PR.
