@sarpit2907

This PR addresses #1220 by reducing eager imports during 'import datajoint'.

Problem

On macOS, importing DataJoint was slow (~13s), primarily due to eager imports of:

  • diagram
  • table (which pulls in pandas)
  • S3/minio stack

Solution

  • Introduce lazy-loading via __getattr__ in datajoint/__init__.py
  • Preserve public API (Diagram, Table, aliases, etc.)
  • Heavy modules are imported only when accessed
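For readers unfamiliar with the technique, here is a minimal sketch of module-level `__getattr__` lazy loading (PEP 562). It is not the code from this PR: the `LAZY_ATTRS` table and the `make_lazy_module` helper are hypothetical, standard-library modules stand in for the heavy modules listed above, and the module object is built in memory so the example runs on its own. In a real package the `__getattr__` function would sit at the top level of `__init__.py`.

```python
import importlib
import types

# Hypothetical table: public name -> (module to import, attribute to fetch).
# Standard-library modules stand in for the heavy modules named above.
LAZY_ATTRS = {
    "Fraction": ("fractions", "Fraction"),
    "Decimal": ("decimal", "Decimal"),
}


def make_lazy_module(name, lazy_attrs):
    """Build a module object whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        target_module, target_attr = lazy_attrs[attr]
        return getattr(importlib.import_module(target_module), target_attr)

    module.__getattr__ = __getattr__
    return module


demo = make_lazy_module("demo", LAZY_ATTRS)
# The import of `fractions` is triggered here, unless it was already loaded.
half = demo.Fraction(1, 2)
```

Names that are not in the table still raise `AttributeError`, so `hasattr` checks and typos behave as they would on an ordinary module.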

Results (macOS, Python 3.11)

  • Before: ~13.1s
  • After: ~2.2s

The remaining import cost is dominated by core components (connection, numpy), which is expected.
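The PR does not say how the timings were collected. One way to reproduce this kind of measurement, sketched under that caveat, is to import the package in a fresh interpreter so that nothing is already cached in `sys.modules`. The `time_import` helper is hypothetical, and `json` is only a stand-in for the package under test.

```python
import subprocess
import sys
import time


def time_import(module_name, repeats=3):
    """Return the best wall-clock time, in seconds, to import a module
    in a fresh interpreter (interpreter startup is included)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


# "json" is a stand-in; substitute the package under test, e.g. "datajoint".
elapsed = time_import("json")
print(f"import json: {elapsed:.3f}s")
```

For a per-module breakdown, CPython's `-X importtime` option (for example `python -X importtime -c "import datajoint"`) prints the cumulative and self time of every import to stderr.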

Notes

This change does not alter behavior and only affects import-time performance.

@github-actions bot added the enhancement label on Jan 9, 2026
Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dimitri-yatsenko
Member

Hi @sarpit2907, thank you for working on this optimization!

We already have PR #1321 open that addresses the same issue (#1220). The approaches differ slightly:

Your approach (PR #1325):

  • Defers many imports in __init__.py
  • Moves some imports to function-level within modules

PR #1321 approach:

  • Uses __getattr__ in __init__.py for true lazy loading
  • Specifically targets the heaviest imports: Diagram, kill, and cli
  • Caches imports in globals() to avoid repeated lookups
  • Preserves the module's existing import structure

The __getattr__ approach is more targeted: it defers loading only for specific symbols, at the moment they are first accessed, and leaves everything else unchanged. This minimizes the risk of subtle import-order bugs.

Since #1321 is already in review and takes a more conservative approach, we'll likely proceed with that one. However, your contribution is appreciated and the timing analysis you provided is valuable!

Feel free to review #1321 and provide feedback if you see improvements we could incorporate.

@sarpit2907
Author

Thanks for the detailed feedback @dimitri-yatsenko — that makes sense.

I agree that the __getattr__-based approach in #1321 is more conservative and better aligned with the existing import structure.

I’m glad the profiling and timing analysis was useful. I’ll take a closer look at #1321 and share feedback or suggestions there if I spot anything that could be improved.

Thanks again for the review!

@dimitri-yatsenko
Member

Thank you for the contribution! We've decided to go in a different direction for this feature. We appreciate your effort and time spent on this PR.
