Subinterpreters are supported. Each subinterpreter maintains its own
``sys.lazy_modules`` and import state, so lazy imports in one subinterpreter
do not affect others.

Performance
-----------

Lazy imports have **no measurable performance overhead**. The implementation
is designed to be performance-neutral for both code that uses lazy imports and
code that doesn't.

Runtime performance
~~~~~~~~~~~~~~~~~~~

After reification (first use), lazy imports have **zero overhead**. The
adaptive interpreter specializes the bytecode (typically after 2-3 accesses),
eliminating any checks. For example, ``LOAD_GLOBAL`` becomes
``LOAD_GLOBAL_MODULE``, which directly accesses the module identically to
normal imports.

The `pyperformance suite`_ confirms the implementation is performance-neutral.

.. _pyperformance suite: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20250922-3.15.0a0-27836e5/bm-20250922-vultr-x86_64-DinoV-lazy_imports-3.15.0a0-27836e5-vs-base.svg

Filter function performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The filter function (set via ``sys.set_lazy_imports_filter()``) is called for
every *potentially lazy* import to determine whether it should actually be
lazy. When no filter is set, this is simply a NULL check (testing whether a
filter function has been registered), which is a highly predictable branch that
adds essentially no overhead. When a filter is installed, it is called for each
potentially lazy import, but this still has **almost no measurable performance
cost**. To measure this, we benchmarked importing all 278 top-level importable
modules from the Python standard library (which transitively loads 392 total
modules including all submodules and dependencies), then forced reification of
every loaded module to ensure everything was fully materialized.
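
The filter dispatch described above can be pictured with a minimal filter
function. This is a sketch, not the PEP's reference implementation: the
``(importer, name, fromlist)`` signature is an assumption for illustration,
and ``myapp.plugins`` is a hypothetical package name. The ``hasattr`` guard
lets the snippet run on interpreters that do not yet implement the proposed
API.

```python
import sys

def lazy_imports_filter(importer, name, fromlist):
    """Return True to let the import be lazy, False to force it eager.

    The (importer, name, fromlist) signature is an illustrative
    assumption; the actual contract is defined by the PEP.
    """
    # Hypothetical policy: keep plugin packages eager (they may rely on
    # import-time side effects), let everything else be lazy.
    return not name.startswith("myapp.plugins")

# Register only on interpreters that implement the proposed API.
if hasattr(sys, "set_lazy_imports_filter"):
    sys.set_lazy_imports_filter(lazy_imports_filter)
```

A filter this cheap is what the benchmarks below measure: a single predicate
call per potentially lazy import.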

Note that these measurements establish the baseline overhead of the filter
mechanism itself. Of course, any user-defined filter function that performs
additional work beyond a trivial check will add overhead proportional to the
complexity of that work. However, we expect that in practice this overhead
will be dwarfed by the performance benefits gained from avoiding unnecessary
imports. The benchmarks below measure the minimal cost of the filter dispatch
mechanism when the filter function does essentially nothing.

We compared four different configurations:

.. list-table::
   :header-rows: 1
   :widths: 50 25 25

   * - Configuration
     - Mean ± Std Dev (ms)
     - Overhead vs Baseline
   * - **Eager imports** (baseline)
     - 161.2 ± 4.3
     - 0%
   * - **Lazy + filter forcing eager**
     - 161.7 ± 4.2
     - +0.3% ± 3.7%
   * - **Lazy + filter allowing lazy + reification**
     - 162.0 ± 4.0
     - +0.5% ± 3.7%
   * - **Lazy + no filter + reification**
     - 161.4 ± 4.3
     - +0.1% ± 3.8%

The four configurations:

1. **Eager imports (baseline)**: Normal Python imports with no lazy-import
   machinery; standard Python behavior.

2. **Lazy + filter forcing eager**: The filter function returns ``False`` for
   all imports, forcing eager execution, and all imports are reified at
   script end. This measures pure filter-call overhead, since every import
   goes through the filter but executes eagerly.

3. **Lazy + filter allowing lazy + reification**: The filter function returns
   ``True`` for all imports, allowing lazy execution, and all imports are
   reified at script end. This measures filter overhead when imports are
   actually lazy.

4. **Lazy + no filter + reification**: No filter is installed; imports are
   lazy and reified at script end. This is the baseline for lazy behavior
   without a filter.

The benchmarks used `hyperfine <https://github.com/sharkdp/hyperfine>`_ to
test the 278 standard library modules, with each benchmark invocation running
in a fresh Python process. All configurations force the import of exactly the
same set of modules (all modules loaded by the eager baseline) to ensure a
fair comparison.

The benchmark environment used CPU isolation with 32 logical CPUs (0-15 at
3200 MHz, 16-31 at 2400 MHz), the performance scaling governor, Turbo Boost
disabled, and full ASLR randomization. The overhead error bars are computed
using standard error propagation for the formula ``(value - baseline) /
baseline``, accounting for uncertainties in both the measured value and the
baseline.
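
The error bars in the table can be reproduced with a few lines. This is a
sketch of standard first-order error propagation for
``f = (value - baseline) / baseline``, applied to the table's numbers:

```python
import math

def overhead_with_error(value, value_err, baseline, baseline_err):
    """Relative overhead (v - b) / b with propagated uncertainty.

    For f = v/b - 1, the propagated standard error is
    sigma_f = (v/b) * sqrt((sigma_v/v)**2 + (sigma_b/b)**2).
    """
    ratio = value / baseline
    overhead = ratio - 1.0
    error = ratio * math.sqrt(
        (value_err / value) ** 2 + (baseline_err / baseline) ** 2
    )
    return overhead, error

# "Lazy + filter forcing eager" row: 161.7 ± 4.2 vs baseline 161.2 ± 4.3
overhead, error = overhead_with_error(161.7, 4.2, 161.2, 4.3)
print(f"{overhead:+.1%} ± {error:.1%}")  # prints "+0.3% ± 3.7%"
```

Note that the propagated error (about ±3.7%) is an order of magnitude larger
than the measured overheads themselves, which is what "performance-neutral"
means here: the differences are well within measurement noise.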

Startup time improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~

The primary performance benefit of lazy imports is reduced startup time by
loading only the modules actually used at runtime, rather than optimistically
loading entire dependency trees at startup.
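
A rough way to observe this effect on any existing interpreter is to compare
a fresh process that imports a module with a non-trivial dependency tree
against one that does not. This is only a sketch: absolute numbers vary by
machine, and ``email.mime.multipart`` is an arbitrary example of a stdlib
module that transitively pulls in many others.

```python
import subprocess
import sys
import time

def fresh_process_time(code):
    """Wall-clock time to run `code` in a fresh Python process."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start

with_import = fresh_process_time("import email.mime.multipart")
bare = fresh_process_time("pass")
print(f"approximate import cost: {with_import - bare:.3f}s")
```

Under lazy imports, that cost is deferred entirely for runs that never touch
the module, which is where the startup wins come from.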

Real-world deployments at scale have demonstrated substantial benefits,
though the magnitude depends on the specific codebase and usage patterns.
Organizations with large, interconnected codebases have reported
substantial reductions in server reload times, ML training initialization,
command-line tool startup, and Jupyter notebook loading. Memory usage
improvements have also been observed, as unused modules remain unloaded.

For detailed case studies and performance data from production deployments,
see:

- `Python Lazy Imports With Cinder
<https://developers.facebook.com/blog/post/2022/06/15/python-lazy-imports-with-cinder/>`__
(Meta Instagram Server)
- `Lazy is the new fast: How Lazy Imports and Cinder accelerate machine
learning at Meta
<https://engineering.fb.com/2024/01/18/developer-tools/lazy-imports-cinder-machine-learning-meta/>`__
(Meta ML Workloads)
- `Inside HRT's Python Fork
<https://www.hudsonrivertrading.com/hrtbeat/inside-hrts-python-fork/>`__
(Hudson River Trading)

The benefits scale with codebase complexity: the larger and more
interconnected the codebase, the more dramatic the improvements.

Typing and tools
----------------
