Commit 3768b70

PEP 810: Add section about performance (#4633)
1 parent 516b7d2 commit 3768b70

1 file changed: +126 -0 lines changed

peps/pep-0810.rst

@@ -734,6 +734,132 @@ Subinterpreters are supported. Each subinterpreter maintains its own
``sys.lazy_modules`` and import state, so lazy imports in one subinterpreter
do not affect others.

Performance
-----------

Lazy imports have **no measurable performance overhead**. The implementation
is designed to be performance-neutral for both code that uses lazy imports and
code that doesn't.

Runtime performance
~~~~~~~~~~~~~~~~~~~

After reification (first use), lazy imports have **zero overhead**. The
adaptive interpreter specializes the bytecode (typically after 2-3 accesses),
eliminating any checks. For example, ``LOAD_GLOBAL`` becomes
``LOAD_GLOBAL_MODULE``, which accesses the module directly, exactly as it does
for normal imports.
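
One way to observe this specialization is with the ``dis`` module. The
following is a minimal sketch, not part of the reference implementation:
``hypotenuse`` and ``global_load_opnames`` are hypothetical names, and an
ordinary ``import math`` stands in for a lazy import that has already been
reified. It assumes CPython 3.11 or later for the ``adaptive`` flag and falls
back to the unspecialized view on older versions.

.. code-block:: python

   import dis
   import math  # ordinary import; stands in for an already-reified lazy import
   import sys


   def hypotenuse(a, b):
       # The global name ``math`` is looked up via LOAD_GLOBAL on every call.
       return math.sqrt(a * a + b * b)


   def global_load_opnames(func):
       """Return the names of the LOAD_GLOBAL-family instructions in *func*."""
       # ``adaptive=True`` (Python 3.11+) shows specialized instructions.
       kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
       return [
           ins.opname
           for ins in dis.get_instructions(func, **kwargs)
           if ins.opname.startswith("LOAD_GLOBAL")
       ]


   # Warm the function up so the adaptive interpreter can specialize it.
   for _ in range(1000):
       hypotenuse(3.0, 4.0)

   print(global_load_opnames(hypotenuse))

On a CPython build with the specializing interpreter, the printed list is
expected to contain ``LOAD_GLOBAL_MODULE`` once the function has warmed up.
The exact opcode names are an implementation detail and may differ between
versions.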

The `pyperformance suite`_ confirms the implementation is performance-neutral.

.. _pyperformance suite: https://github.com/facebookexperimental/
   free-threading-benchmarking/blob/main/results/bm-20250922-3.15.0a0-27836e5/
   bm-20250922-vultr-x86_64-DinoV-lazy_imports-3.15.0a0-27836e5-vs-base.svg

Filter function performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The filter function (set via ``sys.set_lazy_imports_filter()``) is called for
every *potentially lazy* import to determine whether it should actually be
lazy. When no filter is set, the only cost is a NULL check (testing whether a
filter function has been registered), a highly predictable branch that adds
essentially no overhead. When a filter is installed, it is called for each
potentially lazy import, but this still has **almost no measurable performance
cost**. To measure this, we benchmarked importing all 278 top-level importable
modules from the Python standard library (which transitively loads 392 total
modules including all submodules and dependencies), then forced reification of
every loaded module to ensure everything was fully materialized.

Note that these measurements establish the baseline overhead of the filter
mechanism itself. Any user-defined filter function that does more than a
trivial check will add overhead proportional to the complexity of that work.
However, we expect that in practice this overhead will be dwarfed by the
performance benefits gained from avoiding unnecessary imports. The benchmarks
below measure the minimal cost of the filter dispatch mechanism when the
filter function does essentially nothing.

We compared four different configurations:

.. list-table::
   :header-rows: 1
   :widths: 50 25 25

   * - Configuration
     - Mean ± Std Dev (ms)
     - Overhead vs Baseline
   * - **Eager imports** (baseline)
     - 161.2 ± 4.3
     - 0%
   * - **Lazy + filter forcing eager**
     - 161.7 ± 4.2
     - +0.3% ± 3.7%
   * - **Lazy + filter allowing lazy + reification**
     - 162.0 ± 4.0
     - +0.5% ± 3.7%
   * - **Lazy + no filter + reification**
     - 161.4 ± 4.3
     - +0.1% ± 3.8%

The four configurations:

1. **Eager imports (baseline)**: Normal Python imports with no lazy machinery.
   Standard Python behavior.

2. **Lazy + filter forcing eager**: The filter function returns ``False`` for
   all imports, forcing eager execution; all imports are then reified at
   script end. This measures pure filter-calling overhead, since every import
   goes through the filter but executes eagerly.

3. **Lazy + filter allowing lazy + reification**: The filter function returns
   ``True`` for all imports, allowing lazy execution. All imports are reified
   at script end. This measures filter overhead when imports are actually
   lazy.

4. **Lazy + no filter + reification**: No filter is installed; imports are
   lazy and reified at script end. This is the baseline for lazy behavior
   without a filter.
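
For illustration, the trivial filters used in configurations 2 and 3 might
look like the sketch below. The function names are hypothetical, and the
three-argument signature (importing module, imported module, from-list) is an
assumption that should be checked against the filter specification elsewhere
in this PEP. The ``getattr`` guard lets the sketch run on interpreters that do
not provide ``sys.set_lazy_imports_filter()``.

.. code-block:: python

   import sys


   # Hypothetical filters mirroring configurations 2 and 3. The argument
   # list is an assumption; this section does not state the signature.
   def force_eager(importer, name, fromlist):
       return False  # every potentially lazy import is executed eagerly


   def allow_lazy(importer, name, fromlist):
       return True  # every potentially lazy import stays lazy


   def install_filter(filter_func):
       """Register *filter_func* if supported; return True if it was set."""
       setter = getattr(sys, "set_lazy_imports_filter", None)
       if setter is None:
           return False  # interpreter without lazy import support
       setter(filter_func)
       return True

Calling ``install_filter(force_eager)`` before the imports under test would
correspond to configuration 2, and ``install_filter(allow_lazy)`` to
configuration 3.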

The benchmarks used `hyperfine <https://github.com/sharkdp/hyperfine>`_,
testing 278 standard library modules. Each run used a fresh Python process.
All configurations force the import of exactly the same set of modules
(all modules loaded by the eager baseline) to ensure a fair comparison.
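
The fresh-process methodology can be approximated without hyperfine. The
sketch below is not the harness used for the table above; it is a simplified
stand-in that times new interpreter processes with ``subprocess`` and reports
the mean and sample standard deviation. ``time_fresh_process`` and the example
import list are hypothetical, and the timings include process start-up cost.

.. code-block:: python

   import statistics
   import subprocess
   import sys
   import time


   def time_fresh_process(code, runs=10):
       """Run *code* in a new interpreter *runs* times.

       Returns (mean_ms, stdev_ms). *runs* must be at least 2 so that a
       sample standard deviation can be computed.
       """
       samples = []
       for _ in range(runs):
           start = time.perf_counter()
           subprocess.run([sys.executable, "-c", code], check=True)
           samples.append((time.perf_counter() - start) * 1000.0)
       return statistics.mean(samples), statistics.stdev(samples)


   if __name__ == "__main__":
       mean_ms, stdev_ms = time_fresh_process("import json, decimal, argparse")
       print(f"{mean_ms:.1f} ± {stdev_ms:.1f} ms")

Unlike hyperfine, this sketch performs no warm-up runs or outlier detection,
so its numbers are only a rough guide.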

The benchmark environment used CPU isolation with 32 logical CPUs (0-15 at
3200 MHz, 16-31 at 2400 MHz), the performance scaling governor, Turbo Boost
disabled, and full ASLR randomization. The overhead error bars are computed
using standard error propagation for the formula
``(value - baseline) / baseline``, accounting for uncertainties in both the
measured value and the baseline.
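
As a worked example, the overhead column can be reproduced from the means and
standard deviations in the table. The sketch assumes the usual first-order
propagation for a ratio with independent uncertainties;
``overhead_with_error`` is a hypothetical helper, not code from the benchmark.

.. code-block:: python

   import math


   def overhead_with_error(value, value_sd, baseline, baseline_sd):
       """Return (overhead, uncertainty) of (value - baseline) / baseline.

       Both results are in percent. Assumes the two uncertainties are
       independent (first-order error propagation for a ratio).
       """
       ratio = value / baseline
       rel = math.sqrt((value_sd / value) ** 2 + (baseline_sd / baseline) ** 2)
       return (ratio - 1.0) * 100.0, ratio * rel * 100.0


   BASELINE = (161.2, 4.3)  # eager imports, mean and std dev in ms

   for label, mean, sd in [
       ("filter forcing eager", 161.7, 4.2),
       ("filter allowing lazy", 162.0, 4.0),
       ("no filter", 161.4, 4.3),
   ]:
       overhead, err = overhead_with_error(mean, sd, *BASELINE)
       print(f"{label}: {overhead:+.1f}% ± {err:.1f}%")

Under these assumptions the three printed lines match the table: +0.3% ± 3.7%,
+0.5% ± 3.7%, and +0.1% ± 3.8%.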

Startup time improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~

The primary performance benefit of lazy imports is reduced startup time: only
the modules actually used at runtime are loaded, rather than entire
dependency trees being loaded optimistically at startup.

Real-world deployments at scale have demonstrated that the benefits can be
massive, though this depends on the specific codebase and usage patterns.
Organizations with large, interconnected codebases have reported substantial
reductions in server reload times, ML training initialization, command-line
tool startup, and Jupyter notebook loading. Memory usage improvements have
also been observed, as unused modules remain unloaded.

For detailed case studies and performance data from production deployments,
see:

- `Python Lazy Imports With Cinder
  <https://developers.facebook.com/blog/post/2022/06/15/python-lazy-imports-with-cinder/>`__
  (Meta Instagram Server)
- `Lazy is the new fast: How Lazy Imports and Cinder accelerate machine
  learning at Meta
  <https://engineering.fb.com/2024/01/18/developer-tools/lazy-imports-cinder-machine-learning-meta/>`__
  (Meta ML Workloads)
- `Inside HRT's Python Fork
  <https://www.hudsonrivertrading.com/hrtbeat/inside-hrts-python-fork/>`__
  (Hudson River Trading)

The benefits scale with codebase complexity: the larger and more
interconnected the codebase, the more dramatic the improvements.
862+
737863
Typing and tools
738864
----------------
739865
