@@ -734,6 +734,132 @@ Subinterpreters are supported. Each subinterpreter maintains its own
``sys.lazy_modules`` and import state, so lazy imports in one subinterpreter
do not affect others.

+ Performance
+ -----------
+
+ Lazy imports have **no measurable performance overhead**. The implementation
+ is designed to be performance-neutral for both code that uses lazy imports and
+ code that doesn't.
+
+ Runtime performance
+ ~~~~~~~~~~~~~~~~~~~
+
+ After reification (first use), lazy imports have **zero overhead**. The
+ adaptive interpreter specializes the bytecode (typically after 2-3 accesses),
+ eliminating any checks. For example, ``LOAD_GLOBAL`` becomes
+ ``LOAD_GLOBAL_MODULE``, which accesses the module directly, exactly as it
+ does for a normal import.
+
+ The `pyperformance suite`_ confirms the implementation is performance-neutral.
+
+ .. _pyperformance suite: https://github.com/facebookexperimental/
+    free-threading-benchmarking/blob/main/results/bm-20250922-3.15.0a0-27836e5/
+    bm-20250922-vultr-x86_64-DinoV-lazy_imports-3.15.0a0-27836e5-vs-base.svg
+
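The specialization described above can be observed with the standard ``dis``
module. The following is a hedged sketch, not part of the benchmarks: it uses an
ordinary eager ``import math`` (lazy import support is not assumed to be
available), and the helper name ``opnames`` and the warm-up count of 1000 calls
are arbitrary choices made for illustration. Specialized opcode names and
warm-up thresholds are CPython implementation details that vary between
versions.

```python
import dis
import math
import sys


def use_module():
    # `math` is resolved as a global name inside this function.
    return math.pi


def opnames(func):
    """Return the opcode names of func, specialized ones where supported."""
    if sys.version_info >= (3, 11):
        # adaptive=True reports the specialized (quickened) instructions.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        # Older versions have no adaptive interpreter to inspect.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


# Warm the function up so the adaptive interpreter has a chance to specialize.
for _ in range(1000):
    use_module()

names = opnames(use_module)
print(names)
```

On a CPython with the adaptive interpreter, the printed list may show
``LOAD_GLOBAL_MODULE`` where an unspecialized function would show
``LOAD_GLOBAL``. The same specialized instruction is what a reified lazy import
is expected to end up using.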
+ Filter function performance
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The filter function (set via ``sys.set_lazy_imports_filter()``) is called for
+ every *potentially lazy* import to determine whether it should actually be
+ lazy. When no filter is set, the only cost is a NULL check (testing whether a
+ filter function has been registered), a highly predictable branch that
+ adds essentially no overhead. When a filter is installed, it is called for each
+ potentially lazy import, but this still has **almost no measurable performance
+ cost**. To measure this, we benchmarked importing all 278 top-level importable
+ modules from the Python standard library (which transitively loads 392 total
+ modules including all submodules and dependencies), then forced reification of
+ every loaded module to ensure everything was fully materialized.
+
+ Note that these measurements establish the baseline overhead of the filter
+ mechanism itself. Any user-defined filter function that performs
+ additional work beyond a trivial check will add overhead proportional to the
+ complexity of that work. However, we expect that in practice this overhead
+ will be dwarfed by the performance benefits gained from avoiding unnecessary
+ imports. The benchmarks below measure the minimal cost of the filter dispatch
+ mechanism when the filter function does essentially nothing.
+
+ We compared four different configurations:
+
+ .. list-table::
+    :header-rows: 1
+    :widths: 50 25 25
+
+    * - Configuration
+      - Mean ± Std Dev (ms)
+      - Overhead vs Baseline
+    * - **Eager imports** (baseline)
+      - 161.2 ± 4.3
+      - 0%
+    * - **Lazy + filter forcing eager**
+      - 161.7 ± 4.2
+      - +0.3% ± 3.7%
+    * - **Lazy + filter allowing lazy + reification**
+      - 162.0 ± 4.0
+      - +0.5% ± 3.7%
+    * - **Lazy + no filter + reification**
+      - 161.4 ± 4.3
+      - +0.1% ± 3.8%
+
+ The four configurations:
+
+ 1. **Eager imports (baseline)**: Normal Python imports with no lazy machinery.
+    Standard Python behavior.
+
+ 2. **Lazy + filter forcing eager**: Filter function returns ``False`` for all
+    imports, forcing eager execution, then all imports are reified at script
+    end. Measures pure filter calling overhead since every import goes through
+    the filter but executes eagerly.
+
+ 3. **Lazy + filter allowing lazy + reification**: Filter function returns
+    ``True`` for all imports, allowing lazy execution. All imports are reified
+    at script end. Measures filter overhead when imports are actually lazy.
+
+ 4. **Lazy + no filter + reification**: No filter installed, imports are lazy
+    and reified at script end. Baseline for lazy behavior without filter.
+
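The two trivial filters used in configurations 2 and 3 could look roughly like
the sketch below. This is an illustration under stated assumptions, not the
benchmark script: the three-argument signature ``(importer, name, fromlist)``
is assumed rather than taken from this section, and ``install_filter`` is a
hypothetical helper that does nothing on interpreters lacking
``sys.set_lazy_imports_filter()``.

```python
import sys


def force_eager(importer, name, fromlist):
    # Configuration 2: never allow laziness, so every import runs eagerly.
    # The argument list is an assumption made for this sketch.
    return False


def allow_lazy(importer, name, fromlist):
    # Configuration 3: let every potentially lazy import stay lazy.
    return True


def install_filter(filter_func):
    """Install filter_func if this interpreter provides the hook.

    Returns True when the filter was installed, False otherwise.
    """
    setter = getattr(sys, "set_lazy_imports_filter", None)
    if setter is None:
        # Interpreter without lazy import support: nothing to install.
        return False
    setter(filter_func)
    return True


installed = install_filter(force_eager)
print("filter installed:", installed)
```

Because both filters return a constant, any time difference between these
configurations and configuration 4 reflects the cost of calling the filter, not
work done inside it.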
+ The benchmarks used `hyperfine <https://github.com/sharkdp/hyperfine>`_,
+ testing 278 standard library modules. Each run used a fresh Python process.
+ All configurations force the import of exactly the same set of modules
+ (all modules loaded by the eager baseline) to ensure a fair comparison.
+
+ The benchmark environment used CPU isolation with 32 logical CPUs (0-15 at
+ 3200 MHz, 16-31 at 2400 MHz), the performance scaling governor, Turbo Boost
+ disabled, and full ASLR randomization. The overhead error bars are computed
+ using standard error propagation for the formula ``(value - baseline) /
+ baseline``, accounting for uncertainties in both the measured value and the
+ baseline.
+
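The error propagation for ``(value - baseline) / baseline`` can be sketched as
follows. The sketch assumes the two measurements are independent, so that the
relative uncertainties of ``value / baseline`` add in quadrature; the function
name ``relative_overhead`` is chosen here for illustration, and the inputs are
the means and standard deviations from the table above.

```python
import math


def relative_overhead(value, value_sd, baseline, baseline_sd):
    """Return (overhead, uncertainty) for (value - baseline) / baseline.

    Assumes independent measurements: the relative uncertainties of the
    ratio value / baseline are combined in quadrature.
    """
    ratio = value / baseline
    overhead = ratio - 1.0
    rel_err = math.sqrt((value_sd / value) ** 2 + (baseline_sd / baseline) ** 2)
    return overhead, ratio * rel_err


baseline_mean, baseline_sd = 161.2, 4.3
configs = {
    "Lazy + filter forcing eager": (161.7, 4.2),
    "Lazy + filter allowing lazy + reification": (162.0, 4.0),
    "Lazy + no filter + reification": (161.4, 4.3),
}

for label, (mean, sd) in configs.items():
    overhead, err = relative_overhead(mean, sd, baseline_mean, baseline_sd)
    print(f"{label}: {overhead:+.1%} ± {err:.1%}")
```

Under this independence assumption the sketch yields +0.3% ± 3.7%,
+0.5% ± 3.7%, and +0.1% ± 3.8%, matching the table to one decimal place. In
each case the uncertainty is roughly ten times larger than the measured
overhead.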
+ Startup time improvements
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The primary performance benefit of lazy imports is reduced startup time:
+ only the modules actually used at runtime are loaded, rather than entire
+ dependency trees being loaded optimistically at startup.
+
+ Real-world deployments at scale have demonstrated that the benefits can be
+ large, although they depend on the specific codebase and usage
+ patterns. Organizations with large, interconnected codebases have reported
+ substantial reductions in server reload times, ML training initialization,
+ command-line tool startup, and Jupyter notebook loading. Memory usage
+ improvements have also been observed as unused modules remain unloaded.
+
+ For detailed case studies and performance data from production deployments,
+ see:
+
+ - `Python Lazy Imports With Cinder
+   <https://developers.facebook.com/blog/post/2022/06/15/python-lazy-imports-with-cinder/>`__
+   (Meta Instagram Server)
+ - `Lazy is the new fast: How Lazy Imports and Cinder accelerate machine
+   learning at Meta
+   <https://engineering.fb.com/2024/01/18/developer-tools/lazy-imports-cinder-machine-learning-meta/>`__
+   (Meta ML Workloads)
+ - `Inside HRT's Python Fork
+   <https://www.hudsonrivertrading.com/hrtbeat/inside-hrts-python-fork/>`__
+   (Hudson River Trading)
+
+ The benefits scale with codebase complexity: the larger and more
+ interconnected the codebase, the more dramatic the improvements.
+
Typing and tools
----------------