Description
Follow up to #5247
During the spike, we profiled a performance bottleneck in KedroSession, described in detail in a comment on #5247.
KedroContext and the load_context() method were also investigated during this spike. They also load the entire context into memory before the session run starts, but this is less likely to cause significant slowness in a typical use case, so we decided to focus on KedroSession first.
KedroSession retrieves pipelines through the global pipelines object from kedro.framework.project, which is an instance of _ProjectPipelines. This object is designed as a lazily loaded, dict-like interface that defers pipeline loading until first access. However, the current lazy-loading implementation eagerly loads all pipelines on first access to any pipeline key.
The current behavior is as follows:
- _ProjectPipelines starts empty and uninitialized.
- bootstrap_project() → configure_project() sets _pipelines_module but does not load data.
- When KedroSession.run() accesses pipelines[name], the __getitem__ call is wrapped by _load_data_wrapper.
- _load_data() imports the pipelines registry module and calls register_pipelines().
- register_pipelines(), by default, calls find_pipelines(), which constructs all pipelines and returns a full dictionary.
- The entire pipelines dictionary is loaded into memory before the requested pipeline is returned.
This means that even when a single pipeline is requested, all pipelines are instantiated first.
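The flow above can be illustrated with a minimal, self-contained model. It does not import Kedro: ProjectPipelinesSketch, build_pipeline and the three pipeline names are hypothetical stand-ins, and only __getitem__ is wrapped here, whereas the real _ProjectPipelines may wrap more methods. The sketch only shows that a single key lookup triggers register_pipelines(), which builds every pipeline.

```python
from collections import UserDict
from functools import wraps

# Hypothetical record of every pipeline construction.
BUILT = []

def build_pipeline(name):
    # Hypothetical stand-in for importing a pipeline module and building it.
    BUILT.append(name)
    return f"<pipeline {name}>"

def register_pipelines():
    # Mirrors the default behavior described above: build *all* pipelines.
    return {name: build_pipeline(name) for name in ("ingest", "train", "report")}

def _load_data_wrapper(func):
    # Load the whole registry before delegating to the wrapped method.
    @wraps(func)
    def inner(self, *args, **kwargs):
        self._load_data()
        return func(self, *args, **kwargs)
    return inner

class ProjectPipelinesSketch(UserDict):
    """Simplified, hypothetical model of a lazily initialized pipelines dict."""

    def __init__(self):
        super().__init__()
        self._is_data_loaded = False

    def _load_data(self):
        if self._is_data_loaded:
            return
        self.data = register_pipelines()  # builds every pipeline at once
        self._is_data_loaded = True

    # Only __getitem__ triggers loading in this sketch.
    __getitem__ = _load_data_wrapper(UserDict.__getitem__)

pipelines = ProjectPipelinesSketch()
before = list(BUILT)
train = pipelines["train"]
print(before, BUILT)  # prints: [] ['ingest', 'train', 'report']
```

Nothing is built until the first lookup, so the object is lazy with respect to session setup, but that first lookup pays for all three pipelines even though only "train" was requested.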
The objective of this issue is to investigate in more depth how pipelines are loaded during KedroSession initialization and execution, identify where unnecessary overhead is introduced (particularly when only a single pipeline is requested), and evaluate whether the current lazy-loading design delivers its intended performance benefits.
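One simple way to locate where the overhead comes from is to time each pipeline's construction separately. The helper below is a hypothetical sketch, not part of Kedro: it assumes pipelines can be expressed as zero-argument factory callables keyed by name, and uses time.perf_counter for wall-clock timing. In a real project the first call would also include the cost of importing the pipeline module.

```python
import time
from typing import Callable, Dict

def time_factories(factories: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Return wall-clock seconds spent building each pipeline (hypothetical helper)."""
    timings = {}
    for name, factory in factories.items():
        start = time.perf_counter()
        factory()  # build the pipeline and discard it; only the cost matters here
        timings[name] = time.perf_counter() - start
    return timings

# Hypothetical factories: one simulates an expensive import/build, one is cheap.
def slow_pipeline():
    time.sleep(0.05)
    return "<pipeline slow>"

def fast_pipeline():
    return "<pipeline fast>"

timings = time_factories({"slow": slow_pipeline, "fast": fast_pipeline})
# Report the most expensive pipelines first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds * 1000:.1f} ms")
```

For a finer breakdown (imports versus node construction), the standard-library cProfile module could be run around the same loop.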
The expected outcome is a concrete, well-scoped proposal for improving performance, potentially by enabling more granular or truly lazy pipeline loading, along with an assessment of trade-offs.
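As a starting point for the proposal, truly lazy loading could mean building a pipeline only when its own key is requested. The sketch below is hypothetical (LazyPipelines and the factory mapping are not Kedro APIs) and assumes pipeline names can be known up front without constructing the pipelines, for example from a directory listing.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator

class LazyPipelines(Mapping):
    """Hypothetical mapping that builds a pipeline on first access and caches it."""

    def __init__(self, factories: Dict[str, Callable[[], object]]):
        self._factories = dict(factories)
        self._cache: Dict[str, object] = {}

    def __getitem__(self, name: str) -> object:
        if name not in self._cache:
            factory = self._factories[name]  # raises KeyError for unknown names
            self._cache[name] = factory()    # build only the requested pipeline
        return self._cache[name]

    def __contains__(self, name: object) -> bool:
        # Override Mapping's default, which would call __getitem__ and build.
        return name in self._factories

    def __iter__(self) -> Iterator[str]:
        # Iterating yields names only; nothing is built.
        return iter(self._factories)

    def __len__(self) -> int:
        return len(self._factories)

# Usage with hypothetical factories that record what gets built.
built = []

def make(name):
    def factory():
        built.append(name)
        return f"<pipeline {name}>"
    return factory

pipelines = LazyPipelines({n: make(n) for n in ("ingest", "train", "report")})
pipelines["train"]
pipelines["train"]   # second access is served from the cache
print(built)         # prints: ['train']
```

Trade-offs to assess: values() and items() still build everything; a pipeline that is the sum of all others (in the default project template, __default__ typically is) would still trigger a full build; and construction errors in other pipelines would surface only when those pipelines are accessed.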
Related tickets: #2879