Introduction
We had a discussion about dynamic pipelines in #1993, also partly related to #1963. This issue summarises that discussion and lays out the work we need to do.
Background
Dynamic pipelines have been one of the most frequently asked-about topics. Various solutions exist, but they are usually case-by-case; as a result, the solutions come in all shapes, and it has been asked whether Kedro can provide a first-class feature for this.
What is a "Dynamic Pipeline"?
When people refer to "dynamic pipelines", they are often not talking about the same thing. We need to make a clear distinction between the cases before we start to build a solution.
We can roughly categorise them into two buckets:
- Dynamic construction of Pipeline
- Dynamic behavior at runtime
Dynamic construction of Pipeline (easier)
Examples of these are:
- Time series forecasting - the pipeline makes a prediction for Day 1; the next pipeline requires the Day 1 prediction as input.
- Hyperparameters tuning
- Combining a variable number of features - feature engineering combines N features into one DataFrame
- A list of countries - each needs to be saved as a catalog entry; the data are then combined in a pipeline for further processing
Dynamic behavior at runtime (harder)
Examples of these are:
- 2nd order pipelines - pipelines generated from some node's output:
  - "I have a scenario where I would like to run model training and model evaluation based on labels in a dataset. Each label would trigger an individual pipeline."
  - A pipeline that makes a prediction for one user: fetch a list of N users, then run the pipeline on each of them.
- Running a node conditionally - run A if B does not exist, otherwise run C
Why is it challenging for Kedro Users?
It requires experience with Kedro: you often need to combine advanced features, e.g. TemplatedConfig, Jinja in the catalog, and a for loop in your pipeline.py.

In addition, each of these use cases needs a different solution, while standardisation is part of Kedro's value proposition. There is no well-known pattern for these solutions, and they are hard to reason about and debug with Jinja.
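To illustrate why the Jinja approach is hard to reason about, here is a sketch of the pattern the text refers to: a templated catalog fragment rendered into N entries, one per country. The entry names and filepaths are hypothetical, and Jinja2 is rendered directly here rather than via Kedro's config loader:

```python
from jinja2 import Template

# A catalog fragment written as a Jinja2 template. The reader has to
# mentally render the loop to know which datasets actually exist.
catalog_template = Template(
    """
{%- for country in countries %}
{{ country }}_raw:
  type: pandas.CSVDataset
  filepath: data/01_raw/{{ country }}.csv
{%- endfor %}
"""
)

rendered_catalog = catalog_template.render(countries=["de", "fr", "us"])
print(rendered_catalog)
```

Any typo or logic error only surfaces after rendering, and the rendered YAML is what Kedro sees, which is what makes debugging these setups painful.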
What's not in scope
- Non-DAG pipelines - e.g. GitHub Actions or CircleCI-style pipelines.
- Skipping nodes - e.g. if A exists, don't run B and C (a workaround with hooks is possible)
- Dynamic node generation during a run
These types of pipelines are fundamentally different from Kedro's "data-centric" approach.