Commit 18deec1

Author: arpechenin
Add a proposal for the standalone driver implementation based on Argo Workflows backend
Signed-off-by: arpechenin <[email protected]>
1 parent 52c848b commit 18deec1

3 files changed, +13 -5 lines changed

proposals/separate-standalone-driver/README.md

Lines changed: 13 additions & 5 deletions
@@ -72,7 +72,7 @@ It creates a pod that launches the driver container using the kfp-driver image.
 
 ## Alternative
 
-Instead of launching a new driver using a container template, configure the system to send requests to an already running server.
+Instead of launching a new driver's pod using a container template, configure the system to send requests to an already running server.
 Something like this:
 ```yaml
 templates:
@@ -112,13 +112,15 @@ The HTTP template [is not able](https://github.com/argoproj/argo-workflows/issue
 There’s a trade-off between running a standalone driver service pod globally or single per workflow. This is a balance between better performance and avoiding a single point of failure.
 Currently, Argo [supports](https://github.com/argoproj/argo-workflows/issues/7891) only one driver pod per workflow option. Both options are based on the Agent pod, which is currently started per workflow — this is a limitation of the current implementation.
 
-### Implementation
+### Implementation Based on the Executor Plugin
 
 Instead of creating a driver pod for each task, we can reuse a single agent pod via a plugin template:
 [Agent pod](https://github.com/argoproj/argo-workflows/issues/5544) is a unit designed for extension.
 It can be extended by any server that implements the protocol.
 This server(plugin in Executor plugin terminology) runs as a sidecar alongside the agent pod.
-![img.png](kfp-plugin-flow.png)
+
+Below is a scheme where, instead of creating a pod for the driver's task, we reuse the Argo Workflow Agent via a plugin
+![img.png](executor-plugin-flow.png)
 
 
 To move from the container template to the Executor Plugin template:
@@ -183,6 +185,12 @@ plugin:
 This proposal introduces an optimization for Kubeflow Pipelines (KFP) that replaces per-task driver pods with a lightweight standalone service based on Argo Workflows’ Executor Plugin mechanism. It significantly reduces pipeline task startup time by eliminating the overhead of scheduling a separate driver pod for each task — particularly beneficial for large pipelines with multiple steps and caching enabled.
 Instead of launching a new driver pod per task, the driver logic is offloaded to a shared sidecar container (agent pod) within the workflow. This reduces latency in cache lookups and metadata initialization.
 However, this approach does not fully eliminate pod scheduling issues: the standalone driver is not a global service, but is instantiated per workflow. Thus, a pod still needs to be scheduled for each workflow run.
-A key limitation of this implementation is that it currently supports only the Argo Workflows backend.
 
-Follow-up: file a task to support a global agent pod shared across workflows to fully remove driver pod scheduling overhead. The community is [open](https://github.com/argoproj/argo-workflows/issues/7891) to it.
+## Disadvantages:
+A key limitation of this implementation is that it currently supports only the Argo Workflows backend. The Executor plugin also adds some extra complexity to maintenance and deployment.
+
+## Open Questions:
+- Do we need a fallback mechanism to the per-task driver pods in case the Executor Plugin is not available in some installations? Should KFP continue supporting both execution flows (plugin-based and pod-based drivers) for compatibility?
+
+## Follow-ups
+- Implement a global agent pod. The community is [open](https://github.com/argoproj/argo-workflows/issues/7891) to it.
Binary files not shown (148 KB, -51.7 KB).
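
The README changes above describe the driver as a server that "implements the protocol" and runs as a sidecar next to the Argo agent pod. The following is a minimal Python sketch of what such a sidecar could look like, not the implementation proposed in this commit. It assumes the Executor Plugin convention of a `POST /api/v1/template.execute` request whose JSON body carries the `template`, answered with a `node` result. The plugin key `driver`, the `type` parameter, the `run_driver` helper, and the port number are hypothetical placeholders chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical plugin key; the key KFP would actually use is not shown in this diff.
PLUGIN_KEY = "driver"


def run_driver(params):
    """Placeholder for the driver logic (cache lookup, metadata initialization)."""
    # "type" is an illustrative parameter name, not taken from the proposal.
    kind = params.get("type", "unknown")
    return {"phase": "Succeeded", "message": f"driver handled {kind}"}


def handle_execute(path, body):
    """Map one agent request to an (HTTP status, JSON reply) pair."""
    if path != "/api/v1/template.execute":
        return 404, None
    plugin = (body.get("template") or {}).get("plugin") or {}
    if PLUGIN_KEY not in plugin:
        # An empty reply signals that this plugin does not handle the template.
        return 200, {}
    return 200, {"node": run_driver(plugin[PLUGIN_KEY] or {})}


class PluginHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON request body sent by the agent.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        status, reply = handle_execute(self.path, body)
        self.send_response(status)
        self.end_headers()
        if reply is not None:
            self.wfile.write(json.dumps(reply).encode("utf-8"))


def serve(port=4355):
    """Start the sidecar server; the port is an assumption for this sketch."""
    HTTPServer(("", port), PluginHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the request handling in the plain function `handle_execute` lets the routing be exercised without opening a socket, which is also where a fallback to per-task driver pods (see Open Questions) would have to be decided on the workflow side rather than in the plugin.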
