feat: Add pipeline run parallelism config #12442

sduvvuri1603 · 2025-11-12T21:20:56Z

Summary

Follow-up to the cleanup of the unused mutex_name/semaphore_key fields. This PR adds pipeline_run_parallelism as the official workflow-level concurrency knob.
pipeline_run_parallelism lets a pipeline version declare how many of its runs may execute at the same time. The limit is enforced with Argo semaphores backed by a ConfigMap.
The setting is threaded through SDK → IR → compiler → backend. The compiler validates that the value is greater than 0 and does not set spec.parallelism. The backend's CreateRun upserts the kfp-pipeline-config ConfigMap, keyed by PipelineVersionId, and wires Argo spec.synchronization.semaphores to it via configMapKeyRef.
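
A minimal sketch of the wiring described in the summary, assuming the ConfigMap stores the limit as a string under the pipeline version ID. The function name `build_run_semaphore` and the dictionary shapes are illustrative and are not the backend's actual Go code; only the ConfigMap name, the key choice, and the `spec.synchronization.semaphores` / `configMapKeyRef` fields come from the description above.

```python
from typing import Dict, Tuple

# ConfigMap name taken from the PR description.
CONFIGMAP_NAME = "kfp-pipeline-config"


def build_run_semaphore(
    pipeline_version_id: str,
    parallelism: int,
    configmap_name: str = CONFIGMAP_NAME,
) -> Tuple[Dict[str, str], Dict[str, list]]:
    """Return (ConfigMap data entry, workflow synchronization stanza).

    Hypothetical helper: mirrors the ">0" validation and the
    configMapKeyRef wiring described in the summary.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(parallelism, bool) or not isinstance(parallelism, int):
        raise ValueError("pipeline_run_parallelism must be an integer")
    if parallelism <= 0:
        raise ValueError("pipeline_run_parallelism must be greater than 0")

    # One ConfigMap key per pipeline version; the value is the run limit.
    configmap_data = {pipeline_version_id: str(parallelism)}

    # Fragment that would sit under the Argo Workflow's spec.synchronization.
    synchronization = {
        "semaphores": [
            {
                "configMapKeyRef": {
                    "name": configmap_name,
                    "key": pipeline_version_id,
                }
            }
        ]
    }
    return configmap_data, synchronization
```

Because every run of the same pipeline version references the same ConfigMap key, the runs contend for one shared semaphore, which is what makes the limit apply per pipeline version rather than per run.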

Validation

SDK and backend goldens now include the updated sample, showing consistent IR and Argo outputs with the parallelism limit.
Built custom API server and driver images from this branch, loaded them into a kind cluster, ran the sample, and confirmed that the number of simultaneously running component pods never exceeded the configured limit.
Added the parallelism validation helper to the e2e suite (e2e_utils.go, invoked from pipeline_e2e_test.go). Rebuilt the test cluster with the fresh backend images, exercised the focused pipeline_run_parallelism scenario, and then ran the full end-to-end suite to confirm that the new check passes with the concurrency cap enforced.
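
A check of this kind can be sketched as a peak-overlap computation over the observed start and end times. This is an illustration only, not the helper in e2e_utils.go; the function name `max_concurrent` and the `(start, end)` tuple representation are assumptions.

```python
from typing import Iterable, Tuple


def max_concurrent(intervals: Iterable[Tuple[float, float]]) -> int:
    """Return the largest number of intervals that overlap at any instant.

    Each interval is a (start, end) pair, for example the start and finish
    timestamps of one run. Hypothetical helper for illustration.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # something starts running
        events.append((end, -1))    # something stops running

    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back intervals are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

A test would then assert that `max_concurrent(observed) <= limit`, where `limit` is the configured pipeline_run_parallelism value.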

Note:
Hardened .github/actions/test-and-report/action.yml: the workflow now waits for the ml-pipeline deployment, port-forwards svc/ml-pipeline to localhost:8888, and curls /apis/v2beta1/healthz (TLS-aware). This eliminates the dial tcp [::1]:8888: connect: connection refused failures that intermittently broke the parallelism/recurring-run E2E suite.
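
The readiness wait described in this note can be sketched as a bounded retry loop around a health probe. The sketch assumes a plain HTTP endpoint on the port-forwarded address; the actual action is a shell step and is TLS-aware, which is not reproduced here. The names `http_probe` and `wait_until_healthy`, the attempt count, and the delay are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Address and path from the note above; the plain-HTTP scheme is an assumption.
HEALTHZ_URL = "http://localhost:8888/apis/v2beta1/healthz"


def http_probe(url: str = HEALTHZ_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers "connection refused" while the port-forward is not ready.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy(http_probe)` before starting the test binary would turn an early connection refusal into a retry instead of a suite failure.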

Follow-up to PR: remove unused semaphore_key and mutex_name fields

google-oss-prow · 2025-11-12T21:21:06Z

Hi @sduvvuri1603. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alyssacgoins · 2025-11-13T15:39:46Z

/retest

google-oss-prow · 2025-11-17T14:35:21Z

@sduvvuri1603: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hbelmiro · 2025-11-17T14:37:52Z

/ok-to-test

hbelmiro · 2025-11-17T14:37:57Z

/retest

Regenerated the workflow compiler fixtures after the merge so they now include the fully qualified ml-pipeline address and updated dependency ordering. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Resolve the latest pipeline version when a recurring run is created with only a pipeline ID, and copy that version's ID and name into the job so the API surfaces a concrete pipelineVersionId. Update the unit test to verify that the stored job now references the resolved version. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
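
The resolution step in this commit message can be sketched as follows. The types `PipelineVersion` and `RecurringRun`, the function `resolve_version`, and the choice of creation time as the definition of "latest" are all assumptions made for illustration; the real logic lives in the Go backend.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PipelineVersion:
    id: str
    name: str
    created_at: int  # assumed ordering field, e.g. a Unix timestamp


@dataclass
class RecurringRun:
    pipeline_id: str
    pipeline_version_id: Optional[str] = None
    pipeline_version_name: Optional[str] = None


def resolve_version(
    job: RecurringRun, versions: Sequence[PipelineVersion]
) -> RecurringRun:
    """Fill in the version ID and name when only a pipeline ID was given."""
    if job.pipeline_version_id:
        # The caller already pinned a version; leave it untouched.
        return job
    if not versions:
        raise LookupError(f"pipeline {job.pipeline_id} has no versions")

    # Assumption: "latest" means the most recently created version.
    latest = max(versions, key=lambda v: v.created_at)
    job.pipeline_version_id = latest.id
    job.pipeline_version_name = latest.name
    return job
```

Storing the resolved ID on the job matters for this feature because the semaphore key is the pipeline version ID, so a job without one would have nothing to key its limit on.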

Remove temporary CI debug paths for missing images, TLS readiness logging, forced MinIO usage, and TLS overlay ordering so behavior matches upstream. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Shorten max-active-runs sleep defaults and regenerate compiler goldens to keep workflow expectations aligned with the updated pipeline spec. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

sduvvuri1603 · 2026-02-04T14:26:34Z

/retest

sduvvuri1603 · 2026-02-09T06:46:34Z

/retest

sduvvuri1603 · 2026-02-10T20:29:26Z

/retest

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 12, 2025

google-oss-prow bot requested review from HumairAK, droctothorpe, mprahl and zazulam November 12, 2025 21:21

google-oss-prow bot added needs-ok-to-test size/L labels Nov 12, 2025

sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 2 times, most recently from 99f2fc8 to d34a1b2, November 12, 2025 21:22

sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 7 times, most recently from 82756e1 to 60a35d8, November 14, 2025 21:27

google-oss-prow bot added ok-to-test and removed needs-ok-to-test labels Nov 17, 2025

sduvvuri1603 marked this pull request as ready for review November 17, 2025 17:06

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 17, 2025

google-oss-prow bot requested review from DharmitD, alyssacgoins and gmfrasca November 17, 2025 17:06

sduvvuri1603 marked this pull request as draft November 17, 2025 17:06

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 17, 2025

sduvvuri1603 and others added 12 commits February 4, 2026 07:30

test: expect latest version id in recurring run

9b8dd58

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>

ci: allow skipping metadata-envoy for branch

35bf496

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>

fix: restore workflow semaphore import after merge

5da85a8

Re-add the argo workflow package alias so the new semaphore logic compiles; merge dropped the import which broke go build. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

fix: drop unused apiv2beta1 import

8a94d3a

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

fix: align job deletion with upstream

efc153c

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

chore: keep DeleteJob parameter lint-compliant

8f00c64

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

ci: always deploy metadata envoy for api tests

f269985

Remove the branch-specific skip so workflows build and load the envoy image just like master. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

ci: reduce max-active-runs pipeline sleep

15ce786

Lower sleep_seconds in the max_active_runs test pipeline to avoid repeated timeouts while still exercising parallelism. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

test: refresh max-active-runs golden

ebaa256

Update the compiled workflow golden to match the reduced sleep_seconds default in the max_active_runs pipeline. Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch from 66b1ef5 to ebaa256, February 4, 2026 12:31

sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 4 times, most recently from 5a3efbf to f434f6c, February 5, 2026 18:32

sduvvuri1603 and others added 2 commits February 5, 2026 13:35

Fix recurring run metric definition

ee9ed84

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Merge remote-tracking branch 'upstream/master' into feature/pipeline-…

f253b4e

…run-parallelism Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch from f434f6c to f253b4e, February 5, 2026 18:36

sduvvuri1603 added 2 commits February 8, 2026 23:40

Merge branch 'master' into feature/pipeline-run-parallelism

745d7ad

Merge branch 'master' into feature/pipeline-run-parallelism

516a196

sduvvuri1603 added 3 commits February 10, 2026 09:54

Merge branch 'master' into feature/pipeline-run-parallelism

9cab33f

Signed-off-by: Sruthi Duvvuri <sduvvuri@redhat.com>

Merge branch 'master' into feature/pipeline-run-parallelism

6604709

Merge branch 'master' into feature/pipeline-run-parallelism

a731855

Merge branch 'master' into feature/pipeline-run-parallelism

d3fdead

7 participants
