@sduvvuri1603 (Contributor) commented Nov 12, 2025

Summary

  • Follow-up to the cleanup of unused mutex_name/semaphore_key: adds pipeline_run_parallelism as the official workflow-level concurrency knob so pipeline versions can declare how many runs may execute concurrently.
  • With pipeline_run_parallelism, a pipeline version declares how many of its runs may execute at the same time, and the limit is enforced by Argo semaphores backed by a ConfigMap, giving workflow-level concurrency control.
  • Threads pipeline_run_parallelism through SDK → IR → compiler → backend. The compiler validates that the value is greater than 0 but does not set spec.parallelism; the backend's CreateRun upserts the kfp-pipeline-config ConfigMap, keyed by PipelineVersionId, and wires Argo spec.synchronization.semaphores to it via configMapKeyRef.
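For reviewers less familiar with Argo semaphores, the backend flow in the last bullet can be sketched in Python. This is a minimal illustration, not the PR's Go code: the helper names (validate_parallelism, upsert_limit, synchronization_block) are hypothetical, a plain dict stands in for the ConfigMap's data field, and only the ConfigMap name kfp-pipeline-config, the PipelineVersionId key, and the spec.synchronization.semaphores/configMapKeyRef shape come from the summary.

```python
from typing import Any, Dict, Optional

# ConfigMap name as given in the PR summary; everything else is illustrative.
CONFIG_MAP_NAME = "kfp-pipeline-config"


def validate_parallelism(value: Optional[int]) -> Optional[int]:
    """Mirror the compiler-side check: the limit must be a positive integer.

    None means the pipeline version did not declare a limit.
    """
    if value is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(
            f"pipeline_run_parallelism must be a positive integer, got {value!r}"
        )
    return value


def upsert_limit(data: Dict[str, str], pipeline_version_id: str, limit: int) -> Dict[str, str]:
    """Return a copy of the ConfigMap data with this version's limit set.

    ConfigMap values are strings, so the limit is stored as str(limit).
    A real backend would read and write the ConfigMap through the
    Kubernetes API; a plain dict stands in for its data field here.
    """
    updated = dict(data)
    updated[pipeline_version_id] = str(limit)
    return updated


def synchronization_block(pipeline_version_id: str) -> Dict[str, Any]:
    """Build the spec.synchronization stanza that points at the ConfigMap key."""
    return {
        "semaphores": [
            {
                "configMapKeyRef": {
                    "name": CONFIG_MAP_NAME,
                    "key": pipeline_version_id,
                }
            }
        ]
    }


if __name__ == "__main__":
    version_id = "example-version-id"  # hypothetical PipelineVersionId
    limit = validate_parallelism(2)
    data = upsert_limit({}, version_id, limit)
    workflow_spec = {"synchronization": synchronization_block(version_id)}
    print(data)
    print(workflow_spec)
```

Keying the ConfigMap by version ID means every run of the same pipeline version competes for the same semaphore, while different versions get independent limits.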

Validation

  • SDK and backend goldens now include the updated sample, showing consistent IR and Argo outputs with the parallelism limit.
  • Built custom API server and driver images from this branch, loaded them into a kind cluster, ran the sample, and confirmed that the number of simultaneously running component pods never exceeded the configured limit.
  • Added the parallelism validation helper to the e2e suite (e2e_utils.go + invocation in pipeline_e2e_test.go), rebuilt the test cluster with the fresh backend images, exercised the focused pipeline_run_parallelism scenario, and then ran the end-to-end suite to confirm the new check passes with the concurrency cap enforced.
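The concurrency checks described above boil down to finding the peak number of overlapping runs. The following is a rough Python sketch of that idea, assuming each run (or pod) has been reduced to a start/end timestamp pair; it is not the Go helper in e2e_utils.go, and peak_concurrency and check_parallelism are hypothetical names.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# (start, end) for one run or pod; end is None while it is still running.
Interval = Tuple[datetime, Optional[datetime]]


def peak_concurrency(intervals: Iterable[Interval], now: datetime) -> int:
    """Return the largest number of intervals open at the same moment.

    Sweep line: each start contributes +1 and each end -1. Sorting the
    (time, delta) tuples puts an end (-1) before a start (+1) at the same
    timestamp, so back-to-back runs are not counted as overlapping.
    """
    events: List[Tuple[datetime, int]] = []
    for start, end in intervals:
        events.append((start, 1))
        # Treat still-running intervals as open until `now`.
        events.append((end if end is not None else now, -1))
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def check_parallelism(intervals: Iterable[Interval], limit: int, now: datetime) -> int:
    """Raise if observed concurrency ever exceeded the configured limit."""
    peak = peak_concurrency(intervals, now)
    if peak > limit:
        raise AssertionError(f"observed {peak} concurrent runs, limit is {limit}")
    return peak
```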

Note:
Hardened .github/actions/test-and-report/action.yml: the workflow now waits for the ml-pipeline deployment, port-forwards svc/ml-pipeline to localhost:8888, and curls /apis/v2beta1/healthz (TLS-aware). This eliminates the dial tcp [::1]:8888: connect: connection refused failures that intermittently broke the parallelism/recurring-run E2E suite.
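The readiness step in this note amounts to polling the health endpoint until it answers. Below is a minimal Python sketch of that polling loop, assuming the port-forward to localhost:8888 is already running; wait_for_healthz, its default timeouts, and the injectable opener are hypothetical. The action itself performs the deployment wait, port-forward, and curl steps; only the retry logic is modeled here.

```python
import ssl
import time
import urllib.request
from typing import Callable, Optional

HEALTHZ_PATH = "/apis/v2beta1/healthz"  # path taken from the note above


def wait_for_healthz(
    base_url: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    context: Optional[ssl.SSLContext] = None,
    opener: Callable = urllib.request.urlopen,
) -> int:
    """Poll the health endpoint until it returns HTTP 200.

    Returns the number of attempts made. `context` lets an https URL use a
    custom CA or verification policy. Connection refused, timeouts and HTTP
    errors are all OSError subclasses, so they are retried until the
    deadline passes.
    """
    url = base_url.rstrip("/") + HEALTHZ_PATH
    deadline = time.monotonic() + timeout_s
    attempts = 0
    last_error: Optional[BaseException] = None
    while True:
        attempts += 1
        try:
            with opener(url, timeout=5, context=context) as resp:
                if resp.status == 200:
                    return attempts
                last_error = RuntimeError(f"unexpected status {resp.status}")
        except OSError as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{url} not healthy after {attempts} attempts: {last_error}"
            )
        time.sleep(interval_s)
```

Retrying on OSError is what absorbs transient "connection refused" errors from a port-forward that is not yet listening, while the deadline keeps a broken deployment from hanging the job.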

Follow-up to PR: remove unused semaphore_key and mutex_name fields

@google-oss-prow

Hi @sduvvuri1603. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 2 times, most recently from 99f2fc8 to d34a1b2 on November 12, 2025 21:22
@alyssacgoins (Contributor)

/retest

@sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 7 times, most recently from 82756e1 to 60a35d8 on November 14, 2025 21:27
@google-oss-prow

@sduvvuri1603: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

@hbelmiro (Contributor)

/ok-to-test

@hbelmiro (Contributor)

/retest

sduvvuri1603 and others added 12 commits February 4, 2026 07:30
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Re-add the argo workflow package alias so the new semaphore logic
compiles; the merge dropped the import, which broke go build.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Regenerated the workflow compiler fixtures after the merge so they now
include the fully qualified ml-pipeline address and updated dependency
ordering.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Remove the branch-specific skip so workflows build and load the envoy image just like master.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve the latest pipeline version when a recurring run is created with only a pipeline ID, and copy that version’s ID/name into the job so the API surfaces a concrete pipelineVersionId. Update the unit test to verify the stored job now references the resolved version.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
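The recurring-run fix in the commit above can be illustrated with a short Python sketch. The data classes and resolve_pipeline_version are hypothetical stand-ins for the backend's Go types, and "latest" is assumed to mean the version with the most recent creation time.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class PipelineVersion:
    id: str
    name: str
    created_at: datetime


@dataclass(frozen=True)
class RecurringRun:
    pipeline_id: str
    pipeline_version_id: Optional[str] = None
    pipeline_version_name: Optional[str] = None


def resolve_pipeline_version(
    job: RecurringRun, versions: Sequence[PipelineVersion]
) -> RecurringRun:
    """Pin a job created with only a pipeline ID to its newest version.

    Jobs that already name a version are returned unchanged, so the stored
    job always carries a concrete pipeline version ID and name.
    """
    if job.pipeline_version_id:
        return job
    if not versions:
        raise LookupError(f"pipeline {job.pipeline_id!r} has no versions")
    latest = max(versions, key=lambda v: v.created_at)
    return replace(
        job, pipeline_version_id=latest.id, pipeline_version_name=latest.name
    )
```

Resolving the version once, at creation time, keeps a per-version setting such as the parallelism limit stable even if newer versions are uploaded later.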
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Remove temporary CI debug paths for missing images, TLS readiness logging,
forced MinIO usage, and TLS overlay ordering so behavior matches upstream.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Lower sleep_seconds in the max_active_runs test pipeline to
avoid repeated timeouts while still exercising parallelism.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Update the compiled workflow golden to match the reduced
sleep_seconds default in the max_active_runs pipeline.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch from 66b1ef5 to ebaa256 on February 4, 2026 12:31
Shorten max-active-runs sleep defaults and regenerate compiler goldens
to keep workflow expectations aligned with the updated pipeline spec.

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sduvvuri1603 (Contributor, Author)

/retest

@sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch 4 times, most recently from 5a3efbf to f434f6c on February 5, 2026 18:32
sduvvuri1603 and others added 2 commits February 5, 2026 13:35
Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…run-parallelism

Signed-off-by: sduvvuri1603 <sduvvuri@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sduvvuri1603 force-pushed the feature/pipeline-run-parallelism branch from f434f6c to f253b4e on February 5, 2026 18:36
@sduvvuri1603 (Contributor, Author)

/retest

@sduvvuri1603 (Contributor, Author)

/retest

7 participants