chore(ci): Refine SeaweedFS S3 auth setup and enhance wait logic in deployment scripts #12772

Open
hbelmiro wants to merge 3 commits into kubeflow:master from hbelmiro:issue-12771

Conversation

hbelmiro commented Feb 5, 2026

This PR resolves flakiness in CI.

Resolves: #12771

Description of your changes:

E2E tests intermittently fail with "Signed request requires setting up SeaweedFS S3 authentication" because of a race condition.

The wait_for_seaweedfs_init() function checks for an init-seaweedfs Job that was removed in PR #12387. As a result, the function became a no-op, which allowed tests to start before the SeaweedFS postStart lifecycle hook had finished setting up S3 authentication.

Solution

Updated wait_for_seaweedfs_init() to poll SeaweedFS and verify that the kubeflow-admin identity is configured by checking s3.configure output via weed shell.
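
A minimal sketch of this kind of polling loop, not the exact code in this PR. The `seaweedfs` Deployment name, the namespace argument, the `S3_CONFIG_PROBE` override, the function names, and the timeout defaults are all assumptions made for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: resource names, defaults, and helper names are assumptions.

# Print the current S3 configuration by piping s3.configure into weed shell.
# Assumes a Deployment named "seaweedfs" in the given namespace.
fetch_s3_config() {
  local namespace="$1"
  kubectl exec -n "$namespace" deploy/seaweedfs -- \
    sh -c 'echo "s3.configure" | weed shell' 2>/dev/null
}

# Poll until the identity shows up in the configuration output,
# or give up once the timeout has elapsed.
# Usage: wait_for_s3_identity <namespace> <identity> [timeout_s] [interval_s]
wait_for_s3_identity() {
  local namespace="$1" identity="$2" timeout="${3:-120}" interval="${4:-5}"
  # S3_CONFIG_PROBE (hypothetical) lets callers swap in a different probe.
  local probe="${S3_CONFIG_PROBE:-fetch_s3_config}"
  local elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    if "$probe" "$namespace" | grep -qF -- "$identity"; then
      echo "S3 identity '$identity' is configured"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "Timed out after ${timeout}s waiting for S3 identity '$identity'" >&2
  return 1
}
```

A caller would run something like `wait_for_s3_identity kubeflow kubeflow-admin 120 5` before starting the tests and fail the deployment step on a non-zero exit code.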

Checklist:

…eployment scripts

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
@google-oss-prow

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

hbelmiro marked this pull request as ready for review February 5, 2026 17:16
Copilot AI review requested due to automatic review settings February 5, 2026 17:16
Copilot AI left a comment

Pull request overview

This PR fixes a race condition in CI tests where SeaweedFS S3 authentication was not fully configured before tests started, causing intermittent failures with "Signed request requires setting up SeaweedFS S3 authentication" errors.

The root cause was that PR #12387 removed an init Job and moved S3 configuration to a postStart lifecycle hook, but the wait_for_seaweedfs_init() function was not updated and became a no-op. This PR fixes the wait function to properly poll for S3 authentication completion.

Changes:

  • Replaced no-op wait logic with active polling that verifies SeaweedFS S3 authentication is configured
  • Updated error messages to reflect the new postStart-based initialization approach

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .github/resources/scripts/helper-functions.sh | Rewrote wait_for_seaweedfs_init() to poll SeaweedFS and verify kubeflow-admin identity exists via s3.configure command, with timeout handling and error logging |
| .github/resources/scripts/deploy-kfp.sh | Updated error messages from "init job" terminology to "S3 authentication setup" to reflect the postStart lifecycle hook approach |

… for consistency

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
@alyssacgoins

/lgtm


 wait_for_seaweedfs_init () {
-  # Wait for SeaweedFS init job to complete to ensure S3 auth is configured
+  # Wait for SeaweedFS S3 authentication to be configured via postStart lifecycle hook.

I think there is a better way to handle this, because this keeps coming up every now and then. Even after this fix, I believe we can still see failures, because the SeaweedFS postStart hook has no retry logic for bucket/user creation. Let's take this offline.
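
For illustration, the kind of retry the comment refers to could look roughly like the sketch below. The `retry` helper, its arguments, and the attempt and delay values are hypothetical and are not part of this PR or of SeaweedFS:

```shell
#!/usr/bin/env bash
# Sketch only: a generic retry wrapper, not code from this PR.

# Run a command until it succeeds or the attempt budget is exhausted.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1

  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Command failed after ${attempts} attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    n=$((n + 1))
  done
}

# Hypothetical use: wrap each bucket or user creation step of the hook, e.g.
#   retry 5 2 create_bucket_step
# where create_bucket_step stands in for whatever command the hook runs.
```

Wrapping each setup step this way would let a transient failure inside the hook be retried instead of leaving S3 authentication unconfigured; whether that fits the actual hook is a separate design question.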

…es, and streamlined scripts

Signed-off-by: Helber Belmiro <helber.belmiro@gmail.com>
google-oss-prow bot removed the lgtm label Feb 6, 2026
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hbelmiro and additionally assign juliusvonkohout for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hbelmiro marked this pull request as draft February 6, 2026 19:40
hbelmiro marked this pull request as ready for review February 6, 2026 19:44
@nsingla

nsingla commented Feb 6, 2026

/lgtm
/approved

Development

Successfully merging this pull request may close these issues.

[bug] SeaweedFS S3 Authentication Race Condition in CI

3 participants