
[bug] SeaweedFS S3 Authentication Race Condition in CI #12771

@hbelmiro

Description

Problem

E2E tests intermittently fail with the error:

failed to put file: Signed request requires setting up SeaweedFS S3 authentication

This happens because of a race between the SeaweedFS S3 authentication setup and test execution.

Root Cause Analysis

  1. PR fix(CI): ensure SeaweedFS S3 auth is set up before tests  #12322 (7e64da9a9) added a wait_for_seaweedfs_init() function that waits for an init-seaweedfs Job to complete before tests start.

  2. PR Add pod postStart lifecycle for SeaweedFS and remove Job initializer #12387 (caf854eed) removed the init-seaweedfs Job and switched to a postStart lifecycle hook for SeaweedFS configuration. However, the wait_for_seaweedfs_init() function was not updated to reflect this change.
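For context, the postStart approach introduced in #12387 amounts to a container lifecycle hook roughly along these lines (a paraphrased sketch; the real manifest's command and `s3.configure` flags may differ):

```yaml
# Illustrative shape of a postStart hook that configures S3 auth via
# `weed shell`; the actual command contents live in the SeaweedFS manifest.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - echo "s3.configure -apply ..." | weed shell
```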

  3. The current wait_for_seaweedfs_init() function is essentially a no-op:

wait_for_seaweedfs_init () {
    local namespace="$1"
    local timeout="$2"
    # This condition is NEVER true because the Job was removed in PR #12387
    if kubectl -n "$namespace" get job init-seaweedfs > /dev/null 2>&1; then
        if ! kubectl -n "$namespace" wait --for=condition=complete --timeout="$timeout" job/init-seaweedfs; then
            return 1
        fi
    fi
    # Falls through immediately - no waiting happens!
}
  4. The race condition unfolds as follows:
    • SeaweedFS pod starts
    • Readiness probe passes (checks /status endpoint)
    • wait_for_pods() sees all pods are Ready and returns
    • wait_for_seaweedfs_init() returns immediately (Job doesn't exist)
    • Tests start running
    • Meanwhile, the postStart lifecycle hook is still configuring S3 auth via the s3.configure command
    • Tests fail because S3 auth isn't configured yet
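The last two steps are the crux: a Ready pod does not imply a configured S3 gateway. A minimal sketch of a check that separates the two, assuming the hook creates identities via `weed shell`'s `s3.configure` (the deployment name, namespace, and grep target are assumptions, not existing CI code):

```shell
# Returns 0 only once SeaweedFS reports at least one S3 identity, i.e.
# after the postStart hook's s3.configure has actually run. Pod readiness
# alone says nothing about this.
s3_auth_configured() {
    local namespace="$1"
    kubectl -n "$namespace" exec deploy/seaweedfs -- \
        /bin/sh -c "echo 's3.configure' | weed shell 2>/dev/null" \
        | grep -q identities
}
```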

Affected Files

  • .github/resources/scripts/helper-functions.sh - Contains the broken wait_for_seaweedfs_init() function
  • .github/resources/scripts/deploy-kfp.sh - Calls wait_for_seaweedfs_init() for SeaweedFS storage backend

Proposed Solution

Update the wait_for_seaweedfs_init() function to verify that SeaweedFS S3 authentication is actually configured by testing S3 connectivity with credentials.

wait_for_seaweedfs_init () {
    local namespace="$1"
    local timeout="$2"
    local start_time=$(date +%s)
    local timeout_seconds=${timeout%s}  # Remove 's' suffix
    
    echo "Waiting for SeaweedFS S3 authentication to be configured..."
    
    # No credentials are needed here: the check below asks SeaweedFS
    # directly through `weed shell` rather than making a signed S3 request.
    
    # Test S3 authentication by listing the bucket
    while true; do
        local current_time=$(date +%s)
        local elapsed=$((current_time - start_time))
        
        if [ "$elapsed" -ge "$timeout_seconds" ]; then
            echo "ERROR: Timeout waiting for SeaweedFS S3 authentication"
            return 1
        fi
        
        # Ask SeaweedFS itself whether configuration has completed: the
        # mlpipeline bucket only becomes visible after the postStart hook runs
        if kubectl -n "$namespace" exec deploy/seaweedfs -- \
            /bin/sh -c "echo 's3.bucket.list' | /usr/bin/weed shell 2>/dev/null | grep -q mlpipeline"; then
            echo "SeaweedFS S3 authentication is configured"
            return 0
        fi
        
        echo "Waiting for SeaweedFS S3 auth... (${elapsed}s elapsed)"
        sleep 5
    done
}
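The cluster-facing parts of the loop above are hard to exercise locally, but the timeout bookkeeping is plain shell. A stub for the auth check makes it testable in isolation (everything here is illustrative):

```shell
# Re-creation of the wait loop's timeout arithmetic, with the kubectl
# probe replaced by a stub that succeeds on the third poll.
attempts=0
check_s3_auth() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

wait_with_timeout() {
    local timeout_seconds=${1%s}    # "30s" -> 30, same parsing as above
    local start_time=$(date +%s)
    while true; do
        local elapsed=$(( $(date +%s) - start_time ))
        if [ "$elapsed" -ge "$timeout_seconds" ]; then
            return 1
        fi
        if check_s3_auth; then
            return 0
        fi
        sleep 1
    done
}

wait_with_timeout "30s" && echo "ready after ${attempts} polls"
# prints "ready after 3 polls"
```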

Alternative Solutions

  1. Create a marker ConfigMap: Have the postStart hook create a ConfigMap after S3 configuration completes, then wait for that ConfigMap.

  2. Add startup probe: Configure a startup probe that only passes after S3 auth is configured (would require modifying the SeaweedFS deployment).

  3. Synchronous S3 configuration: Move S3 configuration from postStart to an init container that blocks until complete.
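Of these, the marker-ConfigMap option is the least invasive to the SeaweedFS deployment itself. A sketch of both halves, with all names (the marker ConfigMap and both functions) being assumptions rather than existing CI code:

```shell
# postStart hook side: publish completion after s3.configure succeeds.
mark_s3_auth_ready() {
    kubectl -n "$1" create configmap seaweedfs-s3-auth-ready \
        --from-literal=ready=true
}

# helper-functions.sh side: block until the marker exists or time runs out.
wait_for_s3_auth_marker() {
    local namespace="$1"
    local deadline=$(( $(date +%s) + ${2%s} ))
    until kubectl -n "$namespace" get configmap seaweedfs-s3-auth-ready \
            >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 2
    done
}
```

Note that the hook container would need kubectl and RBAC permission to create the ConfigMap, which is part of why the init-container variant may end up simpler in practice.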

Impact

  • Affects all CI jobs using SeaweedFS storage backend
  • Causes intermittent/flaky test failures
  • Can block PRs from merging due to false-positive test failures

Reproduction Steps

  1. Run E2E tests with storage=seaweedfs configuration
  2. The test may fail intermittently with S3 authentication errors
  3. The failure is more likely when:
    • SeaweedFS pod takes longer to configure S3 auth
    • Tests start immediately after pods become Ready

Labels

/area testing


