Description
Problem
E2E tests intermittently fail with the error:
`failed to put file: Signed request requires setting up SeaweedFS S3 authentication`
This happens because there's a race condition between the SeaweedFS S3 authentication setup and the test execution.
Root Cause Analysis
- PR fix(CI): ensure SeaweedFS S3 auth is set up before tests #12322 (7e64da9a9) added a `wait_for_seaweedfs_init()` function that waits for an `init-seaweedfs` Job to complete before tests start.
- PR Add pod postStart lifecycle for SeaweedFS and remove Job initializer #12387 (caf854eed) removed the `init-seaweedfs` Job and switched to a `postStart` lifecycle hook for SeaweedFS configuration. However, the `wait_for_seaweedfs_init()` function was not updated to reflect this change.
- The current `wait_for_seaweedfs_init()` function is essentially a no-op:
```sh
wait_for_seaweedfs_init () {
  local namespace="$1"
  local timeout="$2"

  # This condition is NEVER true because the Job was removed in PR #12387
  if kubectl -n "$namespace" get job init-seaweedfs > /dev/null 2>&1; then
    if ! kubectl -n "$namespace" wait --for=condition=complete --timeout="$timeout" job/init-seaweedfs; then
      return 1
    fi
  fi
  # Falls through immediately - no waiting happens!
}
```

The race condition occurs because:
1. SeaweedFS pod starts
2. Readiness probe passes (it checks the `/status` endpoint)
3. `wait_for_pods()` sees all pods are Ready and returns
4. `wait_for_seaweedfs_init()` returns immediately (the Job doesn't exist)
5. Tests start running
6. Meanwhile, the `postStart` lifecycle hook is still configuring S3 auth via the `s3.configure` command (see the sketch below)
7. Tests fail because S3 auth isn't configured yet
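For context, the `postStart` hook applies S3 auth asynchronously after the container starts; a command of roughly the following shape runs inside the SeaweedFS container. This is an illustration with placeholder credentials and flags, not the exact hook added in PR #12387.

```sh
# Illustrative shape of the postStart S3 configuration (placeholders, not the
# exact command from PR #12387). Kubernetes gives no ordering guarantee between
# this hook finishing and the readiness probe starting to pass.
echo "s3.configure -user=kfp -access_key=${ACCESS_KEY} -secret_key=${SECRET_KEY} -actions=Read,Write,List -apply" \
  | weed shell
```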
Affected Files
- `.github/resources/scripts/helper-functions.sh` - contains the broken `wait_for_seaweedfs_init()` function
- `.github/resources/scripts/deploy-kfp.sh` - calls `wait_for_seaweedfs_init()` for the SeaweedFS storage backend (a hypothetical call site is sketched below)
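A hypothetical shape of that call site, purely for orientation; the variable names, namespace, and timeout are illustrative, not taken from the repository:

```sh
# Hypothetical call site in deploy-kfp.sh (names and values are illustrative):
if [ "$STORAGE_BACKEND" = "seaweedfs" ]; then
  wait_for_pods "kubeflow" "300s"
  wait_for_seaweedfs_init "kubeflow" "300s"   # currently returns immediately, see above
fi
```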
Proposed Solution
Update the `wait_for_seaweedfs_init()` function to verify that SeaweedFS S3 authentication is actually configured by testing S3 connectivity with credentials.
```sh
wait_for_seaweedfs_init () {
  local namespace="$1"
  local timeout="$2"
  local start_time=$(date +%s)
  local timeout_seconds=${timeout%s}  # Remove 's' suffix

  echo "Waiting for SeaweedFS S3 authentication to be configured..."

  # Get credentials from the secret (not needed by the weed shell probe below,
  # but available if the check is switched to a real signed S3 request)
  local access_key=$(kubectl -n "$namespace" get secret mlpipeline-minio-artifact -o jsonpath='{.data.accesskey}' | base64 -d)
  local secret_key=$(kubectl -n "$namespace" get secret mlpipeline-minio-artifact -o jsonpath='{.data.secretkey}' | base64 -d)

  # Test S3 authentication by listing the bucket
  while true; do
    local current_time=$(date +%s)
    local elapsed=$((current_time - start_time))
    if [ "$elapsed" -ge "$timeout_seconds" ]; then
      echo "ERROR: Timeout waiting for SeaweedFS S3 authentication"
      return 1
    fi

    # Check from inside the SeaweedFS pod that the mlpipeline bucket is visible
    # via `weed shell`, i.e. that the postStart configuration has been applied
    if kubectl -n "$namespace" exec deploy/seaweedfs -- \
      /bin/sh -c "echo 's3.bucket.list' | /usr/bin/weed shell 2>/dev/null | grep -q mlpipeline"; then
      echo "SeaweedFS S3 authentication is configured"
      return 0
    fi

    echo "Waiting for SeaweedFS S3 auth... (${elapsed}s elapsed)"
    sleep 5
  done
}
```
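If the check should exercise the S3 signing path itself (which is what the retrieved credentials are for), a variant along the following lines could replace the `weed shell` probe. The Service name `seaweedfs` and port `8333` are assumptions about the deployment, not verified values.

```sh
# Hedged variant: issue a signed S3 request with the retrieved credentials,
# assuming a `seaweedfs` Service exposes the S3 gateway on port 8333.
kubectl -n "$namespace" run seaweedfs-s3-auth-check --rm -i --restart=Never \
  --image=amazon/aws-cli \
  --env="AWS_ACCESS_KEY_ID=$access_key" \
  --env="AWS_SECRET_ACCESS_KEY=$secret_key" \
  -- s3 ls "s3://mlpipeline" --endpoint-url "http://seaweedfs:8333"
```

This should keep failing until `s3.configure` has been applied, which is exactly the condition the tests depend on.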
Alternative Solutions
- Create a marker ConfigMap: Have the `postStart` hook create a ConfigMap after S3 configuration completes, then wait for that ConfigMap (see the sketch after this list).
- Add a startup probe: Configure a startup probe that only passes after S3 auth is configured (would require modifying the SeaweedFS deployment).
- Synchronous S3 configuration: Move S3 configuration from `postStart` to an init container that blocks until complete.
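A minimal sketch of the marker-ConfigMap idea; the ConfigMap name `seaweedfs-s3-ready` is illustrative and does not exist in the repo:

```sh
# In the postStart hook, after s3.configure succeeds (illustrative name):
kubectl -n "$NAMESPACE" create configmap seaweedfs-s3-ready --from-literal=ready=true

# In wait_for_seaweedfs_init(), poll for the marker instead of the removed Job:
until kubectl -n "$namespace" get configmap seaweedfs-s3-ready > /dev/null 2>&1; do
  echo "Waiting for SeaweedFS S3 configuration marker..."
  sleep 5
done
```

Note the hook container would need `kubectl` (or an API client) and RBAC permission to create ConfigMaps, which is part of why the probe-based checks above may be simpler.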
Impact
- Affects all CI jobs using SeaweedFS storage backend
- Causes intermittent/flaky test failures
- Can block PRs from merging due to false-positive test failures
References
- PR fix(CI): ensure SeaweedFS S3 auth is set up before tests #12322: Added the `wait_for_seaweedfs_init()` function
- PR Add pod postStart lifecycle for SeaweedFS and remove Job initializer #12387: Removed the init Job, switched to a postStart lifecycle hook
- PR chore: update SeaweedFS to 4.00 and make it more robust #12406: Updated SeaweedFS to 4.00
- PR Fix: Seaweedfs admin credentials not loaded after restart #12460: Fixed SeaweedFS admin credentials not loaded after restart
Reproduction Steps
- Run E2E tests with the `storage=seaweedfs` configuration
- The test may fail intermittently with S3 authentication errors (a sketch for observing the race window follows this list)
- The failure is more likely when:
  - SeaweedFS pod takes longer to configure S3 auth
  - Tests start immediately after pods become Ready
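One hedged way to observe the race window manually; the `kubeflow` namespace is an assumption, and `deploy/seaweedfs` mirrors the exec target used in the proposal above:

```sh
# Wait only for the Deployment to report Ready (what CI effectively does today)...
kubectl -n kubeflow rollout status deploy/seaweedfs --timeout=120s

# ...then immediately check S3 configuration; this can still fail for a few
# seconds because the postStart hook may not have finished running s3.configure.
kubectl -n kubeflow exec deploy/seaweedfs -- \
  /bin/sh -c "echo 's3.bucket.list' | weed shell"
```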
Labels
/area testing
Impacted by this bug? Give it a 👍.