Pull request overview
Improves e2e test resilience by waiting for Deployments to be ready (not just present) and by skipping MUO tests when MUO isn’t enabled/managed, addressing flakes caused by timeout/race conditions around reconciliation readiness.
Changes:
- Add Eventually-based helpers to wait for MUO/Gatekeeper Deployments to become ready.
- Update MUO restore test to skip when MUO isn’t enabled/managed and to wait for readiness after reconciliation.
- Update Guardrails Gatekeeper restore tests to wait for readiness after reconciliation.
test/e2e/operator.go
waitForMUODeploymentReady := func(ctx context.Context) {
	Eventually(func(g Gomega, ctx context.Context) {
		d, err := clients.Kubernetes.AppsV1().Deployments(managedUpgradeOperatorNamespace).Get(ctx, managedUpgradeOperatorDeployment, metav1.GetOptions{})
		g.Expect(err).NotTo(HaveOccurred())
		g.Expect(ready.DeploymentIsReady(d)).To(BeTrue(), "expected MUO deployment to be ready")
	}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())
}
This helper function’s logic is duplicated later for Gatekeeper (same pattern: Get + DeploymentIsReady inside Eventually). Consider extracting a single reusable helper (e.g., waitForDeploymentReady(ctx, namespace, name, timeout)) to reduce duplication and keep readiness semantics consistent across tests.
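A minimal sketch of what such a shared helper could look like follows. To stay self-contained it uses only the Go standard library: a plain polling loop stands in for Gomega's Eventually, and `deploymentStatus`, `getFunc`, `isReady`, `fakeGetter`, and `runDemo` are hypothetical stand-ins, not names from this repository. Only the `waitForDeploymentReady` name and its namespace/name/timeout parameters come from the suggestion above; the `get` and `interval` parameters are added for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// deploymentStatus is a hypothetical stand-in for the fields of an
// appsv1.Deployment that a readiness check would inspect.
type deploymentStatus struct {
	DesiredReplicas int32
	ReadyReplicas   int32
}

// getFunc is a hypothetical stand-in for a Deployments(namespace).Get call.
type getFunc func(ctx context.Context, namespace, name string) (*deploymentStatus, error)

// isReady is a simplified stand-in for ready.DeploymentIsReady.
func isReady(d *deploymentStatus) bool {
	return d != nil && d.DesiredReplicas > 0 && d.ReadyReplicas == d.DesiredReplicas
}

// waitForDeploymentReady polls get until the Deployment reports ready,
// the timeout elapses, or ctx is cancelled.
func waitForDeploymentReady(ctx context.Context, get getFunc, namespace, name string, timeout, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var lastErr error
	for {
		d, err := get(ctx, namespace, name)
		switch {
		case err != nil:
			lastErr = err // keep retrying on Get errors until the deadline
		case isReady(d):
			return nil
		default:
			lastErr = errors.New("deployment not ready")
		}

		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for %s/%s: %w (last error: %v)", namespace, name, ctx.Err(), lastErr)
		case <-ticker.C:
		}
	}
}

// fakeGetter returns a getFunc that reports ready from the readyAfter-th
// call onward and records how many times it was called.
func fakeGetter(readyAfter int, calls *int) getFunc {
	return func(ctx context.Context, namespace, name string) (*deploymentStatus, error) {
		*calls++
		if *calls < readyAfter {
			return &deploymentStatus{DesiredReplicas: 1, ReadyReplicas: 0}, nil
		}
		return &deploymentStatus{DesiredReplicas: 1, ReadyReplicas: 1}, nil
	}
}

// runDemo waits on a fake Deployment that becomes ready on the third poll.
func runDemo() (int, error) {
	calls := 0
	err := waitForDeploymentReady(context.Background(), fakeGetter(3, &calls),
		"example-namespace", "example-deployment", 2*time.Second, 10*time.Millisecond)
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println(calls, err)
}
```

With one helper parameterized by namespace and name, the MUO and Gatekeeper tests would share the same readiness semantics and differ only in their arguments.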
Co-authored-by: hlipsig <8000786+hlipsig@users.noreply.github.com>
Revert flakey test changes and increase MUO SpecTimeout to 3 minutes
  By("waiting for the MUO deployment to be reconciled")
  GetK8sObjectWithRetry(ctx, getFunc, managedUpgradeOperatorDeployment, metav1.GetOptions{})
- }, SpecTimeout(2*time.Minute))
+ }, SpecTimeout(3*time.Minute))
Rather than specify a timeout for this specific test, can we leverage the Eventually() pattern like other tests, using the DefaultEventuallyTimeout at test/e2e/helpers.go, which is 5 minutes?
If the .bingo change is required, can we leave it to a new PR since this one is meant to fix the e2e flake?
Which issue this PR addresses:
ARO-25407 Flakes for timeouts in MUO test.
What this PR does / why we need it:
Improves test resilience by increasing the MUO test's SpecTimeout from 2 minutes to 3 minutes.
Test plan for issue:
A local run passed. A green e2e signal is still needed.
Is there any documentation that needs to be updated for this PR?
How do you know this will function as expected in production?
Only fixes tests.