
Attempt at fixing flakey tests #4700

Open
hlipsig wants to merge 5 commits into master from hlipsig/ARO-25407/fix-muo-gatekeeper-e2e-flake

Collaborator

@hlipsig hlipsig commented Mar 19, 2026

Which issue this PR addresses:

ARO-25407 Flakes for timeouts in MUO test.

What this PR does / why we need it:

Improves test resilience by raising the MUO test's SpecTimeout from 2 minutes to 3 minutes.

Test plan for issue:

Local run looked good. Green e2e signal is what's needed.

Is there any documentation that needs to be updated for this PR?

How do you know this will function as expected in production?

Only fixes tests.

Contributor

Copilot AI left a comment


Pull request overview

Improves e2e test resilience by waiting for Deployments to be ready (not just present) and by skipping MUO tests when MUO isn’t enabled/managed, addressing flakes caused by timeout/race conditions around reconciliation readiness.

Changes:

  • Add Eventually-based helpers to wait for MUO/Gatekeeper Deployments to become ready.
  • Update MUO restore test to skip when MUO isn’t enabled/managed and to wait for readiness after reconciliation.
  • Update Guardrails Gatekeeper restore tests to wait for readiness after reconciliation.
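The difference between a Deployment being present and being ready can be sketched in plain Go. This is only an illustration under stated assumptions: the `deployment` struct is a hypothetical, trimmed-down stand-in for `appsv1.Deployment`, and `isReady` is an assumed readiness rule (observed generation is current, and updated and available replicas match the desired count). It is not claimed to be what `ready.DeploymentIsReady` does in this repository.

```go
package main

// deployment is a hypothetical stand-in for the appsv1.Deployment fields
// that a readiness check typically inspects. It is not the real API type.
type deployment struct {
	Generation         int64 // bumped whenever the spec changes
	SpecReplicas       int32 // desired replica count
	ObservedGeneration int64 // last generation the controller has processed
	UpdatedReplicas    int32 // replicas running the current spec
	AvailableReplicas  int32 // replicas passing their availability checks
}

// isPresent mirrors a test that only checks that a Get call returned an object.
func isPresent(d *deployment) bool {
	return d != nil
}

// isReady is an assumed readiness rule: the controller has seen the latest
// spec, and every desired replica is both updated and available.
func isReady(d *deployment) bool {
	return d != nil &&
		d.Generation == d.ObservedGeneration &&
		d.SpecReplicas == d.UpdatedReplicas &&
		d.SpecReplicas == d.AvailableReplicas
}
```

A Deployment that was just recreated by reconciliation would pass `isPresent` right away but fail `isReady` until its pods come up, which is the window in which a presence-only assertion can race.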


Comment on lines +438 to +444
	waitForMUODeploymentReady := func(ctx context.Context) {
		Eventually(func(g Gomega, ctx context.Context) {
			d, err := clients.Kubernetes.AppsV1().Deployments(managedUpgradeOperatorNamespace).Get(ctx, managedUpgradeOperatorDeployment, metav1.GetOptions{})
			g.Expect(err).NotTo(HaveOccurred())
			g.Expect(ready.DeploymentIsReady(d)).To(BeTrue(), "expected MUO deployment to be ready")
		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())
	}

Copilot AI Mar 19, 2026


This helper function’s logic is duplicated later for Gatekeeper (same pattern: Get + DeploymentIsReady inside Eventually). Consider extracting a single reusable helper (e.g., waitForDeploymentReady(ctx, namespace, name, timeout)) to reduce duplication and keep readiness semantics consistent across tests.


Copilot AI commented Mar 20, 2026

@hlipsig I've opened a new pull request, #4701, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits March 20, 2026 00:25
Co-authored-by: hlipsig <8000786+hlipsig@users.noreply.github.com>
Revert flakey test changes and increase MUO SpecTimeout to 3 minutes
 		By("waiting for the MUO deployment to be reconciled")
 		GetK8sObjectWithRetry(ctx, getFunc, managedUpgradeOperatorDeployment, metav1.GetOptions{})
-	}, SpecTimeout(2*time.Minute))
+	}, SpecTimeout(3*time.Minute))
Member


Rather than specifying a timeout for this one test, can we use the Eventually() pattern like the other tests, with the DefaultEventuallyTimeout defined in test/e2e/helpers.go, which is 5 minutes?

Member


If the .bingo change is required, can we leave it to a new PR since this one is meant to fix the e2e flake?



4 participants