terminators: wait specifically for WST to be ready #3679
Conversation
```go
func conditionIsTrue(conditions conditionsv1alpha1.Conditions, conditionType conditionsv1alpha1.ConditionType) bool {
```
This could be replaced with `conditionsv1alpha1.IsTrue(wt, conditionType)`.
Ah, very nice! Changed it.
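For context, the hand-rolled helper and the suggested one-liner do the same lookup. Here is a minimal sketch, assuming the kcp SDK's third-party conditions packages; the loop body, import path, and the `wt` object are illustrative, only the signature and the suggested call come from the diff and comment above.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"

	// Illustrative import; the conditions types live in the kcp SDK's
	// third-party conditions API package.
	conditionsv1alpha1 "github.com/kcp-dev/kcp/sdk/apis/third_party/conditions/apis/conditions/v1alpha1"
)

// conditionIsTrue is the hand-rolled helper from the diff: scan the conditions
// slice for the given type and report whether its status is True.
// (The body is an assumption; only the signature appears in the diff.)
func conditionIsTrue(conditions conditionsv1alpha1.Conditions, conditionType conditionsv1alpha1.ConditionType) bool {
	for _, c := range conditions {
		if c.Type == conditionType {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```

The suggestion replaces call sites of this local helper with the existing `conditionsv1alpha1.IsTrue(wt, conditionType)`, which performs the equivalent lookup directly on the object, so the helper can be dropped.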
force-pushed from f9909b8 to 06ae166
@SimonTheLeg is that test failure a flake you want to look at, or wdyt?
This is a different issue. The flake before was that the admin was not able to access them, presumably because the workspace was marked for deletion too early (e.g. the role had not synced yet). Luckily this no longer seems to occur with this change. This time it is user-1 having the issue, which raises the question whether we have a flake in our permission reconciliation or in applying the ClusterRoles. I had a look at the kcp.log of this particular run, but everything looks exactly like it does for a run that succeeds: it updates the global cache accordingly (with no direct error), and then in the logs you only see requests failing afterwards.

For now I suggest we get this PR merged, because it removes a different possible cause of flakiness. I'll then create a bug ticket for the other one. So far I have no clue what the issue could be :/
/retest
The only thing I can think of is some sort of race condition: if the LogicalCluster is marked for deletion and the cache is fast enough, the ClusterRole still gets applied; if not, we just stop doing it, leading to this issue 🤔
On-behalf-of: SAP <[email protected]> Signed-off-by: Simon Bein <[email protected]>
force-pushed from 06ae166 to dd0498a
/retest
2 similar comments
/retest
/retest
/approve
LGTM label has been added. Git tree hash: 252ecb340f7421f2290bd57ccf11f11b6cd26713
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: xrstf. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Summary
With this change we now specifically wait for WorkspaceTypes to report themselves and their VW URLs as ready.
We will see if this already fixes the flakiness that we have in the test. If not, we'll need to investigate how a VW sometimes cannot be watched within 30 seconds due to failing permissions (for example, see https://public-prow.kcp.k8c.io/view/s3/prow-public-data/pr-logs/pull/kcp-dev_kcp/3412/pull-kcp-test-e2e-sharded/1981638200807919616).
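As a rough illustration of the kind of wait this introduces (not the actual test code from this PR): poll until the WorkspaceType reports a ready condition and has published its virtual workspace URLs. The condition name, status field, getter function, and import paths below are assumptions based on the kcp SDK layout.

```go
package sketch

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"

	// Illustrative imports; adjust to the actual kcp SDK packages.
	tenancyv1alpha1 "github.com/kcp-dev/kcp/sdk/apis/tenancy/v1alpha1"
	"github.com/kcp-dev/kcp/sdk/apis/third_party/conditions/util/conditions"
)

// waitForWorkspaceTypeReady polls until the WorkspaceType reports itself as
// ready and has published its virtual workspace URLs, or the timeout expires.
// getWT is a hypothetical getter (e.g. a wrapped client Get call).
func waitForWorkspaceTypeReady(ctx context.Context, getWT func(ctx context.Context) (*tenancyv1alpha1.WorkspaceType, error)) error {
	return wait.PollUntilContextTimeout(ctx, 100*time.Millisecond, 30*time.Second, true,
		func(ctx context.Context) (bool, error) {
			wt, err := getWT(ctx)
			if err != nil {
				return false, nil // treat errors as "not ready yet" and keep polling
			}
			// "VirtualWorkspaceURLsReady" is an assumed condition name for the
			// readiness signal discussed above.
			if !conditions.IsTrue(wt, "VirtualWorkspaceURLsReady") {
				return false, nil
			}
			// Also require at least one published virtual workspace URL.
			return len(wt.Status.VirtualWorkspaces) > 0, nil
		})
}
```

In the e2e tests this would typically be wrapped in the framework's Eventually-style helpers rather than a bare poll loop, but the readiness check itself is the same.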
What Type of PR Is This?
/kind bug
Related Issue(s)
Fixes #
Release Notes