
Conversation

@SimonTheLeg
Contributor

With this change we now explicitly wait for WorkspaceTypes to report themselves, and their VW URLs, as ready.

We will see if this already fixes the flakiness we have in the test. If not, we'll need to investigate why a VW sometimes cannot be watched within 30 seconds due to failing permissions (for example, see https://public-prow.kcp.k8c.io/view/s3/prow-public-data/pr-logs/pull/kcp-dev_kcp/3412/pull-kcp-test-e2e-sharded/1981638200807919616)
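
For illustration, a minimal sketch of what such a wait could look like in the e2e test. The condition type, the VirtualWorkspaces status field, the client interfaces, and the import paths are assumptions for the sketch, not necessarily the exact code in this PR; `conditionIsTrue` is the small helper visible in the review diff further down this thread:

```go
package framework

import (
	"context"
	"testing"
	"time"

	"github.com/kcp-dev/logicalcluster/v3"
	"github.com/stretchr/testify/require"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"

	tenancyv1alpha1 "github.com/kcp-dev/kcp/sdk/apis/tenancy/v1alpha1"
	kcpclientset "github.com/kcp-dev/kcp/sdk/client/clientset/versioned/cluster"
)

// waitForWorkspaceTypeReady polls a WorkspaceType until it reports itself,
// and its virtual workspace URLs, as ready. The condition type and the
// VirtualWorkspaces status field are assumptions about the tenancy API.
func waitForWorkspaceTypeReady(ctx context.Context, t *testing.T, client kcpclientset.ClusterInterface, path logicalcluster.Path, name string) {
	t.Helper()
	require.Eventually(t, func() bool {
		wt, err := client.Cluster(path).TenancyV1alpha1().WorkspaceTypes().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false
		}
		// Only proceed once the URL condition is true and at least one
		// virtual workspace URL has actually been published.
		return conditionIsTrue(wt.Status.Conditions, tenancyv1alpha1.WorkspaceTypeVirtualWorkspaceURLsReady) &&
			len(wt.Status.VirtualWorkspaces) > 0
	}, wait.ForeverTestTimeout, 100*time.Millisecond, "WorkspaceType %s|%s never became ready", path, name)
}
```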

Summary

What Type of PR Is This?

/kind bug

Related Issue(s)

Fixes #

Release Notes

NONE

@kcp-ci-bot kcp-ci-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 24, 2025
}
}

func conditionIsTrue(conditions conditionsv1alpha1.Conditions, conditionType conditionsv1alpha1.ConditionType) bool {
Contributor


This could be replaced with conditionsv1alpha1.IsTrue(wt, conditionType)
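
For reference, the suggested change side by side (a sketch; the exact `IsTrue` signature is as suggested above, assuming `wt` satisfies the conditions package's getter interface):

```go
// Before: hand-rolled lookup over the conditions slice.
ready := conditionIsTrue(wt.Status.Conditions, conditionType)

// After: the suggested helper, which takes the object itself.
ready = conditionsv1alpha1.IsTrue(wt, conditionType)
```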

Contributor Author


ah very nice! Changed it.

@SimonTheLeg SimonTheLeg force-pushed the termiantors-wait-for-wst-ready branch from f9909b8 to 06ae166 Compare October 28, 2025 09:32
@kcp-ci-bot kcp-ci-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 28, 2025
@xrstf
Contributor

xrstf commented Oct 28, 2025

@SimonTheLeg is that test failure a flake you want to look at, or wdyt?

@SimonTheLeg
Contributor Author

This is a different issue. The flake before was that the admin was not able to access the workspaces, presumably because the workspace was marked for deletion too early (e.g. the role had not synced yet). Luckily this no longer seems to occur with this change.

This time it is user-1 having this issue, which raises the question of whether we have a flake in our permission reconciliation or in the applying of the ClusterRoles. I had a look at the kcp.log of this particular run, but everything looks exactly like it does for a successful run: it updates the global cache accordingly (with no direct error), and then in the logs you only see requests failing afterwards:

I1023 13:50:20.817761   63219 labelclusterrolebinding_controller.go:168] "queueing ClusterRoleBinding" reconciler="kcp-tenancy-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.817833   63219 replication_reconcile.go:187] "Creating object in global cache" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-replication-controller" key="v1.clusterrolebindings.rbac.authorization.k8s.io::1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reconcilerKey="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.818767   63219 replication_reconcile.go:209] "Updating object in global cache" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-replication-controller" key="v1.clusterroles.rbac.authorization.k8s.io::1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reconcilerKey="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" kind="ClusterRole" namespace="" name="1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.817863   63219 labelclusterrolebinding_controller.go:168] "queueing ClusterRoleBinding" reconciler="kcp-apis-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.818896   63219 labelclusterrolebinding_controller.go:228] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-apis-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.818044   63219 httplog.go:134] "HTTP" verb="GET" URI="/clusters/2pgvufntpx5q7kx4/apis/core.kcp.io/v1alpha1/logicalclusters/cluster" latency="1.531337ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5/kcp-workspace+root" audit-ID="e067a559-5107-441f-9d2c-4f7ee0e46f19" srcIP="127.0.0.1:35110" apf_pl="exempt" apf_fs="exempt" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="1.305494ms" resp=200
I1023 13:50:20.817736   63219 labelclusterrolebinding_controller.go:168] "queueing ClusterRoleBinding" reconciler="kcp-core-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.818993   63219 labelclusterrolebinding_controller.go:228] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-tenancy-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.817749   63219 labelclusterrole_controller.go:166] "queueing ClusterRole" reconciler="kcp-tenancy-replicate-clusterrole" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reason="ClusterRoleBinding" ClusterRoleBinding.name="1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.819017   63219 labelclusterrole_controller.go:228] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-tenancy-replicate-clusterrole" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.819002   63219 labelclusterrolebinding_controller.go:228] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-core-replicate-clusterrolebinding" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.817852   63219 labelclusterrole_controller.go:166] "queueing ClusterRole" reconciler="kcp-apis-replicate-clusterrole" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reason="ClusterRoleBinding" ClusterRoleBinding.name="1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.819686   63219 labelclusterrole_controller.go:228] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-apis-replicate-clusterrole" key="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.822125   63219 httplog.go:134] "HTTP" verb="GET" URI="/clusters/2pgvufntpx5q7kx4/apis/core.kcp.io/v1alpha1/logicalclusters/cluster" latency="2.577828ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5/kcp-workspace+root" audit-ID="f778af17-9499-4a3b-a265-23858b662a74" srcIP="127.0.0.1:35110" apf_pl="exempt" apf_fs="exempt" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="2.143503ms" resp=200
I1023 13:50:20.823111   63219 workspace_reconcile_phase.go:114] "LogicalCluster is still deleting, requeuing" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-workspace" key="1hjsxwowl68q6u29|e2e-workspace-45gs4" workspace.workspace="1hjsxwowl68q6u29" workspace.namespace="" workspace.name="e2e-workspace-45gs4" workspace.apiVersion="" reconciler="phase" cluster="2pgvufntpx5q7kx4" after="964.62013ms"
I1023 13:50:20.823409   63219 httplog.go:134] "HTTP" verb="POST" URI="/services/cache/shards/root/clusters/1hjsxwowl68q6u29/apis/rbac.authorization.k8s.io/v1/clusterrolebindings" latency="2.975592ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5" audit-ID="aa9723aa-074c-46a3-97ac-fbbc1f9ab2b3" srcIP="127.0.0.1:35110" resp=201
I1023 13:50:20.824408   63219 replication_reconcile.go:209] "Updating object in global cache" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-replication-controller" key="v1.clusterrolebindings.rbac.authorization.k8s.io::1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reconcilerKey="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" kind="ClusterRoleBinding" namespace="" name="1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.827415   63219 httplog.go:134] "HTTP" verb="PUT" URI="/services/cache/shards/root/clusters/1hjsxwowl68q6u29/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" latency="2.680789ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5" audit-ID="c6784973-143b-44c9-9978-97feac7b6d99" srcIP="127.0.0.1:35110" resp=200
I1023 13:50:20.828806   63219 replication_reconcile.go:209] "Updating object in global cache" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-replication-controller" key="v1.clusterrolebindings.rbac.authorization.k8s.io::1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" reconcilerKey="1hjsxwowl68q6u29|1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" kind="ClusterRoleBinding" namespace="" name="1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator"
I1023 13:50:20.831612   63219 httplog.go:134] "HTTP" verb="PUT" URI="/services/cache/shards/root/clusters/1hjsxwowl68q6u29/apis/rbac.authorization.k8s.io/v1/clusterroles/1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" latency="12.525095ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5" audit-ID="b16d5d38-8e4b-4a02-917b-c08fbfaf9d01" srcIP="127.0.0.1:35110" resp=200
I1023 13:50:20.832979   63219 httplog.go:134] "HTTP" verb="PUT" URI="/services/cache/shards/root/clusters/1hjsxwowl68q6u29/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/1hjsxwowl68q6u29:gamma-ofbewlbtbr-terminator" latency="3.657279ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5" audit-ID="c5e48176-6eee-4d9c-9bef-f1c000e727fb" srcIP="127.0.0.1:35110" resp=200
I1023 13:50:20.909394   63219 authorization.go:93] "Forbidden" URI="/services/terminatingworkspaces/1hjsxwowl68q6u29:alpha-krsavygsbu/clusters/%2A/apis/core.kcp.io/v1alpha1/logicalclusters" reason="access denied"
I1023 13:50:20.909557   63219 httplog.go:134] "HTTP" verb="LIST" URI="/services/terminatingworkspaces/1hjsxwowl68q6u29:alpha-krsavygsbu/clusters/%2A/apis/core.kcp.io/v1alpha1/logicalclusters" latency="341.444µs" userAgent="terminatingworkspaces.test/v0.0.0 (linux/arm64) kubernetes/$Format/TestTerminatingWorkspacesVirtualWorkspaceAccess-virtual" audit-ID="9a199a70-67cb-4291-b5b3-dc6766bb83a1" srcIP="127.0.0.1:35086" resp=403
I1023 13:50:21.006112   63219 workspace_controller.go:299] "processing key" component="kcp" postStartHook="kcp-start-controllers" reconciler="kcp-workspace" key="1hjsxwowl68q6u29|e2e-workspace-pvk7s"
I1023 13:50:21.008852   63219 httplog.go:134] "HTTP" verb="GET" URI="/clusters/h25xyujjeq713wzp/apis/core.kcp.io/v1alpha1/logicalclusters/cluster" latency="2.273344ms" userAgent="kcp/v1.33.3+kcp (linux/arm64) kubernetes/5da7ab5/kcp-workspace+root" audit-ID="57bfd789-43b4-4bd0-8792-8504533dd203" srcIP="127.0.0.1:35110" apf_pl="exempt" apf_fs="exempt" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="1.204933ms" resp=200
I1023 13:50:21.009558   63219 authorization.go:93] "Forbidden" URI="/services/terminatingworkspaces/1hjsxwowl68q6u29:alpha-krsavygsbu/clusters/%2A/apis/core.kcp.io/v1alpha1/logicalclusters" reason="access denied"

For now I suggest we get this PR merged, because it fixes a different possible source of flakiness. I'll then create a bug ticket for the other one. So far I have no clue what the issue could be :/

@SimonTheLeg
Contributor Author

/retest

@SimonTheLeg
Contributor Author

The only thing I can think of is some sort of race condition around the LogicalCluster being marked for deletion: if the cache is fast enough, the ClusterRole still gets applied, and if not, we just stop doing it, leading to this issue 🤔 See the sketch below.
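
Purely as an illustration of the suspected window (a hypothetical sketch, not the actual kcp reconciler; `replicateToGlobalCache` is a made-up stand-in for the replication step):

```go
package sketch

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"

	corev1alpha1 "github.com/kcp-dev/kcp/sdk/apis/core/v1alpha1"
)

// replicateToGlobalCache is a made-up stand-in for the replication step.
func replicateToGlobalCache(ctx context.Context, cr *rbacv1.ClusterRole) error {
	return nil
}

// reconcileClusterRole sketches the suspected check-then-act race: whether
// the ClusterRole reaches the global cache depends on whether this reconcile
// runs before or after the LogicalCluster is marked for deletion.
func reconcileClusterRole(ctx context.Context, lc *corev1alpha1.LogicalCluster, cr *rbacv1.ClusterRole) error {
	if lc.DeletionTimestamp != nil {
		// Marked for deletion first: replication stops here, the ClusterRole
		// never becomes visible, and subsequent requests get 403s as in the
		// log above.
		return nil
	}
	// The cache was fast enough: the ClusterRole is still applied.
	return replicateToGlobalCache(ctx, cr)
}
```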

@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2025
@SimonTheLeg SimonTheLeg force-pushed the termiantors-wait-for-wst-ready branch from 06ae166 to dd0498a Compare November 4, 2025 09:27
@kcp-ci-bot kcp-ci-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2025
@SimonTheLeg
Contributor Author

/retest

2 similar comments
@SimonTheLeg
Contributor Author

/retest

@xrstf
Contributor

xrstf commented Nov 6, 2025

/retest

@xrstf
Contributor

xrstf commented Nov 6, 2025

/approve
/lgtm

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2025
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: 252ecb340f7421f2290bd57ccf11f11b6cd26713

@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2025
@kcp-ci-bot kcp-ci-bot merged commit 5609047 into kcp-dev:main Nov 6, 2025
14 checks passed