Description
Environmental Info:
K3s Version:
v1.34.2, v1.34.3, v1.35.0, v1.35.1
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Provisioning cluster this way:
k3d cluster create upstream --agents 1 --wait --image "rancher/k3s:$K3SERVER_VERSION"
Describe the bug:
The problem we’re encountering is that our e2e tests fail because at certain points we run kubectl delete on Fleet custom resources such as GitRepo or HelmOp, which have finalizers and are therefore not deleted immediately.
We use k3d with the --image flag to create the test cluster.
You can see the failure, for example, in this GitHub job:
https://github.com/rancher/fleet/actions/runs/21831206743/job/63109227320?pr=4602
In that job, kubectl delete is being called with -v=6.
I was able to reproduce this by running the attached script.
I can confirm that delete times out, but the GitRepo resource no longer exists in the cluster.
If, while the delete is still running (after several seconds have already passed), you run kubectl get on the same resource in another terminal, you can see that it no longer exists.
Naturally, our first thought was that perhaps a finalizer was still pending and that was why the resource was not being removed.
However, I can confirm that the resource no longer exists even before delete reports the timeout.
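The two-terminal check described above can be sketched as follows (a minimal sketch; the resource name, namespace, and timeout are illustrative assumptions, not taken from the actual test):

```shell
# Terminal 1: delete hangs waiting for deletion confirmation even though
# the object is already gone (name and namespace are illustrative).
kubectl delete gitrepo test-gitrepo -n fleet-local --timeout=60s -v=6

# Terminal 2, while the delete above is still hanging:
kubectl get gitrepo test-gitrepo -n fleet-local
# Reports NotFound although kubectl delete has not returned yet.
```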
This has been tested with the following versions:
v1.35.1-k3s1, v1.35.0-k3s3, v1.35.0-k3s1, v1.34.3-k3s1, v1.34.2-k3s1
In all of them it fails at some point (it doesn’t always fail immediately; sometimes it takes longer, sometimes less).
The same script has been running for more than 6 hours using version v1.34.1-k3s1 without failing even once.
In fact, we detected the issue when upgrading from v1.34.1-k3s1 to v1.35.0-k3s1.
I've tested the following versions and could not recreate the issue:
v1.34.1-k3s1, v1.33.6-k3s1, v1.33.5-k3s1
The first failing version is v1.34.2-k3s1.
Steps To Reproduce:
To reproduce the problem with the attached script, extract the tarball and execute the included bash script from the same directory:
tar xzvf test-delete.tar.gz
cd test-delete
./test-delete.sh
The script creates and deletes a GitRepo in a loop until the timeout occurs.
Expected behavior:
kubectl delete should not get stuck when deleting a resource (that is successfully deleted)
Actual behavior:
kubectl delete gets stuck when deleting a resource.
Additional context / logs:
I can also confirm that the issue does not happen when using etcd, so it looks like an issue related to kine.
I cannot recreate it when adding --k3s-arg "--cluster-init@server:0" to the k3d command.
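For reference, that workaround applied to the provisioning command above looks like this (a sketch; --cluster-init makes k3s use embedded etcd instead of kine, which is consistent with the etcd observation):

```shell
# Same cluster as in the reproduction, but with embedded etcd;
# "@server:0" scopes the k3s flag to the first (and only) server node.
k3d cluster create upstream --agents 1 --wait \
  --image "rancher/k3s:$K3SERVER_VERSION" \
  --k3s-arg "--cluster-init@server:0"
```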