kubectl delete gets stuck although resource is properly deleted. #13656

@0xavi0

Environmental Info:
K3s Version:
v1.34.2, v1.34.3, v1.35.0, v1.35.1

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

Provisioning cluster this way:
k3d cluster create upstream --agents 1 --wait --image "rancher/k3s:$K3SERVER_VERSION"

Describe the bug:

Our e2e tests fail because, at certain points, we run kubectl delete on Fleet custom resources such as GitRepo or HelmOp. These resources have finalizers, so they are not deleted immediately.

We use k3d with the --image flag to create the test cluster.

You can see the failure, for example, in this GitHub job:
https://github.com/rancher/fleet/actions/runs/21831206743/job/63109227320?pr=4602

In that job, kubectl delete is being called with -v=6.

I was able to reproduce this by running the attached script.
I can confirm that delete times out, but the GitRepo resource no longer exists in the cluster.

If, while delete is still running (after several seconds have already passed), you run a kubectl get on the same resource (in another terminal, of course), you can see that it no longer exists.

Naturally, our first thought was that perhaps a finalizer was still pending and that was why the resource was not being removed.
However, I can confirm that it does not exist even before delete reports the timeout.
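The manual check described above (running kubectl get in a second terminal while delete is still blocking) can be automated roughly as follows. This is only a sketch, not part of the attached script: the function names, the resource name, and the namespace are hypothetical, and it assumes the Fleet CRDs are installed so that `gitrepo` resolves. It also treats any failing `kubectl get` as "not found", which is a simplification.

```shell
#!/usr/bin/env bash
# Sketch: detect a `kubectl delete` that keeps blocking after the object is gone.
# Function names and arguments are hypothetical, not taken from the issue.

# Succeeds when the GitRepo cannot be fetched any more.
# Simplification: any failing `kubectl get` is treated as "not found".
resource_gone() {
  local name="$1" namespace="$2"
  ! kubectl get gitrepo "$name" -n "$namespace" >/dev/null 2>&1
}

# Starts the delete in the background and polls once per second.
# If the object has been gone for `grace` consecutive polls while the
# delete is still running, reports it as stuck and returns 2.
# Otherwise returns the exit status of kubectl delete.
check_stuck_delete() {
  local name="$1" namespace="$2" grace="${3:-5}"
  kubectl delete gitrepo "$name" -n "$namespace" --timeout=60s &
  local pid=$!
  local gone_for=0
  while kill -0 "$pid" 2>/dev/null; do
    if resource_gone "$name" "$namespace"; then
      gone_for=$((gone_for + 1))
    else
      gone_for=0
    fi
    if [ "$gone_for" -ge "$grace" ]; then
      echo "stuck: $name is gone but kubectl delete is still running"
      kill "$pid" 2>/dev/null
      wait "$pid" 2>/dev/null || true
      return 2
    fi
    sleep 1
  done
  local rc=0
  wait "$pid" || rc=$?
  echo "ok: kubectl delete exited with status $rc"
  return "$rc"
}
```

For example, `check_stuck_delete my-gitrepo fleet-local 5` would flag the delete as stuck once the object has been missing for about five seconds while the command is still running. The grace period is there because a short gap between the object disappearing and kubectl returning is normal.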
This has been tested with the following versions:

  • v1.35.1-k3s1
  • v1.35.0-k3s3
  • v1.35.0-k3s1
  • v1.34.3-k3s1
  • v1.34.2-k3s1

In all of them it fails at some point (it doesn’t always fail immediately; sometimes it takes longer, sometimes less).
The same script has been running for more than 6 hours using version v1.34.1-k3s1 without failing even once.
In fact, we detected the issue when upgrading from v1.34.1-k3s1 to v1.35.0-k3s1.

I've tested the following versions and could not recreate the issue:

  • v1.34.1-k3s1
  • v1.33.6-k3s1
  • v1.33.5-k3s1

The first failing version is v1.34.2-k3s1.

Steps To Reproduce:

To reproduce the problem, extract the tarball and run the included bash script from the same directory:

tar xzvf test-delete.tar.gz
cd test-delete
./test-delete.sh

The script creates and deletes a GitRepo in a loop until the timeout occurs.
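For readers who do not want to unpack the tarball, the loop has roughly the following shape. This is a hypothetical sketch, not the contents of test-delete.sh: the function name, the manifest path, and the iteration limit are assumptions, and it relies on kubectl delete exiting with a non-zero status when --timeout expires.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction loop; the real test-delete.sh may differ.

# Applies the manifest, then deletes it with a timeout, repeating until
# a delete fails (for example by timing out) or max_iterations is reached.
repro_loop() {
  local manifest="$1" max_iterations="${2:-1000}" timeout="${3:-60s}"
  local i
  for ((i = 1; i <= max_iterations; i++)); do
    if ! kubectl apply -f "$manifest" >/dev/null; then
      echo "apply failed at iteration $i"
      return 2
    fi
    if ! kubectl delete -f "$manifest" --timeout="$timeout" >/dev/null; then
      echo "delete failed or timed out at iteration $i"
      return 1
    fi
  done
  echo "no failure in $max_iterations iterations"
}
```

A call such as `repro_loop gitrepo.yaml 1000 60s` (where gitrepo.yaml is a placeholder for a GitRepo manifest) stops at the first delete that does not return cleanly and prints the iteration at which it happened.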

test-delete.tar.gz

Expected behavior:

kubectl delete should return once the resource has been successfully deleted.

Actual behavior:

kubectl delete gets stuck and eventually times out, even though the resource has already been removed from the cluster.

Additional context / logs:

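I can also confirm that the issue does not happen when using etcd, so it looks like an issue related to kine.
I cannot reproduce it when adding --k3s-arg "--cluster-init@server:0" to the k3d command, which makes K3s use embedded etcd instead of the default kine-backed SQLite datastore.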
