Cached image deleted even though one pod was still using the image #540

@sivarama-p-raju

Hi Team,

I recently ran into an issue after an on-prem production kubeadm cluster was upgraded over a weekend: once the upgrade was complete, one pod was stuck in the ImagePullBackOff state.

That was when I got involved. I looked at the cachedimages and found that this particular image was in an error state: it was trying to pull the image from the remote registry and could not find it, because the image does not exist there.

However, we know for a fact that one pod was using that image:tag. Below is a timeline for quick reference:

  1. As per Kubernetes events, the cachedImage was deleted on 30th December. Below are the relevant events:
  {"component":"cachedimage-controller","count":"1","createdAt":"2025-12-30 07:54:17 +0000 UTC","eventType":"kubernetes-event","host":"","kind":"CachedImage","lastSeenAt":"2025-12-30 07:54:17 +0000 UTC","message":"Image <repository>/ord-handling:v1.7.18 has expired, deleting it","name":"<repository>-ord-handling-v1.7.18","namespace":"default","reason":"Expiring","type":"Normal"}
  {"component":"cachedimage-controller","count":"1","createdAt":"2025-12-30 07:54:17 +0000 UTC","eventType":"kubernetes-event","host":"","kind":"CachedImage","lastSeenAt":"2025-12-30 07:54:17 +0000 UTC","message":"Image <repository>/ord-handling:v1.7.18 successfully expired","name":"<repository>-ord-handling-v1.7.18","namespace":"default","reason":"Expired","type":"Normal"}
  {"component":"cachedimage-controller","count":"1","createdAt":"2025-12-30 07:54:17 +0000 UTC","eventType":"kubernetes-event","host":"","kind":"CachedImage","lastSeenAt":"2025-12-30 07:54:17 +0000 UTC","message":"Removing image <repository>/ord-handling:v1.7.18 from cache","name":"<repository>-ord-handling-v1.7.18","namespace":"default","reason":"CleaningUp","type":"Normal"}
  {"component":"cachedimage-controller","count":"1","createdAt":"2025-12-30 07:54:17 +0000 UTC","eventType":"kubernetes-event","host":"","kind":"CachedImage","lastSeenAt":"2025-12-30 07:54:17 +0000 UTC","message":"Image <repository>/ord-handling:v1.7.18 successfully removed from cache","name":"<repository>-ord-handling-v1.7.18","namespace":"default","reason":"CleanedUp","type":"Normal"}
  2. The Kubernetes upgrade was performed on 17th January. The nodes were drained and upgraded during this time, and when everything was done the pod could not start, because the image was neither in kuik's local registry nor in the remote registry. Below are the related events:
  {"component":"kubelet","count":"1","createdAt":"2026-01-17 08:01:07 +0000 UTC","eventType":"kubernetes-event","host":"<NODE>","kind":"Pod","lastSeenAt":"2026-01-17 08:01:07 +0000 UTC","message":"Pulling image \"localhost:7439/<repository>/ord-handling:v1.7.18\"","name":"ord-handling-6fbf599777-24cdd","namespace":"<NS>","reason":"Pulling","type":"Normal"}
  {"component":"kubelet","count":"1","createdAt":"2026-01-17 08:01:30 +0000 UTC","eventType":"kubernetes-event","host":"<NODE>","kind":"Pod","lastSeenAt":"2026-01-17 08:01:30 +0000 UTC","message":"Failed to pull image \"localhost:7439/<repository>/ord-handling:v1.7.18\": rpc error: code = NotFound desc = failed to pull and unpack image \"localhost:7439/<repository>/ord-handling:v1.7.18\": failed to resolve reference \"localhost:7439/<repository>/ord-handling:v1.7.18\": localhost:7439/<repository>/ord-handling:v1.7.18: not found","name":"ord-handling-6fbf599777-24cdd","namespace":"vip-app","reason":"Failed","type":"Warning"}
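
To put a number on the gap in the timeline above, here is a minimal Python sketch that parses the `createdAt` timestamps of the `CleanedUp` and `Failed` events and computes the time between them. The event lines are trimmed to the fields the sketch uses, and the helper names `parse_event_time` and `gap_between` are my own, not part of kuik or Kubernetes.

```python
import json
from datetime import datetime, timedelta

# Events trimmed to the fields used here; timestamps copied from the events above.
EVENTS = [
    '{"kind":"CachedImage","reason":"CleanedUp","createdAt":"2025-12-30 07:54:17 +0000 UTC"}',
    '{"kind":"Pod","reason":"Failed","createdAt":"2026-01-17 08:01:30 +0000 UTC"}',
]


def parse_event_time(value: str) -> datetime:
    """Parse timestamps of the form '2025-12-30 07:54:17 +0000 UTC'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z UTC")


def gap_between(lines, first_reason: str, second_reason: str) -> timedelta:
    """Return the time elapsed between two events, identified by their reason."""
    times = {}
    for line in lines:
        event = json.loads(line)
        times[event["reason"]] = parse_event_time(event["createdAt"])
    return times[second_reason] - times[first_reason]


gap = gap_between(EVENTS, "CleanedUp", "Failed")
print(f"Image was removed from the cache {gap.days} days before the failed pull")
```

So the image had already been gone from the cache for more than two weeks before the drain during the upgrade forced the pod to pull it again.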

Is it possible for an image:tag that is still used by a pod to be deleted? Any thoughts on how I can prevent this from happening again?

Please note that we are currently running Helm chart version 1.13.1.

Please let me know in case I can provide any additional information.
