Using `nat_router` since `2.18.x` indirectly leads to `ImagePullBackOff` when deploying multiple applications subsequently #1868

ToshY · 2025-08-10T21:42:31Z

Problem

Since 2.18.x I've been using the nat_router option in combination with use_control_plane_lb:

  nat_router = {
    server_type = "cax21"
    location    = "nbg1"
    enable_sudo = false
    labels      = {}
  }

  # Control plane load balancer
  use_control_plane_lb = true
  control_plane_lb_type = "lb11"
  control_plane_lb_enable_public_interface = true

After the cluster has been created successfully, I deploy multiple applications (with Helm and/or kustomize) in quick succession. The applications are: longhorn, percona mysql, percona postgres, redis (with haproxy), rabbitmq, velero, cert-manager, error-pages, traefik (patched), whoami, uptime-kuma, meilisearch, crowdsec, redis-insight, argocd, it-tools, stirling-pdf. In total, 15-20 applications are deployed in under 5 minutes.

In 2.17.x I had no issues deploying all these applications, but in 2.18.x I started seeing ImagePullBackOff warnings for quite a lot of pods, sometimes on just 1 or 2 of the 4 agent nodes.

Failed to pull image "percona/percona-xtradb-cluster-operator:1.17.0": failed to pull and unpack image "docker.io/percona/percona-xtradb-cluster-operator:1.17.0": failed to resolve reference "docker.io/percona/percona-xtradb-cluster-operator:1.17.0": failed to do request: Head "https://registry-1.docker.io/v2/percona/percona-xtradb-cluster-operator/manifests/1.17.0": dial tcp: lookup registry-1.docker.io: Try again

I queried Google/AI about this, and the conclusion was that it must be a DNS/networking issue (either on the pod or on the node), but I'm confused about why or how it happened. My initial thought was that, because outgoing traffic goes through the NAT router, it might be getting rate-limited by the Docker registry, but the error does not point that way. I recreated the pods and rebooted the specific nodes that had these issues, but it did not have any effect. At that point I destroyed the cluster (created with 2.18.1), recreated it with 2.17.4, and had no image pull issues when deploying all the above-mentioned applications.
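For anyone trying to reproduce this, the following is a minimal sketch for checking whether the lookup failure is intermittent on an affected node. It assumes Python 3 is available on that node, which is not something the setup above guarantees. The function name `probe_dns` and its parameters are hypothetical, chosen for illustration. It resolves the registry hostname repeatedly and counts how often resolution succeeds, fails with a temporary resolver error (`EAI_AGAIN`, which the `Try again` suffix in the error above appears to correspond to), or fails in some other way.

```python
import socket
import time
from collections import Counter


def probe_dns(host, attempts=20, delay=0.5, resolve=socket.getaddrinfo):
    """Resolve `host` repeatedly and count the outcomes.

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    results = Counter()
    for i in range(attempts):
        try:
            resolve(host, 443)
            results["ok"] += 1
        except socket.gaierror as exc:
            # EAI_AGAIN marks a temporary failure in name resolution.
            key = "temporary" if exc.errno == socket.EAI_AGAIN else "other"
            results[key] += 1
        if i + 1 < attempts:
            time.sleep(delay)
    return dict(results)


if __name__ == "__main__":
    # Hostname taken from the error message above.
    print(probe_dns("registry-1.docker.io"))
```

Running it on a failing node and on a healthy node during a burst of deployments would show whether the temporary failures are limited to specific nodes. It only exercises the node's own resolver, so it says nothing about in-cluster DNS.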

Has anyone else experienced similar issues since upgrading to 2.18.x?

Replies: 0 comments
