After the cluster has been created successfully, I start deploying multiple applications (with Helm and/or Kustomize) in quick succession: longhorn, percona mysql, percona postgres, redis (with haproxy), rabbitmq, velero, cert-manager, error-pages, traefik (patched), whoami, uptime-kuma, meilisearch, crowdsec, redis-insight, argocd, it-tools, stirling-pdf. Between 15 and 20 applications are deployed in under 5 minutes.
In 2.17.x I had no issues deploying all of these applications, but in 2.18.x I started seeing ImagePullBackOff warnings for quite a lot of pods, sometimes on only 1 or 2 of the 4 agent nodes. For example:
```
Failed to pull image "percona/percona-xtradb-cluster-operator:1.17.0": failed to pull and unpack image "docker.io/percona/percona-xtradb-cluster-operator:1.17.0": failed to resolve reference "docker.io/percona/percona-xtradb-cluster-operator:1.17.0": failed to do request: Head "https://registry-1.docker.io/v2/percona/percona-xtradb-cluster-operator/manifests/1.17.0": dial tcp: lookup registry-1.docker.io: Try again
```
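To check whether the pull failures really concentrate on particular agent nodes, a small script can tally them per node from the output of `kubectl get pods -A -o json`. This is a minimal sketch, not part of the original setup: the function name `pull_errors_per_node` is hypothetical, and it relies only on the standard Pod fields `spec.nodeName` and `status.containerStatuses[].state.waiting.reason` (plus the init-container equivalent).

```python
from collections import Counter

# Waiting reasons that indicate an image pull problem.
PULL_ERROR_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def pull_errors_per_node(pod_list: dict) -> Counter:
    """Count containers stuck on an image pull, grouped by node name.

    `pod_list` is the parsed JSON from `kubectl get pods -A -o json`
    (hypothetical helper, written for this diagnosis only).
    """
    counts: Counter = Counter()
    for pod in pod_list.get("items", []):
        # Pods that were never scheduled have no nodeName yet.
        node = pod.get("spec", {}).get("nodeName") or "<unscheduled>"
        status = pod.get("status", {})
        # Check init containers as well, since they pull images too.
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in PULL_ERROR_REASONS:
                counts[node] += 1
    return counts
```

For example, save the output with `kubectl get pods -A -o json > pods.json`, load it with `json.load`, and print `pull_errors_per_node(data).most_common()` to see which nodes are affected.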
When I searched Google and asked an AI assistant about this, both concluded that it must be a DNS issue (either on the pod or on the node), but I'm confused about why or how this happens. My initial thought was that, because outgoing traffic goes through the NAT router, it might be getting rate-limited by the Docker registry, but the error doesn't look like a rate limit. Recreating the pods and rebooting the specific nodes that had these issues had no effect. At that point I destroyed the cluster (created with 2.18.1), recreated it with 2.17.4, and had no image pull issues when deploying all of the applications listed above.
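The `Try again` text at the end of the log is a resolver error (most likely a temporary lookup failure, EAI_AGAIN) raised before any request reached the registry; a Docker Hub rate limit would instead appear as an HTTP 429 `toomanyrequests` response after the name had resolved. Image pulls are also performed by containerd on the node itself, so that lookup uses the node's resolver configuration rather than the cluster DNS that pods use, which is consistent with recreating the pods having no effect. A rough way to test this is to resolve the registry hostname repeatedly on an affected agent node and count the temporary failures. The sketch below assumes Python 3 is available on the node, and `probe_dns` is a hypothetical helper, not part of any tool mentioned in this post.

```python
import socket
import time
from collections import Counter


def probe_dns(host: str = "registry-1.docker.io", attempts: int = 20, delay: float = 0.5) -> Counter:
    """Resolve `host` repeatedly and tally the outcome of each attempt.

    Outcomes: "ok", "EAI_AGAIN" (temporary failure, the likely "Try again" case),
    "EAI_NONAME" (name not found), or "other:<errno>" for anything else.
    """
    results: Counter = Counter()
    for i in range(attempts):
        try:
            # Same kind of lookup a client does before dialing the registry over HTTPS.
            socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
            results["ok"] += 1
        except socket.gaierror as exc:
            if exc.errno == socket.EAI_AGAIN:
                results["EAI_AGAIN"] += 1
            elif exc.errno == socket.EAI_NONAME:
                results["EAI_NONAME"] += 1
            else:
                results[f"other:{exc.errno}"] += 1
        # Space out the attempts a little, but don't sleep after the last one.
        if i < attempts - 1:
            time.sleep(delay)
    return results
```

On a healthy node, `probe_dns()` would be expected to return only `ok` results; a node whose DNS path is flaky would show some `EAI_AGAIN` entries. Running it on a working node and an affected node side by side makes the comparison easier.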
Has anyone else experienced similar issues since upgrading to 2.18.x?
For context: since 2.18.x I've started using the `nat_router` options in combination with `use_control_plane_lb`.