ZooKeeper pod restarting once every time a Strimzi cluster is deployed #8509
-
You will probably need to elaborate a bit more on it. When you deploy a new Kafka cluster, a new ZooKeeper cluster is deployed as part of it. So what does it mean that the pods are restarted? They are created and started.
-
Thanks @scholzj for the response. Below is the state of the Kafka cluster whenever I deploy it. The ZooKeeper pods always show a restart count of 1, and after that everything works fine.

I am using NFS, but I don't think the error in the logs points to NFS. I have also attached all the logs. These errors appear only once, when I deploy a new cluster; after the ZooKeeper pod restarts, they no longer show up in the logs.

Error from one of the ZooKeeper pods:
2023-05-13 08:39:20,563 ERROR Exception while listening (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc:3888]
-
Well, the error suggests a DNS problem. But without the context of the full logs, it is hard to say what it means. These restarts usually happen while the nodes are searching for each other. One thing that I found helps against it is to configure the resources properly and make sure ZooKeeper has enough RAM and CPU. I normally try to use at least something like this for ZooKeeper:

resources:
  requests:
    memory: 1Gi
    cpu: 300m
  limits:
    memory: 1Gi
    cpu: 500m
No, this is not related. But it would cause other issues, which is why I mentioned it.
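For context, in Strimzi the `resources` snippet above goes under `spec.zookeeper` in the `Kafka` custom resource. The following is a minimal sketch: the cluster name `my-cluster` and namespace `kafka` are taken from the log line earlier in the thread, while the replica counts, listener, and storage sizes are illustrative assumptions, not the poster's actual configuration.

```yaml
# Sketch of a ZooKeeper-based Strimzi Kafka cluster (assumed values marked below)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # from the log line in this thread
  namespace: kafka          # from the log line in this thread
spec:
  kafka:
    replicas: 3             # assumption
    listeners:
      - name: plain         # assumption: single internal plaintext listener
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 10Gi            # assumption
  zookeeper:
    replicas: 3             # assumption
    storage:
      type: persistent-claim
      size: 10Gi            # assumption
    # Requests/limits as suggested above; memory request == limit
    resources:
      requests:
        memory: 1Gi
        cpu: 300m
      limits:
        memory: 1Gi
        cpu: 500m
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Setting the memory request equal to the limit means the scheduler reserves the full amount the container is allowed to use.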
-
Configuring the resources helped: the pods are no longer restarted every time, but I still hit this issue sometimes. Could this be similar to the issue below? I think DNS is not getting resolved in time. I am also pasting the complete ZooKeeper logs for one of the pods:
Thanks
-
You cannot really debug things from a single log file without the context of the rest of the cluster. You need to look at all the logs and compare them with what the other components are doing. Distributed systems like this are essentially eventually consistent, so things like this can happen while the nodes are looking for each other. Does the cluster work eventually?
-
Yes, after a single restart everything eventually works.
-
Every time I install a Strimzi cluster, the ZooKeeper pods restart once, and then they work properly.
Below is the Kafka CO:
I have also attached all the log files from the Cluster Operator and ZooKeeper:
cluster-operator.log
zook1.log
zook0.log
zook2.log