You should really share the full logs and not just the part you think matters. But it looks like a DNS issue: ZooKeeper is very particular in the way it does hostname verification. You can also disable it if you want; you can look through the older discussions for how to do that.
Describe the bug
We run Strimzi Kafka clusters in our dev and prod environments on on-prem VMware infrastructure. Please see the component versions below:
Kubernetes Version: 1.21.4
OS: RHEL 7.9
Strimzi Kafka: 0.30
Persistent Storage: Yes
We run multiple Kafka clusters in Dev/Test. We rebooted all the nodes one by one after patching (avoiding drain/cordon to avoid double outages). All the ZooKeeper nodes and Kafka brokers recovered successfully, without any issues.
We rebooted our prod VMs today after patching, and both Kafka clusters couldn't recover and threw the errors below:
2022-12-10 11:34:37,872 ERROR Failed to verify hostname: strimzi-prod-abc-zookeeper-2.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc.xyz.cre (org.apache.zookeeper.common.ZKTrustManager) [ListenerHandler-strimzi-prod-abc-zookeeper-0.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc/10.240.43.159:3888]
2022-12-10 11:34:37,872 ERROR Failed to verify hostname: strimzi-prod-abc-zookeeper-2.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc.xyz.cre (org.apache.zookeeper.common.ZKTrustManager) [ListenerHandler-strimzi-prod-abc-zookeeper-0.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc/10.240.43.159:3888] javax.net.ssl.SSLPeerUnverifiedException: Certificate for <strimzi-prod-abc-zookeeper-2.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc.xyz.cre> doesn't match any of the subject alternative names: [*.strimzi-prod-abc-zookeeper-client.strimzi-prod-abc.svc
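The error says the DNS name used for ZooKeeper quorum traffic is not covered by any subject alternative name in the presented certificate. A minimal sketch of the (simplified, single-label) wildcard matching that TLS hostname verification performs, using the names from the log above, illustrates why the match fails; `san_matches` is a hypothetical helper, not ZooKeeper's actual implementation:

```python
def san_matches(san: str, hostname: str) -> bool:
    """Simplified RFC 6125-style match: '*' covers exactly one left-most label."""
    if san.startswith("*."):
        base = san[2:]
        labels = hostname.split(".")
        # wildcard matches only the first label; the rest must equal the base
        return len(labels) >= 2 and ".".join(labels[1:]) == base
    return san == hostname

san = "*.strimzi-prod-abc-zookeeper-client.strimzi-prod-abc.svc"
host = ("strimzi-prod-abc-zookeeper-2."
        "strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc.xyz.cre")
print(san_matches(san, host))  # False: the '-nodes' name isn't under the '-client' base
```

Note that the SAN list in the log is truncated, so the certificate may contain further entries not shown here.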
I killed pods and tried many things, but nothing worked. It looks like a DNS issue, but I checked DNS lookups, both forward and reverse, and all were working. All our apps were working perfectly and didn't have any DNS issues whatsoever.
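For reference, the forward/reverse lookup check can be scripted with the Python standard library. A minimal sketch; the hostname passed in would be your own pod FQDN, and a missing PTR record shows up as `None`:

```python
import socket

def check_dns(hostname: str) -> dict:
    """Forward-resolve hostname, then reverse-resolve each returned address."""
    addrs = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    results = {}
    for addr in addrs:
        try:
            reverse_name, _, _ = socket.gethostbyaddr(addr)
        except (socket.herror, socket.gaierror):
            reverse_name = None  # no PTR record for this address
        results[addr] = reverse_name
    return results

# Example (run from inside the cluster so the headless-service name resolves):
host = "strimzi-prod-abc-zookeeper-2.strimzi-prod-abc-zookeeper-nodes.strimzi-prod-abc.svc"
try:
    print(host, "->", check_dns(host))
except socket.gaierror as e:
    print(host, "-> forward lookup FAILED:", e)
```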
I deleted the cluster and tried again, but it didn't help. Eventually, I reinstalled the Cluster Operator and then reinstalled both Kafka clusters. I didn't delete the data in the PVs/PVCs, but somehow ZooKeeper kept its old data while the Kafka PVs didn't have the historical data, so there was a mismatch between ZooKeeper and Kafka. I was expecting that, with the historical data still there, all the Kafka topics etc. would still exist, but it seems that after reinstalling the Cluster Operator everything got reset. Finally, I deleted everything in the PVs and then it all came back fine.
I am not sure why we didn't have this issue in Dev/Test but did have it in prod. What is the best way to deal with this situation if we have to reboot the nodes for patching?
We also use a blue/green strategy for our clusters, where we cut over services to a new cluster. What would be the best way to set up the new cluster and restore data onto it? Please advise.
The only difference between Dev/Test and prod is external brokers; in Dev/Test we have external brokers configured.
To Reproduce
1- Reboot nodes
Expected behavior
The cluster should heal itself after the nodes reboot.
Environment (please complete the following information):