Replies: 2 comments 1 reply
-
Can you please share the full YAMLs (both for the Kafka custom resources and for the ConfigMap with the metrics configuration)? I tried to reproduce it a few times, but it always seems to work as expected for me without any issues. As for the log from your AKS environment, it suggests that the reconciliation of the Kafka clusters got stuck for some reason:
That would be some kind of bug, but it starts before the log begins, so it is hard to say what the cause is. In this case, the operator simply did not see the change because it was stuck, so it basically ignored it, and the restart helped it recover. This is essentially a bug, but hard to track down without detailed logs :-/. Note that it would ignore any changes you made - not just the metrics. The GKE log with the infinite rolling restart is more interesting. If you can reproduce it, can you capture the Pod YAMLs or the StrimziPodSet resources between the different restarts (either ...
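A minimal sketch of how those could be captured between restarts, assuming the cluster is named kafka-cluster in the kafka namespace (both taken from the operator log line quoted in the report; the label selector is an assumption):

```sh
# Dump the ZooKeeper Pods and the StrimziPodSet after each roll; repeat between
# restarts so the differences can be compared. Names and namespace are illustrative.
kubectl get pods -n kafka -l strimzi.io/name=kafka-cluster-zookeeper -o yaml > zk-pods-$(date +%s).yaml
kubectl get strimzipodset -n kafka kafka-cluster-zookeeper -o yaml > zk-podset-$(date +%s).yaml
```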
-
Hi @scholzj I've attached the YAML, but it's a pretty orthodox one.
Are you suggesting changing the log level of the operator?
I thought so, and will try to get that info in the next release, but unfortunately can't guarantee it.
-
Describe the bug
I've added a change enabling ZooKeeper metrics to existing clusters' Kafka CRD manifests and applied it to several of our clusters, including AKS, GKE, and OpenShift.
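The exact manifest isn't reproduced here; a minimal sketch of this kind of change, assuming metrics are enabled through a metricsConfig block of type jmxPrometheusExporter referencing a ConfigMap (the ConfigMap name and key are illustrative; the cluster name and namespace are taken from the operator log line below):

```yaml
# Illustrative fragment only, not the full Kafka custom resource.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka
spec:
  zookeeper:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics                 # assumed ConfigMap name
          key: zookeeper-metrics-config.yml   # assumed key
---
# The referenced ConfigMap with the JMX Prometheus Exporter configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-metrics
  namespace: kafka
data:
  zookeeper-metrics-config.yml: |
    lowercaseOutputName: true
    rules: []
```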
Then I observed two different kinds of failure behavior, as well as cases where the change was applied successfully.
In both failure cases, the change was present when inspecting the Kafka custom resource.
In the first failure case, a restart of the ZooKeeper pods was triggered, but the env var ZOOKEEPER_METRICS_ENABLED was not set to true, and thus metrics were not enabled. The ZooKeeper pods seemed to be rolling-restarted infinitely, but ZOOKEEPER_METRICS_ENABLED was still not set to true.
In the other failure case, not even a ZooKeeper pod restart was triggered. I observed the log line "Kafka kafka-cluster in namespace kafka was MODIFIED" in the operator log, but nothing happened.
In both failure cases, after I killed the operator pod and let it restart, the ZooKeeper pods were restarted with ZOOKEEPER_METRICS_ENABLED set to true.
I asked for advice on this issue in the Strimzi Slack channel and was advised to report it here.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
ZooKeeper is restarted with the env var ZOOKEEPER_METRICS_ENABLED set to true, so that we can see the metrics.
Environment (please complete the following information):
YAML files and logs
logs.zip
I've added Strimzi operator log files from two different clusters.
The first one, strimzi-aks.log, is from an AKS cluster where no ZooKeeper restart happened.
The second one, strimzi-gke.log, is from a GKE cluster where a ZooKeeper restart did happen, but the env var ZOOKEEPER_METRICS_ENABLED was still not set to true.
For strimzi-aks.log, I applied the change at around 2022-12-15 02:58:20 and there are no log entries about ZooKeeper restarts.
For strimzi-gke.log, I applied the change at around 2022-12-13 00:57:09 and there are log entries about ZooKeeper restarts.
It is unlikely that the difference between AKS and GKE caused the different behaviors, because I also saw the failure with ZooKeeper restarts on another AKS cluster.
In addition, I saw some successful cases on GKE and OpenShift clusters.