-
Changes to a Kafka CRD (specifically the `config:` section) are currently not propagated to the underlying Kafka pods. The Strimzi Operator logs show many entries like:

```
2021-08-11 13:27:38 DEBUG AbstractOperator:390 - Reconciliation #41422(timer) Kafka(pipelines/kafka-cluster): Try to acquire lock lock::pipelines::Kafka::kafka-cluster
...
2021-08-11 13:27:48 DEBUG AbstractOperator:420 - Reconciliation #41422(timer) Kafka(pipelines/kafka-cluster): Failed to acquire lock lock::pipelines::Kafka::kafka-cluster within 10000ms.
```

suggesting that the lock couldn't be acquired. However, at the same time, locks for operations on the KafkaConnect CRD seemed to work:

```
2021-08-11 13:27:38 DEBUG AbstractOperator:390 - Reconciliation #41423(timer) KafkaConnect(pipelines/kafka-connect-cluster-2): Try to acquire lock lock::pipelines::KafkaConnect::kafka-connect-cluster-2
2021-08-11 13:27:38 DEBUG AbstractOperator:393 - Reconciliation #41423(timer) KafkaConnect(pipelines/kafka-connect-cluster-2): Lock lock::pipelines::KafkaConnect::kafka-connect-cluster-2 acquired
...
2021-08-11 13:27:38 DEBUG AbstractOperator:410 - Reconciliation #41423(timer) KafkaConnect(pipelines/kafka-connect-cluster-2): Lock lock::pipelines::KafkaConnect::kafka-connect-cluster-2 released
```

This persisted even after restarting the Strimzi Operator. Should I just try restarting a few times, or are there other approaches? Since #3844 suggested this might be an operator/cloud-provider failure, I thought I'd start this as a discussion, but I can create an issue if that's more appropriate.

Using Strimzi 0.24.0, Kafka 2.8.0, and Kubernetes v1.20.7 via Kubespray on vSphere VMs.
Replies: 1 comment
-
Every Strimzi custom resource you create is reconciled periodically (every 2 minutes by default) and whenever it is updated. For each resource, only one reconciliation can run at a time; if several ran in parallel, they might fight with each other. So there is a lock which makes sure only one of them is running. When a new reconciliation should start, it tries to get the lock, and if the lock is not available it just ends with this log message instead of waiting longer, because it knows another reconciliation will soon try again anyway.

The lock is per custom resource, so you can see a KafkaConnect resource reconcile fine while the Kafka reconciliation does not get the lock.

So this message on its own doesn't mean much: it just means that another reconciliation is running at that point. It might not mean anything bad. For example, the operator could be waiting for a pod to roll or for something to be created (storage or load balancers sometimes take longer to create); these things can easily take more than 10 seconds. If it is happening too often, it might indicate some issue. The way to check is to look at the whole log.
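The behaviour described above (try the per-resource lock, give up after a timeout, rely on the next periodic run) can be sketched roughly like this. This is an illustrative sketch, not Strimzi's actual implementation; the class name, timeout value, and helper method are made up, and only the lock-key format is copied from the log messages:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LockSketch {
    // One lock per custom resource, keyed like the log messages:
    // "lock::<namespace>::<kind>::<name>"
    static final Map<String, Semaphore> locks = new ConcurrentHashMap<>();

    // Try to take the resource's lock; if it is busy, skip this
    // reconciliation instead of queuing, because the next periodic
    // reconciliation will try again anyway.
    static boolean reconcile(String key) throws InterruptedException {
        Semaphore lock = locks.computeIfAbsent(key, k -> new Semaphore(1));
        if (!lock.tryAcquire(100, TimeUnit.MILLISECONDS)) {
            System.out.println("Failed to acquire " + key + " within timeout");
            return false;  // skipped; another reconciliation holds the lock
        }
        try {
            // ... reconcile the resource here ...
            return true;
        } finally {
            lock.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String kafka = "lock::pipelines::Kafka::kafka-cluster";
        String connect = "lock::pipelines::KafkaConnect::kafka-connect-cluster-2";
        // Simulate a long-running Kafka reconciliation holding its lock:
        locks.computeIfAbsent(kafka, k -> new Semaphore(1)).acquire();
        // The Kafka reconciliation is skipped while KafkaConnect proceeds,
        // because each resource has its own independent lock:
        System.out.println(reconcile(kafka));    // false
        System.out.println(reconcile(connect));  // true
    }
}
```

The point of the sketch is the skip-instead-of-queue design: a failed `tryAcquire` is harmless by itself, since the timer guarantees another attempt shortly.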