Replies: 3 comments 3 replies
First of all, Strimzi 0.49.0 does support Kafka 4.1.0, so you do not need to use 4.1.1 instead of it. You can choose on your own which version to use. Second, for the third issue of it rolling all pods: that is something we can look into. Please provide the full logs from the operator as well as the full custom resources and the steps to reproduce it.
Sorry, just updated the text. Please read again ;)
Ok, sorry, my first thought was wrong and the logs brought light into the dark. The operator behaved correctly and only one pod from the 3-node cluster was re-scheduled with the new image tag. However, through some "external" behaviour an additional pod from the cluster was killed / terminated. After that pod terminated, the operator re-created it (which is correct) with the new pod template. Because the new image tag was not yet available in the registry, the cluster went offline ... I'm not sure if it is intended that the operator creates the pod with the new template, or if it should wait for the first pod to be healthy again (i.e. create the pod with the old template). The critical logs are here:

1763963753260 2025-11-24T05:55:53.260Z "message" : "[AdminClient clientId=adminclient-154] Connection to node 2 (cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2.cluster-fra1-dev1-kafka-kafka-brokers.strimzi-kafka.svc/10.244.0.248:9091) could not be established. Node may not be available.",
1763963753172 2025-11-24T05:55:53.172Z "message" : "[AdminClient clientId=adminclient-154] Connection to node 2 (cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2.cluster-fra1-dev1-kafka-kafka-brokers.strimzi-kafka.svc/10.244.0.248:9091) could not be established. Node may not be available.",
1763963753110 2025-11-24T05:55:53.110Z "message" : "[AdminClient clientId=adminclient-154] Connection to node 2 (cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2.cluster-fra1-dev1-kafka-kafka-brokers.strimzi-kafka.svc/10.244.0.248:9091) could not be established. Node may not be available.",
1763963753058 2025-11-24T05:55:53.058Z "message" : "[AdminClient clientId=adminclient-153] Connection to node -3 (cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2.cluster-fra1-dev1-kafka-kafka-brokers.strimzi-kafka.svc.cluster.local/10.244.0.248:9090) could not be established. Node may not be available.",
1763963752792 2025-11-24T05:55:52.792Z "message" : "Reconciliation #4160(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 is not ready. We will check if KafkaRoller can do anything about it.",
1763963752784 2025-11-24T05:55:52.784Z "message" : "Reconciliation #4160(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Error waiting for pod strimzi-kafka/cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 to become ready: io.strimzi.operator.common.TimeoutException: Exceeded timeout of 300000ms while waiting for Pods resource cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 in namespace strimzi-kafka to be ready",
1763963752784 2025-11-24T05:55:52.784Z "message" : "Reconciliation #4160(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Exceeded timeout of 300000ms while waiting for Pods resource cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 in namespace strimzi-kafka to be ready",
1763963302983 2025-11-24T05:48:22.983Z "message" : "Reconciliation #4143(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Will temporarily skip verifying pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2/2 is up-to-date due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 cannot be updated right now., retrying after at least 250ms",
1763960543014 2025-11-24T05:02:23.014Z "message" : "Reconciliation #4084(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Will temporarily skip verifying pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2/2 is up-to-date due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 cannot be updated right now., retrying after at least 250ms",
1763959943003 2025-11-24T04:52:23.003Z "message" : "Reconciliation #4072(timer) Kafka(strimzi-kafka/cluster-fra1-dev1-kafka): Will temporarily skip verifying pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2/2 is up-to-date due to io.strimzi.operator.cluster.operator.resource.KafkaRoller$UnforceableProblem: Pod cluster-fra1-dev1-kafka-cluster-fra1-dev1-kafka-nodepool-2 cannot be updated right now., retrying after at least 250ms",

You can see that the operator behaves correctly and reports that the pod cannot be updated right now. Since I'm not sure whether this is expected behaviour, please just give me short feedback and then I will close the discussion. And sorry for the confusion at first ... I was on the wrong track :/
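For readability (this is just a small helper on my side, not part of the operator or the thread): the numeric prefixes on the log lines above are epoch-millisecond timestamps, and a short Python sketch can convert them back to the UTC timestamps shown next to them:

```python
from datetime import datetime, timezone

def log_ts_to_utc(epoch_ms: int) -> str:
    """Convert an epoch-millisecond log prefix to an ISO-8601 UTC string."""
    secs, ms = divmod(epoch_ms, 1000)          # integer math avoids float rounding
    dt = datetime.fromtimestamp(secs, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{ms:03d}Z"

# Prefix of the "Exceeded timeout of 300000ms" line above:
print(log_ts_to_utc(1763963752784))  # 2025-11-24T05:55:52.784Z
```

This confirms the prefixes and the human-readable timestamps in the log lines agree.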
Bug Description
Hi,
we upgraded the Strimzi Kafka operator from v0.48 to v0.49. The operator started to update the Kafka node pool containers, but since the new Docker image tag was not available, the pod went into ImagePullBackOff. Up to this point, this is expected behaviour. But now the strange part starts: the operator did not stop the rollout after the first pod got stuck in ImagePullBackOff, but instead tried to roll out the new version to a second node. Since we run a 3-node cluster, the cluster went offline. In my opinion this should never happen. The operator should just stop and wait until the first pod is available / healthy again.
We use a private image registry and the new image tag was not synced yet.
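One possible mitigation, purely an assumption on our side and not something confirmed in this thread: Strimzi's Kafka custom resource allows overriding the broker image via spec.kafka.image, so a tag that is known to already exist in the private registry could be pinned explicitly until the mirror has synced. A minimal sketch (the registry path is made up; the cluster name is taken from the logs):

```yaml
# Hypothetical workaround: pin the broker image to a tag that is already
# present in the private registry so the roll cannot hit ImagePullBackOff.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cluster-fra1-dev1-kafka
spec:
  kafka:
    version: 4.1.0
    image: registry.example.internal/strimzi/kafka:0.49.0-kafka-4.1.0  # made-up path
```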
config:
spec:
  kafka:
    version: 4.1.0
    metadataVersion: 4.1-IV1
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2

Steps to reproduce
Expected behavior
Strimzi version
0.49
Kubernetes version
1.33.1
Installation method
YAML Files
Infrastructure
DigitalOcean
Configuration files and logs
No response
Additional context
No response