Replies: 3 comments · 12 replies
-
Is that regular GKE, or GKE Autopilot, which seems to do a lot of weird things?
-
I have not used it myself, but there were reports from other users. If you search the discussions or issues for Autopilot, you should be able to find them.
-
Thanks.
-
While testing GKE Autopilot in August 2022, ZooKeeper and Kafka were killed with SIGKILL after 5 to 20 minutes, despite having ample reserved K8s memory and tuned JVM memory settings. Standard K8s clusters work without problems. I would love to use Autopilot once this is resolved.
-
So, does Autopilot tell you why it sends the SIGKILL? Without understanding what is happening and why, it is impossible to resolve it or suggest a workaround.
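One possible starting point for answering this is the last termination state that Kubernetes records for each container. The sketch below is illustrative only. It assumes the pod's JSON has been obtained with `kubectl get pod <name> -o json`; the helper name `describe_terminations` and the sample data are hypothetical and not taken from the cluster discussed here. It relies on the standard `status.containerStatuses[].lastState.terminated` fields and on the convention that exit code 137 equals 128 + 9 (SIGKILL).

```python
from typing import List

SIGKILL_EXIT_CODE = 128 + 9  # 137: the process was terminated by signal 9


def describe_terminations(pod: dict) -> List[str]:
    """Summarise the last recorded termination of each container in a pod."""
    lines = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("lastState", {}).get("terminated")
        if not term:
            continue  # no previous termination recorded for this container
        exit_code = term.get("exitCode")
        reason = term.get("reason", "unknown")
        note = ""
        if reason == "OOMKilled":
            note = " (killed by the kernel OOM killer)"
        elif exit_code == SIGKILL_EXIT_CODE:
            note = " (SIGKILL, but not reported as OOM)"
        lines.append(f"{cs['name']}: reason={reason} exitCode={exit_code}{note}")
    return lines


# Hypothetical sample data, for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {"name": "kafka",
             "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
            {"name": "zookeeper",
             "lastState": {"terminated": {"exitCode": 137, "reason": "Error"}}},
            {"name": "sidecar", "lastState": {}},
        ]
    }
}

for line in describe_terminations(sample_pod):
    print(line)
```

If the reason is `OOMKilled`, memory limits are the first thing to check. If the exit code is 137 without that reason, something else sent the SIGKILL (for example a failed liveness probe or an eviction), and the pod events are the next place to look.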
-
I'll investigate the cause of the error when I create a new cluster in the coming weeks.
-
Regular GKE - v1.21.11-gke.1100
-
Hmm, weird.
-
2022-06-14 13:34:27 INFO ClusterOperator:128 - Triggering periodic reconciliation for namespace strimzi
-
Sorry, but your log seems to be from a different operator, with a different error (and with a fairly old Strimzi version). So how is it the same problem?
-
GCP has released a guide to running Strimzi on GKE Autopilot. I will try creating a new cluster to see if the problem is reproducible.
-
I've deployed the same Helm chart that I use on standard (non-Autopilot) GKE, and the pods have been running stably for the last 12 hours. The earlier problem probably had something to do with GCP's underlying Autopilot configuration. If problems turn up under load, I'll let you know, but otherwise this seems to be solved for GKE Autopilot. For reference, here is the guide: https://cloud.google.com/kubernetes-engine/docs/tutorials/apache-kafka-strimzi
-
Upon further inspection, the entity operator is still stuck in a crash loop (restarting every ~5 min). Finding the root cause is not worth the trouble, so I am sticking with the standard K8s cluster.
-
I am facing the same problem. It started after one of the EKS nodes went down and the strimzi-operator pod was spun up on a new node.
-
I have 3 Kafka clusters deployed in GKE. For 2 of them, the entity operator pod keeps crashing. The error from the user-operator container is:
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.111.240.1/api/v1/namespaces/kafka/secrets/kafka-cluster-ca-cert. Message: secrets "kafka-cluster-ca-cert" is forbidden: User "system:anonymous" cannot get resource "secrets" in API group "" in the namespace "kafka". Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=secrets, name=kafka-cluster-ca-cert, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=secrets "kafka-cluster-ca-cert" is forbidden: User "system:anonymous" cannot get resource "secrets" in API group "" in the namespace "kafka", metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).
I have verified that the pod has the correct service account (and the correct RoleBinding to a Role which has this permission). Any idea why the user shows up as anonymous here rather than as the service account the pod is supposed to run with?
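A small sketch that may help triage errors like the one above: it pulls the user, verb, resource, and namespace out of the API server's "forbidden" message and separates a request that arrived without an identity (`system:anonymous`) from one that was authenticated as a service account but denied by RBAC. The function name `classify_forbidden` and the diagnosis labels are hypothetical, and the regular expression is an assumption based only on the message format quoted in this post.

```python
import re

# Pattern derived from the message text quoted above (an assumption, not a spec).
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)"'
    r' in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def classify_forbidden(message: str) -> dict:
    """Extract who was denied what, and label the likely failure category."""
    match = FORBIDDEN_RE.search(message)
    if match is None:
        raise ValueError("not a recognised 'forbidden' message")
    info = match.groupdict()
    user = info["user"]
    if user == "system:anonymous":
        # The API server did not associate any identity with the request.
        info["diagnosis"] = "unauthenticated"
    elif user.startswith("system:serviceaccount:"):
        # The identity was recognised, but RBAC denied the action.
        info["diagnosis"] = "authenticated-but-denied"
    else:
        info["diagnosis"] = "other-user"
    return info


msg = (
    'secrets "kafka-cluster-ca-cert" is forbidden: User "system:anonymous" '
    'cannot get resource "secrets" in API group "" in the namespace "kafka"'
)
result = classify_forbidden(msg)
print(result["diagnosis"], result["verb"], result["resource"], result["namespace"])
```

An `unauthenticated` result typically means the request carried no bearer token, in which case the RoleBinding on the service account is never consulted. That points toward the client's token handling (for example, whether the token is still mounted and readable at `/var/run/secrets/kubernetes.io/serviceaccount/token`) rather than toward RBAC. It is a hint about where to look, not a confirmed root cause for this cluster.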