Issues trying Strimzi on GKE-auto #6111

ThorbenJ · 2021-12-27T11:44:38Z

Hi.

I followed the Strimzi getting started guide to deploy Strimzi on a GKE Autopilot cluster (1.21.5-gke.1302), which I had created by following the GKE getting started guide.

I first deployed 0.26.1, then tried to upgrade to 0.27.0, and then deleted everything and started again, deploying 0.27.0 - and now I am here asking for help. Note: I deployed Strimzi into namespace 'strimzi' and Kafka into 'kafka'.

I see two big issues:

1) The abc-entity-operator deployment is failing and is creating an ever longer list of ReplicaSets:

$ kc get replicasets -n kafka
NAME                                     DESIRED   CURRENT   READY   AGE
abc-entity-operator-546677b6cd   0         0         0       8m9s
abc-entity-operator-57c7f7dc46   0         0         0       4m19s
abc-entity-operator-586c79b6d5   0         0         0       9m9s
abc-entity-operator-5ccdf9dc49   0         0         0       5m19s
abc-entity-operator-5d767cb6d8   0         0         0       9m39s
<< 20 lines cut >>
abc-entity-operator-bc4c77d98    0         0         0       10m
abc-entity-operator-fbdf5dbff    0         0         0       3m19s
<< a couple minutes later >>
$ kc get replicasets -n kafka
NAME                                     DESIRED   CURRENT   READY   AGE
abc-entity-operator-546677b6cd   0         0         0       14m
abc-entity-operator-5764965567   0         0         0       4m25s
abc-entity-operator-57c7f7dc46   0         0         0       10m
abc-entity-operator-586c79b6d5   0         0         0       15m
<< 35 lines cut >>
abc-entity-operator-f59945564    0         0         0       3m35s
abc-entity-operator-fbdf5dbff    0         0         0       9m45s

I assume this is not intentional. When I was deleting everything to start again, I had nearly 8000 of these to delete...
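For anyone facing a similar cleanup, here is a minimal Python sketch that picks the fully scaled-down ReplicaSets out of the plain `get replicasets` table shown above. It assumes the default five-column output; the function name and the last sample row (a live ReplicaSet) are made up for illustration. Bear in mind that a Deployment normally keeps a few old ReplicaSets at 0 replicas as revision history, so not every row with zeros is a symptom.

```python
def empty_replicasets(kubectl_output):
    """Return names of ReplicaSets whose DESIRED, CURRENT and READY are all 0."""
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a 5-column row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, desired, current, ready, _age = fields
        if (desired, current, ready) == ("0", "0", "0"):
            names.append(name)
    return names


# First rows are copied from the output above; the last row is hypothetical
# and only there to show that a live ReplicaSet is not selected.
SAMPLE = """\
NAME                             DESIRED   CURRENT   READY   AGE
abc-entity-operator-546677b6cd   0         0         0       8m9s
abc-entity-operator-57c7f7dc46   0         0         0       4m19s
<< 20 lines cut >>
abc-entity-operator-live000000   1         1         1       30s
"""

if __name__ == "__main__":
    for name in empty_replicasets(SAMPLE):
        print(name)
```

The printed names could then be handed to `kubectl delete replicaset -n kafka <name> ...` in batches, once it is clear that nothing still depends on them.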

In the GCP web UI console, the reason I most often see for the error state is "Container tls-sidecar is waiting", and I managed to grab this:

2) It keeps deleting (and thus restarting) both Zookeeper and Kafka nodes for no apparent reason.

ZooKeeper and Kafka themselves seem happy; I can even do basic operations such as listing topics:

$ kc exec -n kafka abc-zookeeper-0 -- bin/kafka-topics.sh --bootstrap-server abc-kafka-bootstrap:9092 --list
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic

Here are the operator logs:
strimzi_operator.log

What am I doing wrong? Please advise on what I can do.

scholzj · 2021-12-27T14:00:12Z

Maintainer

I'm not sure I follow the problem. You talk about some cluster named abc, but the log does not show such a cluster, only a cluster named elsec-feeds. So it is not completely clear how they relate. The ZooKeeper and Kafka pods are rolling because there seem to be some changes. When you change the log level to debug, it should show more details about what the changes are.

ThorbenJ · 2021-12-28T08:51:59Z

Author

Hi. Thank you for replying.
The name mismatch is my fault: when I was originally writing this up as an issue, before deciding to post in the discussion forums instead, I decided to neutralise the pastes with example strings (abc, xyz, etc.), but then forgot to s/elsec-feeds/abc/g the log snippet; both names refer to the same cluster/installation.

It's still creating replicasets like crazy for the entity-operator:

$ kc get replicasets -n kafka | wc -l
    2588

^Those are all elsec-feeds-entity-operator-...

On a side note: why is it in the 'kafka' NS (where I have zk and kf) and not in the 'strimzi' NS where the cluster operator lives? (I don't know if that is another symptom or not)

After over a day the entity-operator is still not created/running - no idea why.

As for the rolling restarts of zk and kf, I will restart the cluster operator with debug logging - however, note that nothing is interacting with this environment yet; I only just created it. So what is the change that is causing the infinite restarts?
I'll post the cluster operator debug logs once I've captured some.

ThorbenJ · 2021-12-28T11:08:25Z

Author

So I decided to delete it all and start again (following: https://strimzi.io/docs/operators/latest/quickstart.html).

For reference:

diff -u ../download/strimzi-0.27.0/install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml	2021-12-27 10:01:58.000000000 +0100
@@ -7,7 +7,7 @@
 subjects:
   - kind: ServiceAccount
     name: strimzi-cluster-operator
-    namespace: myproject
+    namespace: strimzi
 roleRef:
   kind: ClusterRole
   name: strimzi-cluster-operator-namespaced
diff -u ../download/strimzi-0.27.0/install/cluster-operator/021-ClusterRoleBinding-strimzi-cluster-operator.yaml install/cluster-operator/021-ClusterRoleBinding-strimzi-cluster-operator.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/021-ClusterRoleBinding-strimzi-cluster-operator.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/021-ClusterRoleBinding-strimzi-cluster-operator.yaml	2021-12-27 10:01:58.000000000 +0100
@@ -7,7 +7,7 @@
 subjects:
   - kind: ServiceAccount
     name: strimzi-cluster-operator
-    namespace: myproject
+    namespace: strimzi
 roleRef:
   kind: ClusterRole
   name: strimzi-cluster-operator-global
diff -u ../download/strimzi-0.27.0/install/cluster-operator/030-ClusterRoleBinding-strimzi-cluster-operator-kafka-broker-delegation.yaml install/cluster-operator/030-ClusterRoleBinding-strimzi-cluster-operator-kafka-broker-delegation.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/030-ClusterRoleBinding-strimzi-cluster-operator-kafka-broker-delegation.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/030-ClusterRoleBinding-strimzi-cluster-operator-kafka-broker-delegation.yaml	2021-12-27 10:01:58.000000000 +0100
@@ -9,7 +9,7 @@
 subjects:
   - kind: ServiceAccount
     name: strimzi-cluster-operator
-    namespace: myproject
+    namespace: strimzi
 roleRef:
   kind: ClusterRole
   name: strimzi-kafka-broker
diff -u ../download/strimzi-0.27.0/install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml	2021-12-27 10:01:58.000000000 +0100
@@ -9,7 +9,7 @@
 subjects:
   - kind: ServiceAccount
     name: strimzi-cluster-operator
-    namespace: myproject
+    namespace: strimzi
 roleRef:
   kind: ClusterRole
   name: strimzi-entity-operator
diff -u ../download/strimzi-0.27.0/install/cluster-operator/033-ClusterRoleBinding-strimzi-cluster-operator-kafka-client-delegation.yaml install/cluster-operator/033-ClusterRoleBinding-strimzi-cluster-operator-kafka-client-delegation.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/033-ClusterRoleBinding-strimzi-cluster-operator-kafka-client-delegation.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/033-ClusterRoleBinding-strimzi-cluster-operator-kafka-client-delegation.yaml	2021-12-27 10:01:58.000000000 +0100
@@ -10,7 +10,7 @@
 subjects:
   - kind: ServiceAccount
     name: strimzi-cluster-operator
-    namespace: myproject
+    namespace: strimzi
 roleRef:
   kind: ClusterRole
   name: strimzi-kafka-client
diff -u ../download/strimzi-0.27.0/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml
--- ../download/strimzi-0.27.0/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml	2021-12-24 12:15:20.000000000 +0100
+++ install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml	2021-12-28 10:28:35.000000000 +0100
@@ -40,9 +40,9 @@
               mountPath: /opt/strimzi/custom-config/
           env:
             - name: STRIMZI_NAMESPACE
-              valueFrom:
-                fieldRef:
-                  fieldPath: metadata.namespace
+              value: kafka
+            - name: STRIMZI_LOG_LEVEL
+              value: DEBUG
             - name: STRIMZI_FULL_RECONCILIATION_INTERVAL_MS
               value: "120000"
             - name: STRIMZI_OPERATION_TIMEOUT_MS

Then

$ kc create ns strimzi
namespace/strimzi created
$ kc create ns kafka
namespace/kafka created
$ kc create -n strimzi -f install/cluster-operator/
serviceaccount/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-kafka-broker created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-broker-delegation created
clusterrole.rbac.authorization.k8s.io/strimzi-entity-operator created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created
clusterrole.rbac.authorization.k8s.io/strimzi-kafka-client created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-client-delegation created
customresourcedefinition.apiextensions.k8s.io/kafkas.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/strimzipodsets.core.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkatopics.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkausers.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkamirrormakers.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkabridges.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnectors.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkamirrormaker2s.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkarebalances.kafka.strimzi.io created
configmap/strimzi-cluster-operator created
deployment.apps/strimzi-cluster-operator created
$ kc create -n kafka -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml 
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
$ kc create -n kafka -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml 
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created

Then the cluster:

$ cat elsec-feeds.yml 
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: elsec-feeds
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
#      transaction.state.log.min.isr: 2
#      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
#      - id: 1
#        type: persistent-claim
#        size: 100Gi
#        deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

=>

$ kc create -n kafka -f elsec-feeds.yml 
kafka.kafka.strimzi.io/elsec-feeds created

So notes on what's happening in between:

~10:09 GMT Autopilot appears to have finished scaling up resources for zk and kf (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview)
~10:25 GMT elsec-feeds-zookeeper-1 remained "Unschedulable" so I kicked it by deleting the pod (this worked, as it then deployed OK)
~10:35 GMT all 3 KF and 3 ZK pods were up (but were still being continuously restarted by Strimzi), so I proceeded to create a topic

Topic:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: tj-test
  labels:
    strimzi.io/cluster: elsec-feeds
spec:
  partitions: 1
  replicas: 1
  config:
# 1hr = 3600000ms / 2hr = 7200000ms
    retention.ms: 7200000
    segment.bytes: 1073741824

=>

$ kc create -n kafka -f topics/test-topic.yml 
kafkatopic.kafka.strimzi.io/tj-test created

~10:50 GMT

I was going to say: it seems this time the entity-operator came up OK, and it's not creating loads of empty/dead replicasets:

$ kc get replicasets -n kafka
NAME                                     DESIRED   CURRENT   READY   AGE
elsec-feeds-entity-operator-6d5dcb784c   1         1         1       8m1s

However, before I finished posting this, I looked again and:

$ kc get replicasets -n kafka
NAME                                     DESIRED   CURRENT   READY   AGE
elsec-feeds-entity-operator-546677b6cd   0         0         0       62s
elsec-feeds-entity-operator-586c79b6d5   0         0         0       2m2s
elsec-feeds-entity-operator-5d767cb6d8   0         0         0       2m32s
elsec-feeds-entity-operator-6495d885f5   0         0         0       4m2s
elsec-feeds-entity-operator-65d7fcdc4d   0         0         0       42s
elsec-feeds-entity-operator-69ddffdc55   0         0         0       72s
elsec-feeds-entity-operator-6b9c57d95    0         0         0       2m22s
elsec-feeds-entity-operator-6d5dcb784c   0         0         0       16m
elsec-feeds-entity-operator-6ff845dc58   0         0         0       2m12s
elsec-feeds-entity-operator-77ff47dc5c   0         0         0       2m42s
elsec-feeds-entity-operator-7c94b85585   0         0         0       51s
elsec-feeds-entity-operator-8599bb5588   0         0         0       82s
elsec-feeds-entity-operator-b5c75b6c9    0         0         0       2s
elsec-feeds-entity-operator-bc4c77d98    0         0         0       3m22s

So back to square 1 on my first issue...

For the brief time it was running it did create my test topic:

$ kc exec -n kafka elsec-feeds-kafka-0 -- bin/kafka-topics.sh --bootstrap-server elsec-feeds-kafka-bootstrap:9092 --list
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic
tj-test

Also after all that, it is still deleting/restarting the kf and zk pods... so my second issue is also here to stay..

I hope with all this you or anyone else has enough information to help - it would be so greatly appreciated.

I was capturing the debug cluster operator logs (stopped log capture at 11:04 GMT):
strimzi-2021-12-28.log.gz

In any case, thank you for taking the time to look at this.

1 reply

scholzj Dec 28, 2021
Maintainer

It looks like some tooling in your cluster is injecting all kinds of additional information and modifying the StatefulSets directly:

2021-12-28 09:47:43 DEBUG AbstractResourceOperator:115 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka already exists, patching it
2021-12-28 09:47:43 DEBUG StatefulSetDiff:103 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka differs: {"op":"remove","path":"/metadata/annotations/autopilot.gke.io~1resource-adjustment"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:104 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Current StatefulSet path /metadata/annotations/autopilot.gke.io~1resource-adjustment has value 
2021-12-28 09:47:43 DEBUG StatefulSetDiff:105 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Desired StatefulSet path /metadata/annotations/autopilot.gke.io~1resource-adjustment has value 
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/metadata/managedFields"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/revisionHistoryLimit"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/metadata/annotations/strimzi.io~1generation"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe/failureThreshold"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe/periodSeconds"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe/successThreshold"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe/failureThreshold"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe/periodSeconds"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe/successThreshold"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/resources"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:103 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka differs: {"op":"remove","path":"/spec/template/spec/containers/0/securityContext"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:104 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Current StatefulSet path /spec/template/spec/containers/0/securityContext has value {"capabilities":{"drop":["NET_RAW"]}}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:105 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Desired StatefulSet path /spec/template/spec/containers/0/securityContext has value 
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/terminationMessagePath"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/containers/0/terminationMessagePolicy"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/dnsPolicy"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/restartPolicy"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:103 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka differs: {"op":"remove","path":"/spec/template/spec/securityContext/seccompProfile"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:104 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Current StatefulSet path /spec/template/spec/securityContext/seccompProfile has value {"type":"RuntimeDefault"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:105 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): Desired StatefulSet path /spec/template/spec/securityContext/seccompProfile has value 
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/serviceAccount"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/volumes/4/configMap/defaultMode"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/volumeClaimTemplates/0/spec/volumeMode"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/volumeClaimTemplates/0/status"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/status"}

It modifies the SecurityContext or annotations. That is forcing the rolling updates of Kafka and ZooKeeper and also generating the new ReplicaSets for the Deployments. You have two options for dealing with it:

1) Disable whatever is doing this from doing it
2) Use the template section of the Kafka custom resource to add these options directly to the resources created by Strimzi. See more information in the docs: https://strimzi.io/docs/operators/latest/full/using.html#assembly-customizing-kubernetes-resources-str
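To make a long debug log like the one above easier to read, a small Python sketch can keep only the "differs" lines (the ones the operator acts on) and drop the "ignoring diff" noise. It assumes the log format matches the excerpt above; the function names are illustrative and not part of Strimzi. The `~1` in the paths is standard JSON Pointer escaping for `/`, which the sketch decodes.

```python
import re

# "differs" lines are the changes the operator acts on; "ignoring diff" lines
# are intentionally not matched.
DIFFERS = re.compile(
    r'StatefulSet (\S+) differs: \{"op":"(\w+)","path":"([^"]+)"\}'
)


def decode_segment(segment):
    """Undo JSON Pointer escaping (RFC 6901): '~1' is '/', '~0' is '~'."""
    return segment.replace("~1", "/").replace("~0", "~")


def significant_diffs(log_text):
    """Return (resource, op, path_segments) for every 'differs' line."""
    found = []
    for match in DIFFERS.finditer(log_text):
        resource, op, path = match.groups()
        segments = [decode_segment(s) for s in path.split("/")[1:]]
        found.append((resource, op, segments))
    return found


# Three lines copied from the log excerpt above.
SAMPLE_LOG = r'''
2021-12-28 09:47:43 DEBUG StatefulSetDiff:103 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka differs: {"op":"remove","path":"/metadata/annotations/autopilot.gke.io~1resource-adjustment"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:86 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka ignoring diff {"op":"remove","path":"/spec/template/spec/dnsPolicy"}
2021-12-28 09:47:43 DEBUG StatefulSetDiff:103 - Reconciliation #8(watch) Kafka(kafka/elsec-feeds): StatefulSet kafka/elsec-feeds-kafka differs: {"op":"remove","path":"/spec/template/spec/containers/0/securityContext"}
'''

if __name__ == "__main__":
    for resource, op, segments in significant_diffs(SAMPLE_LOG):
        print(resource, op, " > ".join(segments))
```

Run against the full operator log, this should leave only a handful of paths, which are the candidates for either disabling at the source or mirroring in the template section.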

ThorbenJ · 2021-12-28T13:05:29Z

Author

Thank you for looking at this and for the response.

If I understand correctly Strimzi and Autopilot are not getting along?
https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview

This is how I created the cluster:

gcloud container clusters create-auto my-cluster
gcloud container clusters get-credentials my-cluster

Following: https://cloud.google.com/kubernetes-engine/docs/quickstart#create_cluster - After those two commands I can use kubectl.

Then I moved onto the Strimzi getting started - I did nothing more at the cluster create part (yet).

So the first option "Disable whatever is doing this from doing it" says to me: You can't use GKE-Autopilot

The second option is (TBH) starting to take me out of my depth; I will have to take some time to digest it. I used to run ZK and KF managed via my own Ansible playbook. Having discovered Strimzi, I thought I'd give it a go.

Should/Could I open an issue for out-of-the-box support of Strimzi on GKE-Autopilot?

Will the template route really work if Autopilot is making annotations? Can I really know such in advance and add them to the Strimzi definitions? Would it not be better to be able to list things that Strimzi should ignore, by adopting the new value automatically?

1 reply

scholzj Dec 28, 2021
Maintainer

TBH, I don't know what Autopilot is or how exactly it works, so I cannot easily comment on it. The autopilot.gke.io/resource-adjustment annotation clearly comes from it. But I'm not sure that alone would be the cause of the problems, as it does not annotate the pods. The changes to the security context shown in the diff certainly would. I'm not sure whether those are related to the auto-pilot as well or to some other tooling you might use.

In general, it looks like you have multiple components in your cluster which want to manage the same resources and want them configured differently. That does not work, because they fight over the resources and keep changing them in a circular loop. Using the template section of the Kafka custom resource, which I linked above, can help make sure that the different components managing the same resources, such as StatefulSets or Deployments, want them configured the same way, and that might stop the fight.
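The "fight" can be pictured with a toy Python model. This is only a sketch of the reasoning, not how Strimzi or Autopilot are implemented: both functions are hypothetical stand-ins, and the hash merely mimics the fact that a changed pod template gets a new hash suffix. It does not try to reproduce the exact number of ReplicaSets seen in this thread.

```python
import copy
import hashlib
import json


def template_hash(template):
    """Stable short hash of a pod template; stands in for a ReplicaSet suffix."""
    raw = json.dumps(template, sort_keys=True).encode()
    return hashlib.sha256(raw).hexdigest()[:10]


def operator_reconcile(live, desired):
    """Hypothetical operator: overwrite the live template with its desired one."""
    return copy.deepcopy(desired)


def platform_mutate(live):
    """Hypothetical platform tooling: add a security setting if it is missing."""
    patched = copy.deepcopy(live)
    patched.setdefault("securityContext", {"capabilities": {"drop": ["NET_RAW"]}})
    return patched


def count_template_changes(desired, rounds):
    """Let both components act in turn and count how often the template changes."""
    live = copy.deepcopy(desired)
    changes = 0
    for _ in range(rounds):
        for step in (platform_mutate, lambda t: operator_reconcile(t, desired)):
            updated = step(live)
            if template_hash(updated) != template_hash(live):
                changes += 1  # each change would trigger a new rollout
            live = updated
    return changes


# The operator's desired state without and with the platform's setting.
CONFLICTING = {"image": "kafka"}
ALIGNED = {
    "image": "kafka",
    "securityContext": {"capabilities": {"drop": ["NET_RAW"]}},
}

if __name__ == "__main__":
    print("conflicting:", count_template_changes(CONFLICTING, rounds=5))
    print("aligned:", count_template_changes(ALIGNED, rounds=5))
```

When the desired state already contains what the other component adds, neither side has anything left to change, which is the idea behind using the template section.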

Should/Could I open an issue for out-of-the-box support of Strimzi on GKE-Autopilot?

Well, you use the auto-pilot ... so if you wanna help to make it work out of the box, we can work with you on it. But we do not know the auto-pilot feature, so I'm not sure that just opening an issue which says "make it work" would be helpful on its own.

Will the template route really work if Autopilot is making annotations? Can I really know such in advance and add them to the Strimzi definitions?

You would certainly need to try it out; I do not know for sure. You are using the auto-pilot tooling, so you should probably know what it does and how it works. You will probably need to understand that even if you want to deploy and manage Kafka yourself.

Would it not be better to be able to list things that Strimzi should ignore, by adopting the new value automatically?

That is not as easy as it sounds. You cannot use Strimzi to declaratively manage things while also asking it to not manage things declaratively. You have a Kafka custom resource which says what the Kafka cluster should look like. So it creates the Kafka cluster to look like that.

Also, from my point of view, the tooling should not modify the StatefulSets and Deployments to add the security context to them. It should instead use admission controllers as other similar tools do. But I guess you might not be able to affect that.

ThorbenJ · 2021-12-31T10:00:26Z

Author

Thank you for your continuing support in this discussion thread.

Autopilot is a new mode or type of GKE (Google's K8s platform) that they released in Feb. 2021. The blog/announcement is here: https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot.

From the overview documentation: https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview -

With Autopilot, you no longer have to monitor the health of your nodes or calculate the amount of compute capacity that your workloads require. Autopilot supports most Kubernetes APIs, tools, and its rich ecosystem. You stay within GKE without having to interact with the Compute Engine APIs, CLIs, or UI, as the nodes are not accessible through Compute Engine, like they are in Standard mode. You pay only for the CPU, memory, and storage that your Pods request while they are running.

Autopilot clusters are pre-configured with an optimized cluster configuration that is ready for production workloads. This streamlined configuration follows GKE best practices and recommendations for cluster and workload setup and security. Some of these built-in settings (detailed in the table below) are immutable and other optional settings can be turned on or off.

Here's a comparison between Autopilot and Standard GKE: https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#comparison - Looking at that table, I guess it's something that Autopilot pre-configures and manages, such as the things listed under Security.

Perhaps I have to deploy a standard GKE cluster, but I was really hoping to save time and effort by having Google and their automation manage the GKE infrastructure, such as nodes - and in turn have Strimzi manage the application platform: Kafka. Leaving me to do other things.

Routes I now see, and will investigate:

Try to find all the things that Autopilot adds, and add them to the Strimzi templates
See if it's possible to identify the component that Autopilot sets up, that conflicts with Strimzi, and disable it
Move to GKE Standard (not at all keen on this)
Give up

At the very least I hope this thread proves useful to others who try Strimzi with GKE Autopilot. - The dream of deploying a fresh GKE, Strimzi and Kafka cluster in something like a dozen simple commands.

Happy new year!
-Thorben

ThorbenJ · 2021-12-31T13:58:06Z

Author

Hi.

So I now have this:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: elsec-feeds
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
#      transaction.state.log.min.isr: 2
#      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
#      - id: 1
#        type: persistent-claim
#        size: 100Gi
#        deleteClaim: false
    template:
#      statefulset:
       kafkaContainer:
         securityContext:
           capabilities: {"drop":["NET_RAW"]}
           seccompProfile: {"type":"RuntimeDefault"}
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: false
    template:
#      statefulset:
       zookeeperContainer:
         securityContext:
           capabilities: {"drop":["NET_RAW"]}
           seccompProfile: {"type":"RuntimeDefault"}
  entityOperator:
    topicOperator: {}
    userOperator: {}
    template:
       topicOperatorContainer:
         securityContext:
           capabilities: {"drop":["NET_RAW"]}
           seccompProfile: {"type":"RuntimeDefault"}
       userOperatorContainer:
         securityContext:
           capabilities: {"drop":["NET_RAW"]}
           seccompProfile: {"type":"RuntimeDefault"}
       tlsSidecarContainer:
         securityContext:
           capabilities: {"drop":["NET_RAW"]}
           seccompProfile: {"type":"RuntimeDefault"}

This appears to have stopped the delete/restart loop for ZooKeeper and Kafka; however, the entity operator still won't come up.
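One thing that can be checked mechanically is the level at which each reported change sits. In the operator debug log earlier in the thread, the capabilities change is under a container's securityContext, while the seccompProfile change is under the pod-level securityContext. The Python sketch below only classifies diff paths by that level; the function name is illustrative, and whether the template section accepts a given field at a given level is something to confirm in the Strimzi docs linked earlier, not something this sketch proves.

```python
# Prefix of paths that point inside the pod spec of a StatefulSet or Deployment.
POD_SPEC = "/spec/template/spec/"


def level_of(path):
    """Classify a JSON Pointer diff path as workload-, pod- or container-level."""
    if not path.startswith(POD_SPEC):
        return "workload object"
    rest = path[len(POD_SPEC):].split("/")
    if rest[0] == "containers" and len(rest) > 1:
        return "container " + rest[1]
    return "pod"


# Paths copied from the "differs" lines in the operator debug log.
PATHS = [
    "/metadata/annotations/autopilot.gke.io~1resource-adjustment",
    "/spec/template/spec/containers/0/securityContext",
    "/spec/template/spec/securityContext/seccompProfile",
]

if __name__ == "__main__":
    for path in PATHS:
        print(level_of(path), "<-", path)
```

If the entity operator's Deployment shows similar "differs" lines in the debug log, the same classification might hint at whether a setting needs to be mirrored at the container level or at the pod level.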

I tried to grab some log history via the GCP web ui, attached here:
downloaded-logs-20211231-145123.csv
downloaded-logs-20211231-145318.csv

1 reply

scholzj Dec 31, 2021
Maintainer

So, what exactly do you mean by "does not come up"? Is the pod crashlooping? Or does it not show up at all? Also, can you provide the logs in some more readable format? It looks like what you provided is kind of mixing everything together.

ThorbenJ · 2022-01-03T12:26:10Z

Author

Hi. Is this better? I am finding it difficult to get logs that look useful for something that is so short lived; any advice welcome.

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
Error from server (BadRequest): container "topic-operator" in pod "elsec-feeds-entity-operator-787fbb4787-4wmdj" is waiting to start: ContainerCreating

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c tls-sidecar -f
Starting Stunnel with configuration:
pid = /tmp/stunnel.pid
foreground = yes
debug = notice
sslVersion = TLSv1.2
[zookeeper-2181]
client = yes
CAfile = /tmp/cluster-ca.crt
cert = /etc/tls-sidecar/eo-certs/entity-operator.crt
key = /etc/tls-sidecar/eo-certs/entity-operator.key
accept = 127.0.0.1:2181
connect = elsec-feeds-zookeeper-client:2181
delay = yes
verify = 2


+ exec /usr/bin/tini -w -e 143 -- /usr/bin/stunnel /tmp/stunnel.conf
2022.01.03 10:37:51 LOG5[ui]: stunnel 5.56 on x86_64-redhat-linux-gnu platform
2022.01.03 10:37:51 LOG5[ui]: Compiled with OpenSSL 1.1.1g FIPS  21 Apr 2020
2022.01.03 10:37:51 LOG5[ui]: Running  with OpenSSL 1.1.1k  FIPS 25 Mar 2021
2022.01.03 10:37:51 LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,PSK,SNI
2022.01.03 10:37:51 LOG5[ui]: Reading configuration from file /tmp/stunnel.conf
2022.01.03 10:37:51 LOG5[ui]: UTF-8 byte order mark not detected
2022.01.03 10:37:51 LOG5[ui]: FIPS mode disabled
2022.01.03 10:37:51 LOG4[ui]: Insecure file permissions on /etc/tls-sidecar/eo-certs/entity-operator.key
2022.01.03 10:37:51 LOG4[ui]: Service [zookeeper-2181] uses "verifyChain" without subject checks
2022.01.03 10:37:51 LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
2022.01.03 10:37:51 LOG5[ui]: Configuration successful
2022.01.03 10:38:04 LOG5[ui]: Terminated
2022.01.03 10:38:04 LOG5[ui]: Terminating 1 service thread(s)
2022.01.03 10:38:04 LOG5[ui]: Service threads terminated
rpc error: code = NotFound desc = an error occurred when try to find container "3b8b3ae6c1ba1bb98b76f10f89074d440783624a49ef7af60b686fb5953c9c00": not found

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c tls-sidecar -f
error: timed out waiting for the condition

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c tls-sidecar -f
Error from server (BadRequest): container "tls-sidecar" in pod "elsec-feeds-entity-operator-84d545cdc8-flf94" is waiting to start: ContainerCreating

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c tls-sidecar -f
Starting Stunnel with configuration:
pid = /tmp/stunnel.pid
foreground = yes
debug = notice
sslVersion = TLSv1.2
[zookeeper-2181]
client = yes
CAfile = /tmp/cluster-ca.crt
cert = /etc/tls-sidecar/eo-certs/entity-operator.crt
key = /etc/tls-sidecar/eo-certs/entity-operator.key
accept = 127.0.0.1:2181
connect = elsec-feeds-zookeeper-client:2181
delay = yes
verify = 2


+ exec /usr/bin/tini -w -e 143 -- /usr/bin/stunnel /tmp/stunnel.conf
2022.01.03 10:38:38 LOG5[ui]: stunnel 5.56 on x86_64-redhat-linux-gnu platform
2022.01.03 10:38:38 LOG5[ui]: Compiled with OpenSSL 1.1.1g FIPS  21 Apr 2020
2022.01.03 10:38:38 LOG5[ui]: Running  with OpenSSL 1.1.1k  FIPS 25 Mar 2021
2022.01.03 10:38:38 LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,PSK,SNI
2022.01.03 10:38:38 LOG5[ui]: Reading configuration from file /tmp/stunnel.conf
2022.01.03 10:38:38 LOG5[ui]: UTF-8 byte order mark not detected
2022.01.03 10:38:38 LOG5[ui]: FIPS mode disabled
2022.01.03 10:38:38 LOG4[ui]: Insecure file permissions on /etc/tls-sidecar/eo-certs/entity-operator.key
2022.01.03 10:38:38 LOG4[ui]: Service [zookeeper-2181] uses "verifyChain" without subject checks
2022.01.03 10:38:38 LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
2022.01.03 10:38:38 LOG5[ui]: Configuration successful
2022.01.03 10:38:50 LOG5[0]: Service [zookeeper-2181] accepted connection from 127.0.0.1:60356
2022.01.03 10:38:50 LOG5[0]: s_connect: connected 10.20.130.255:2181
2022.01.03 10:38:50 LOG5[0]: Service [zookeeper-2181] connected remote server from 10.20.1.5:34478
2022.01.03 10:38:50 LOG5[0]: Certificate accepted at depth=0: O=io.strimzi, CN=elsec-feeds-zookeeper
2022.01.03 10:38:50 LOG5[0]: Connection closed: 81 byte(s) sent to TLS, 61 byte(s) sent to socket
2022.01.03 10:38:51 LOG5[ui]: Terminated
2022.01.03 10:38:51 LOG5[ui]: Terminating 1 service thread(s)
2022.01.03 10:38:51 LOG5[ui]: Service threads terminated
rpc error: code = NotFound desc = an error occurred when try to find container "7f2d9a363429e333f92ca9de538200525f06ecfc47d1fbf6838df82092d90e7d": not found
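
To put a number on "short lived", the gap between stunnel starting and its `Terminated` line can be computed from the timestamps in the output above. This is only a rough sketch: the function name `sidecar_lifetime` is made up, it assumes the log format shown here (date, time, `LOGn[ui]:`, message), and it ignores runs that cross midnight.

```shell
# Hypothetical helper: reads tls-sidecar log text on stdin and prints, for each
# run, the seconds between the "stunnel <version> on ..." banner and "Terminated".
sidecar_lifetime() {
  awk '
    # "HH:MM:SS" -> seconds since midnight
    function secs(hms,   t) { split(hms, t, ":"); return t[1] * 3600 + t[2] * 60 + t[3] }

    # Banner line marks the start of a run.
    $3 ~ /^LOG[0-9]\[ui\]:$/ && $4 == "stunnel" { start = secs($2); seen = 1 }

    # "Terminated" (not "Terminating ...") marks the end of that run.
    $3 ~ /^LOG[0-9]\[ui\]:$/ && $4 == "Terminated" && seen { print secs($2) - start; seen = 0 }
  '
}
```

Piping the two runs above through it (for instance `kc logs ... -c tls-sidecar | sidecar_lifetime`) gives 13 for each: 10:37:51 to 10:38:04 and 10:38:38 to 10:38:51.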

and

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
Preparing trust store certificates for internal communication
Adding /etc/tls-sidecar/cluster-ca-certs/ca.crt to truststore /tmp/topic-operator/replication.truststore.p12 with alias ca
Certificate was added to keystore
Preparing trust store certificates for internal communication is completed
Preparing key store certificates for internal communication
Preparing key store certificates for internal communication is completed
+ exec /usr/bin/tini -w -e 143 -- java -Dlog4j2.configurationFile=file:/opt/topic-operator/custom-config/log4j2.properties -Dvertx.cacheDirBase=/tmp/vertx-cache -Djava.security.egd=file:/dev/./urandom --illegal-access=deny -classpath lib/io.strimzi.topic-operator-0.27.0.jar:lib/io.netty.netty-common-4.1.71.Final.jar:lib/io.prometheus.simpleclient_common-0.7.0.jar:lib/io.apicurio.apicurio-registry-utils-streams-1.3.2.Final.jar:lib/io.netty.netty-buffer-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-scheduling-5.10.1.jar:lib/io.fabric8.openshift-model-operatorhub-5.10.1.jar:lib/com.101tec.zkclient-0.11.jar:lib/com.squareup.okio.okio-1.15.0.jar:lib/io.apicurio.apicurio-registry-utils-kafka-1.3.2.Final.jar:lib/io.fabric8.kubernetes-model-extensions-5.10.1.jar:lib/io.fabric8.kubernetes-model-discovery-5.10.1.jar:lib/io.netty.netty-codec-http2-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-events-5.10.1.jar:lib/org.apache.logging.log4j.log4j-core-2.17.0.jar:lib/org.apache.kafka.kafka-clients-3.0.0.jar:lib/io.fabric8.kubernetes-model-core-5.10.1.jar:lib/jakarta.annotation.jakarta.annotation-api-1.3.5.jar:lib/org.hdrhistogram.HdrHistogram-2.1.11.jar:lib/com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:lib/io.prometheus.simpleclient-0.7.0.jar:lib/io.fabric8.openshift-model-5.10.1.jar:lib/io.fabric8.kubernetes-model-networking-5.10.1.jar:lib/com.squareup.okhttp3.logging-interceptor-3.12.12.jar:lib/org.apache.yetus.audience-annotations-0.5.0.jar:lib/io.fabric8.kubernetes-model-certificates-5.10.1.jar:lib/io.fabric8.kubernetes-model-node-5.10.1.jar:lib/io.strimzi.crd-annotations-0.27.0.jar:lib/com.fasterxml.jackson.core.jackson-annotations-2.11.3.jar:lib/io.fabric8.kubernetes-model-common-5.10.1.jar:lib/io.fabric8.openshift-model-miscellaneous-5.10.1.jar:lib/org.apache.zookeeper.zookeeper-jute-3.6.3.jar:lib/io.netty.netty-handler-4.1.71.Final.jar:lib/io.micrometer.micrometer-core-1.3.1.jar:lib/io.fabric8.kubernetes-model-apiextensions-5.10.1.jar:lib/io.fabric8.kuberne
tes-client-5.10.1.jar:lib/io.netty.netty-codec-4.1.71.Final.jar:lib/io.fabric8.openshift-model-whereabouts-5.10.1.jar:lib/org.apache.zookeeper.zookeeper-3.6.3.jar:lib/io.fabric8.openshift-model-machineconfig-5.10.1.jar:lib/com.fasterxml.jackson.dataformat.jackson-dataformat-yaml-2.10.5.jar:lib/io.fabric8.kubernetes-model-metrics-5.10.1.jar:lib/io.netty.netty-codec-http-4.1.71.Final.jar:lib/io.fabric8.openshift-model-storageversionmigrator-5.10.1.jar:lib/com.github.mifmif.generex-1.0.2.jar:lib/io.strimzi.operator-common-0.27.0.jar:lib/org.xerial.snappy.snappy-java-1.1.8.1.jar:lib/io.fabric8.kubernetes-model-apps-5.10.1.jar:lib/com.fasterxml.jackson.core.jackson-core-2.11.3.jar:lib/org.yaml.snakeyaml-1.26.jar:lib/org.apache.kafka.kafka-streams-3.0.0.jar:lib/io.fabric8.kubernetes-model-admissionregistration-5.10.1.jar:lib/io.netty.netty-resolver-dns-4.1.71.Final.jar:lib/io.fabric8.openshift-model-monitoring-5.10.1.jar:lib/io.fabric8.kubernetes-model-storageclass-5.10.1.jar:lib/io.apicurio.apicurio-registry-common-1.3.2.Final.jar:lib/org.lz4.lz4-java-1.7.1.jar:lib/io.fabric8.openshift-model-operator-5.10.1.jar:lib/dk.brics.automaton.automaton-1.11-8.jar:lib/io.netty.netty-codec-socks-4.1.71.Final.jar:lib/org.eclipse.microprofile.config.microprofile-config-api-1.4.jar:lib/org.rocksdb.rocksdbjni-6.19.3.jar:lib/io.netty.netty-handler-proxy-4.1.71.Final.jar:lib/io.netty.netty-resolver-4.1.71.Final.jar:lib/io.fabric8.openshift-model-clusterautoscaling-5.10.1.jar:lib/io.strimzi.certificate-manager-0.27.0.jar:lib/io.fabric8.kubernetes-model-flowcontrol-5.10.1.jar:lib/io.fabric8.zjsonpatch-0.3.0.jar:lib/io.fabric8.kubernetes-model-coordination-5.10.1.jar:lib/io.netty.netty-transport-classes-epoll-4.1.71.Final.jar:lib/org.jboss.spec.javax.ws.rs.jboss-jaxrs-api_2.1_spec-2.0.1.Final.jar:lib/org.apache.logging.log4j.log4j-api-2.17.0.jar:lib/io.vertx.vertx-micrometer-metrics-4.2.1.jar:lib/io.fabric8.openshift-model-tuned-5.10.1.jar:lib/com.squareup.okhttp3.okhttp-3.12.6.jar:lib/io.s
trimzi.api-0.27.0.jar:lib/org.apache.logging.log4j.log4j-slf4j-impl-2.17.0.jar:lib/io.fabric8.kubernetes-model-batch-5.10.1.jar:lib/io.netty.netty-codec-dns-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-autoscaling-5.10.1.jar:lib/io.fabric8.openshift-client-5.10.1.jar:lib/io.fabric8.openshift-model-console-5.10.1.jar:lib/org.slf4j.slf4j-api-1.7.25.jar:lib/io.netty.netty-transport-native-unix-common-4.1.71.Final.jar:lib/io.fabric8.openshift-model-machine-5.10.1.jar:lib/org.latencyutils.LatencyUtils-2.0.3.jar:lib/io.netty.netty-transport-native-epoll-4.1.71.Final.jar:lib/com.github.luben.zstd-jni-1.5.0-2.jar:lib/com.fasterxml.jackson.datatype.jackson-datatype-jsr310-2.13.0.jar:lib/io.netty.netty-tcnative-classes-2.0.46.Final.jar:lib/io.netty.netty-transport-4.1.71.Final.jar:lib/io.vertx.vertx-core-4.2.1.jar:lib/io.micrometer.micrometer-registry-prometheus-1.3.1.jar:lib/io.fabric8.kubernetes-model-rbac-5.10.1.jar:lib/io.fabric8.kubernetes-model-policy-5.10.1.jar io.strimzi.operator.topic.Main
2022-01-03 10:42:53,22338 INFO  [main] Main:29 - TopicOperator 0.27.0 is starting
2022-01-03 10:42:57,53660 INFO  [main] Session:77 - Using config:
	STRIMZI_TRUSTSTORE_LOCATION: /tmp/topic-operator/replication.truststore.p12
	STRIMZI_RESOURCE_LABELS: strimzi.io/cluster=elsec-feeds
	STRIMZI_SASL_MECHANISM: 
	STRIMZI_SASL_USERNAME: 
	STRIMZI_FULL_RECONCILIATION_INTERVAL_MS: 120000
	STRIMZI_CLIENT_ID: strimzi-topic-operator-c3357743-c5cc-4c9b-b2b5-ee4e9b340dab
	STRIMZI_SECURITY_PROTOCOL: SSL
	STRIMZI_STALE_RESULT_TIMEOUT_MS: 5000
	STRIMZI_TOPIC_METADATA_MAX_ATTEMPTS: 6
	STRIMZI_KEYSTORE_LOCATION: /tmp/topic-operator/replication.keystore.p12
	STRIMZI_REASSIGN_THROTTLE: 9223372036854775807
	STRIMZI_USE_ZOOKEEPER_TOPIC_STORE: false
	STRIMZI_SASL_PASSWORD: ********
	STRIMZI_KAFKA_BOOTSTRAP_SERVERS: elsec-feeds-kafka-bootstrap:9091
	STRIMZI_SASL_ENABLED: false
	STRIMZI_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: HTTPS
	STRIMZI_NAMESPACE: kafka
	STRIMZI_APPLICATION_ID: __strimzi-topic-operator-kstreams
	STRIMZI_ZOOKEEPER_SESSION_TIMEOUT_MS: 18000
	STRIMZI_TOPICS_PATH: /strimzi/topics
	STRIMZI_ZOOKEEPER_CONNECT: localhost:2181
	STRIMZI_TLS_ENABLED: true
	STRIMZI_KEYSTORE_PASSWORD: ********
	STRIMZI_STORE_NAME: topic-store
	STRIMZI_REASSIGN_VERIFY_INTERVAL_MS: 120000
	TC_ZK_CONNECTION_TIMEOUT_MS: 18000
	STRIMZI_TRUSTSTORE_PASSWORD: ********
	STRIMZI_STORE_TOPIC: __strimzi_store_topic

2022-01-03 10:42:57,62122 INFO  [vert.x-eventloop-thread-0] Session:155 - Starting
2022-01-03 10:42:58,96008 INFO  [vert.x-eventloop-thread-0] AppInfoParser:119 - Kafka version: 3.0.0
2022-01-03 10:42:58,96378 INFO  [vert.x-eventloop-thread-0] AppInfoParser:120 - Kafka commitId: 8cb0a5e9d3441962
2022-01-03 10:42:58,96861 INFO  [vert.x-eventloop-thread-0] AppInfoParser:121 - Kafka startTimeMs: 1641206578946
2022-01-03 10:42:59,22666 INFO  [ZkClient-EventThread-20-localhost:2181] ZkEventThread:65 - Starting ZkClient event thread.
2022-01-03 10:42:59,24209 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
2022-01-03 10:42:59,24286 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:host.name=elsec-feeds-entity-operator-787fbb4787-4wmdj
2022-01-03 10:42:59,24309 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.version=11.0.13
2022-01-03 10:42:59,24411 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.vendor=Red Hat, Inc.
2022-01-03 10:42:59,24938 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.home=/usr/lib/jvm/java-11-openjdk-11.0.13.0.8-4.el8_5.x86_64
2022-01-03 10:42:59,25002 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.class.path=<< same classpath as on the exec line above, cut >>
2022-01-03 10:42:59,25070 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2022-01-03 10:42:59,25135 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.io.tmpdir=/tmp
2022-01-03 10:42:59,25239 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:java.compiler=<NA>
2022-01-03 10:42:59,25277 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.name=Linux
2022-01-03 10:42:59,25384 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.arch=amd64
2022-01-03 10:42:59,25447 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.version=5.4.144+
2022-01-03 10:42:59,26792 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:user.name=strimzi
2022-01-03 10:42:59,32343 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:user.home=/home/strimzi
2022-01-03 10:42:59,32370 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:user.dir=/opt/strimzi
2022-01-03 10:42:59,32387 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.memory.free=12MB
2022-01-03 10:42:59,33242 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.memory.max=494MB
2022-01-03 10:42:59,33294 INFO  [vert.x-worker-thread-0] ZooKeeper:98 - Client environment:os.memory.total=31MB
2022-01-03 10:42:59,36345 INFO  [vert.x-worker-thread-0] ZooKeeper:1006 - Initiating client connection, connectString=localhost:2181 sessionTimeout=18000 watcher=org.I0Itec.zkclient.ZkClient@d8fcd8c
2022-01-03 10:42:59,42738 INFO  [vert.x-worker-thread-0] X509Util:77 - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2022-01-03 10:42:59,51692 INFO  [vert.x-worker-thread-0] ClientCnxnSocket:239 - jute.maxbuffer value is 1048575 Bytes
2022-01-03 10:42:59,54565 INFO  [vert.x-worker-thread-0] ClientCnxn:1736 - zookeeper.request.timeout value is 0. feature enabled=false
2022-01-03 10:42:59,56832 INFO  [vert.x-worker-thread-0] ZkClient:936 - Waiting for keeper state SyncConnected
2022-01-03 10:42:59,63048 INFO  [vert.x-worker-thread-0-SendThread(localhost:2181)] ClientCnxn:1181 - Opening socket connection to server localhost/127.0.0.1:2181.
2022-01-03 10:42:59,63259 INFO  [vert.x-worker-thread-0-SendThread(localhost:2181)] ClientCnxn:1183 - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2022-01-03 10:42:59,63689 INFO  [vert.x-worker-thread-0-SendThread(localhost:2181)] ClientCnxn:1013 - Socket connection established, initiating session, client: /127.0.0.1:33230, server: localhost/127.0.0.1:2181
2022-01-03 10:42:59,71923 INFO  [vert.x-worker-thread-0-SendThread(localhost:2181)] ClientCnxn:1448 - Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x300103f8476022c, negotiated timeout = 18000
2022-01-03 10:42:59,72968 INFO  [vert.x-worker-thread-0-EventThread] ZkClient:713 - zookeeper state changed (SyncConnected)
2022-01-03 10:43:00,85028 INFO  [vert.x-eventloop-thread-0] AppInfoParser:119 - Kafka version: 3.0.0
2022-01-03 10:43:00,85309 INFO  [vert.x-eventloop-thread-0] AppInfoParser:120 - Kafka commitId: 8cb0a5e9d3441962
2022-01-03 10:43:00,85943 INFO  [vert.x-eventloop-thread-0] AppInfoParser:121 - Kafka startTimeMs: 1641206580846
2022-01-03 10:43:00,93915 INFO  [vert.x-eventloop-thread-0] KafkaStreamsTopicStoreService:52 - Starting ...
$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
error: timed out waiting for the condition
<< exited on its own here from follow >>

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
Error from server (BadRequest): container "topic-operator" in pod "elsec-feeds-entity-operator-5768cfc764-lbp5l" is waiting to start: ContainerCreating

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
Preparing trust store certificates for internal communication
Adding /etc/tls-sidecar/cluster-ca-certs/ca.crt to truststore /tmp/topic-operator/replication.truststore.p12 with alias ca
Certificate was added to keystore
Preparing trust store certificates for internal communication is completed
Preparing key store certificates for internal communication
Preparing key store certificates for internal communication is completed
+ exec /usr/bin/tini -w -e 143 -- java -Dlog4j2.configurationFile=file:/opt/topic-operator/custom-config/log4j2.properties -Dvertx.cacheDirBase=/tmp/vertx-cache -Djava.security.egd=file:/dev/./urandom --illegal-access=deny -classpath << same classpath as in the first topic-operator run above, cut >> io.strimzi.operator.topic.Main
2022-01-03 10:43:41,82063 INFO  [main] Main:29 - TopicOperator 0.27.0 is starting
2022-01-03 10:43:46,71982 INFO  [main] Session:77 - Using config:
	STRIMZI_TRUSTSTORE_LOCATION: /tmp/topic-operator/replication.truststore.p12
	STRIMZI_RESOURCE_LABELS: strimzi.io/cluster=elsec-feeds
	STRIMZI_SASL_MECHANISM: 
	STRIMZI_SASL_USERNAME: 
	STRIMZI_FULL_RECONCILIATION_INTERVAL_MS: 120000
	STRIMZI_CLIENT_ID: strimzi-topic-operator-5c668466-cac0-4224-9ebf-53fbc2aed671
	STRIMZI_SECURITY_PROTOCOL: SSL
	STRIMZI_STALE_RESULT_TIMEOUT_MS: 5000
	STRIMZI_TOPIC_METADATA_MAX_ATTEMPTS: 6
	STRIMZI_KEYSTORE_LOCATION: /tmp/topic-operator/replication.keystore.p12
	STRIMZI_REASSIGN_THROTTLE: 9223372036854775807
	STRIMZI_USE_ZOOKEEPER_TOPIC_STORE: false
	STRIMZI_SASL_PASSWORD: ********
	STRIMZI_KAFKA_BOOTSTRAP_SERVERS: elsec-feeds-kafka-bootstrap:9091
	STRIMZI_SASL_ENABLED: false
	STRIMZI_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: HTTPS
	STRIMZI_NAMESPACE: kafka
	STRIMZI_APPLICATION_ID: __strimzi-topic-operator-kstreams
	STRIMZI_ZOOKEEPER_SESSION_TIMEOUT_MS: 18000
	STRIMZI_TOPICS_PATH: /strimzi/topics
	STRIMZI_ZOOKEEPER_CONNECT: localhost:2181
	STRIMZI_TLS_ENABLED: true
	STRIMZI_KEYSTORE_PASSWORD: ********
	STRIMZI_STORE_NAME: topic-store
	STRIMZI_REASSIGN_VERIFY_INTERVAL_MS: 120000
	TC_ZK_CONNECTION_TIMEOUT_MS: 18000
	STRIMZI_TRUSTSTORE_PASSWORD: ********
	STRIMZI_STORE_TOPIC: __strimzi_store_topic

2022-01-03 10:43:46,76059 INFO  [vert.x-eventloop-thread-0] Session:155 - Starting
<< exited on its own here from follow >>

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c topic-operator -f
unable to retrieve container logs for containerd://20850d4316fcf1a38c3cbfd030d09c8818c9e4cf52475ae75dbe8f699cd75b29
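
Each attempt above races the container's lifetime, so a small retry loop may make it easier to catch the output. This is a sketch only: `follow_when_ready` is a made-up helper, it uses plain `kubectl` (assuming `kc` is an alias for it), and it simply retries while `kubectl logs` fails, for example while the container is still in `ContainerCreating`.

```shell
# Hypothetical helper: keep retrying "kubectl logs -f" until it succeeds,
# giving up after a bounded number of attempts (default 60, one per second).
follow_when_ready() {  # usage: follow_when_ready <namespace> <deployment> <container> [tries]
  ns=$1; deploy=$2; container=$3; tries=${4:-60}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # A non-zero exit covers "waiting to start" as well as follow timeouts.
    if kubectl logs "deployment/$deploy" -n "$ns" -c "$container" -f; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `follow_when_ready kafka elsec-feeds-entity-operator topic-operator | tee topic-operator.log` would also keep a copy on disk; the file name is arbitrary.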

and

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c user-operator -f
Error from server (BadRequest): container "user-operator" in pod "elsec-feeds-entity-operator-7d59599897-5x7pl" is waiting to start: ContainerCreating

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c user-operator -f
+ exec /usr/bin/tini -w -e 143 -- java -Dlog4j2.configurationFile=file:/opt/user-operator/custom-config/log4j2.properties -Dvertx.cacheDirBase=/tmp/vertx-cache -Djava.security.egd=file:/dev/./urandom --illegal-access=deny -classpath lib/io.strimzi.user-operator-0.27.0.jar:lib/io.netty.netty-common-4.1.71.Final.jar:lib/io.prometheus.simpleclient_common-0.7.0.jar:lib/io.netty.netty-buffer-4.1.71.Final.jar:lib/io.fabric8.openshift-model-operatorhub-5.10.1.jar:lib/io.fabric8.kubernetes-model-scheduling-5.10.1.jar:lib/com.squareup.okio.okio-1.15.0.jar:lib/io.fabric8.kubernetes-model-extensions-5.10.1.jar:lib/io.fabric8.kubernetes-model-discovery-5.10.1.jar:lib/io.netty.netty-codec-http2-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-events-5.10.1.jar:lib/org.apache.logging.log4j.log4j-core-2.17.0.jar:lib/org.apache.kafka.kafka-clients-3.0.0.jar:lib/io.fabric8.kubernetes-model-core-5.10.1.jar:lib/com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:lib/io.prometheus.simpleclient-0.7.0.jar:lib/io.fabric8.openshift-model-5.10.1.jar:lib/io.fabric8.kubernetes-model-networking-5.10.1.jar:lib/com.squareup.okhttp3.logging-interceptor-3.12.12.jar:lib/io.fabric8.kubernetes-model-certificates-5.10.1.jar:lib/io.strimzi.crd-annotations-0.27.0.jar:lib/io.fabric8.kubernetes-model-node-5.10.1.jar:lib/org.hdrhistogram.HdrHistogram-2.1.10.jar:lib/com.fasterxml.jackson.core.jackson-annotations-2.11.3.jar:lib/io.fabric8.kubernetes-model-common-5.10.1.jar:lib/io.fabric8.openshift-model-miscellaneous-5.10.1.jar:lib/io.micrometer.micrometer-core-1.3.1.jar:lib/io.netty.netty-handler-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-apiextensions-5.10.1.jar:lib/io.fabric8.openshift-model-whereabouts-5.10.1.jar:lib/io.fabric8.kubernetes-client-5.10.1.jar:lib/io.netty.netty-codec-4.1.71.Final.jar:lib/io.fabric8.openshift-model-machineconfig-5.10.1.jar:lib/com.fasterxml.jackson.dataformat.jackson-dataformat-yaml-2.10.5.jar:lib/io.fabric8.kubernetes-model-metrics-5.10.1.jar:lib/io.netty.netty-cod
ec-http-4.1.71.Final.jar:lib/io.fabric8.openshift-model-storageversionmigrator-5.10.1.jar:lib/com.github.mifmif.generex-1.0.2.jar:lib/io.strimzi.operator-common-0.27.0.jar:lib/org.xerial.snappy.snappy-java-1.1.8.1.jar:lib/com.fasterxml.jackson.core.jackson-core-2.11.3.jar:lib/io.fabric8.kubernetes-model-apps-5.10.1.jar:lib/org.yaml.snakeyaml-1.26.jar:lib/io.fabric8.kubernetes-model-admissionregistration-5.10.1.jar:lib/io.netty.netty-resolver-dns-4.1.71.Final.jar:lib/io.fabric8.openshift-model-monitoring-5.10.1.jar:lib/io.fabric8.kubernetes-model-storageclass-5.10.1.jar:lib/org.lz4.lz4-java-1.7.1.jar:lib/io.fabric8.openshift-model-operator-5.10.1.jar:lib/dk.brics.automaton.automaton-1.11-8.jar:lib/io.netty.netty-codec-socks-4.1.71.Final.jar:lib/io.netty.netty-handler-proxy-4.1.71.Final.jar:lib/io.netty.netty-resolver-4.1.71.Final.jar:lib/io.fabric8.openshift-model-clusterautoscaling-5.10.1.jar:lib/io.strimzi.certificate-manager-0.27.0.jar:lib/io.fabric8.kubernetes-model-flowcontrol-5.10.1.jar:lib/io.fabric8.zjsonpatch-0.3.0.jar:lib/io.fabric8.kubernetes-model-coordination-5.10.1.jar:lib/org.apache.logging.log4j.log4j-api-2.17.0.jar:lib/io.vertx.vertx-micrometer-metrics-4.2.1.jar:lib/io.fabric8.openshift-model-tuned-5.10.1.jar:lib/com.squareup.okhttp3.okhttp-3.12.6.jar:lib/io.strimzi.api-0.27.0.jar:lib/org.apache.logging.log4j.log4j-slf4j-impl-2.17.0.jar:lib/io.fabric8.kubernetes-model-batch-5.10.1.jar:lib/io.netty.netty-codec-dns-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-autoscaling-5.10.1.jar:lib/io.fabric8.openshift-client-5.10.1.jar:lib/io.fabric8.openshift-model-console-5.10.1.jar:lib/io.fabric8.openshift-model-machine-5.10.1.jar:lib/org.latencyutils.LatencyUtils-2.0.3.jar:lib/org.slf4j.slf4j-api-1.7.25.jar:lib/com.github.luben.zstd-jni-1.5.0-2.jar:lib/com.fasterxml.jackson.datatype.jackson-datatype-jsr310-2.13.0.jar:lib/io.netty.netty-tcnative-classes-2.0.46.Final.jar:lib/io.netty.netty-transport-4.1.71.Final.jar:lib/io.vertx.vertx-core-4.2.1.jar:lib/io.m
icrometer.micrometer-registry-prometheus-1.3.1.jar:lib/io.fabric8.kubernetes-model-rbac-5.10.1.jar:lib/io.fabric8.kubernetes-model-policy-5.10.1.jar io.strimzi.operator.user.Main
2022-01-03 10:48:02 INFO  Main:51 - UserOperator 0.27.0 is starting
2022-01-03 10:48:06 INFO  Util:303 - Using config:
	PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
	container: oci
	STRIMZI_CA_RENEWAL: 30
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_SERVICE_PORT: 2181
	STRIMZI_LABELS: strimzi.io/cluster=elsec-feeds
	JAVA_OPTS:  -Dlog4j2.configurationFile=file:/opt/user-operator/custom-config/log4j2.properties -Dvertx.cacheDirBase=/tmp/vertx-cache -Djava.security.egd=file:/dev/./urandom  --illegal-access=deny
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_SERVICE_PORT_TCP_CLIENTSTLS: 9093
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9092_TCP_ADDR: 10.20.129.150
	STRIMZI_GC_LOG_ENABLED: false
	STRIMZI_EO_KEY_SECRET_NAME: elsec-feeds-entity-operator-certs
	STRIMZI_HOME: /opt/strimzi
	PWD: /opt/strimzi
	KUBERNETES_PORT_443_TCP: tcp://10.20.128.1:443
	JAVA_MAIN: io.strimzi.operator.user.Main
	STRIMZI_SECRET_PREFIX: 
	STRIMZI_VERSION: 0.27.0
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9093_TCP_PROTO: tcp
	STRIMZI_KAFKA_BOOTSTRAP_SERVERS: elsec-feeds-kafka-bootstrap:9091
	STRIMZI_NAMESPACE: kafka
	TINI_SHA256_S390X: 931b70a182af879ca249ae9de87ef68423121b38d235c78997fafc680ceab32d
	STRIMZI_CA_VALIDITY: 365
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_SERVICE_PORT_TCP_REPLICATION: 9091
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9091_TCP_PORT: 9091
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_PORT_2181_TCP: tcp://10.20.130.255:2181
	TINI_SHA256_AMD64: 93dcc18adc78c65a028a84799ecf8ad40c936fdfc5f2a57b1acda5a8117fa82c
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9092_TCP_PORT: 9092
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9091_TCP_ADDR: 10.20.129.150
	KUBERNETES_SERVICE_PORT_HTTPS: 443
	SHLVL: 0
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9092_TCP_PROTO: tcp
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_PORT_2181_TCP_PORT: 2181
	KUBERNETES_PORT: tcp://10.20.128.1:443
	JAVA_HOME: /usr/lib/jvm/jre-11
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9093_TCP: tcp://10.20.129.150:9093
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9093_TCP_ADDR: 10.20.129.150
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_SERVICE_PORT: 9091
	STRIMZI_FULL_RECONCILIATION_INTERVAL_MS: 120000
	KUBERNETES_SERVICE_HOST: 10.20.128.1
	JAVA_CLASSPATH: lib/io.strimzi.user-operator-0.27.0.jar:lib/io.netty.netty-common-4.1.71.Final.jar:lib/io.prometheus.simpleclient_common-0.7.0.jar:lib/io.netty.netty-buffer-4.1.71.Final.jar:lib/io.fabric8.openshift-model-operatorhub-5.10.1.jar:lib/io.fabric8.kubernetes-model-scheduling-5.10.1.jar:lib/com.squareup.okio.okio-1.15.0.jar:lib/io.fabric8.kubernetes-model-extensions-5.10.1.jar:lib/io.fabric8.kubernetes-model-discovery-5.10.1.jar:lib/io.netty.netty-codec-http2-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-events-5.10.1.jar:lib/org.apache.logging.log4j.log4j-core-2.17.0.jar:lib/org.apache.kafka.kafka-clients-3.0.0.jar:lib/io.fabric8.kubernetes-model-core-5.10.1.jar:lib/com.fasterxml.jackson.core.jackson-databind-2.11.3.jar:lib/io.prometheus.simpleclient-0.7.0.jar:lib/io.fabric8.openshift-model-5.10.1.jar:lib/io.fabric8.kubernetes-model-networking-5.10.1.jar:lib/com.squareup.okhttp3.logging-interceptor-3.12.12.jar:lib/io.fabric8.kubernetes-model-certificates-5.10.1.jar:lib/io.strimzi.crd-annotations-0.27.0.jar:lib/io.fabric8.kubernetes-model-node-5.10.1.jar:lib/org.hdrhistogram.HdrHistogram-2.1.10.jar:lib/com.fasterxml.jackson.core.jackson-annotations-2.11.3.jar:lib/io.fabric8.kubernetes-model-common-5.10.1.jar:lib/io.fabric8.openshift-model-miscellaneous-5.10.1.jar:lib/io.micrometer.micrometer-core-1.3.1.jar:lib/io.netty.netty-handler-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-apiextensions-5.10.1.jar:lib/io.fabric8.openshift-model-whereabouts-5.10.1.jar:lib/io.fabric8.kubernetes-client-5.10.1.jar:lib/io.netty.netty-codec-4.1.71.Final.jar:lib/io.fabric8.openshift-model-machineconfig-5.10.1.jar:lib/com.fasterxml.jackson.dataformat.jackson-dataformat-yaml-2.10.5.jar:lib/io.fabric8.kubernetes-model-metrics-5.10.1.jar:lib/io.netty.netty-codec-http-4.1.71.Final.jar:lib/io.fabric8.openshift-model-storageversionmigrator-5.10.1.jar:lib/com.github.mifmif.generex-1.0.2.jar:lib/io.strimzi.operator-common-0.27.0.jar:lib/org.xerial.snappy.snappy-java-1.1.8.1.ja
r:lib/com.fasterxml.jackson.core.jackson-core-2.11.3.jar:lib/io.fabric8.kubernetes-model-apps-5.10.1.jar:lib/org.yaml.snakeyaml-1.26.jar:lib/io.fabric8.kubernetes-model-admissionregistration-5.10.1.jar:lib/io.netty.netty-resolver-dns-4.1.71.Final.jar:lib/io.fabric8.openshift-model-monitoring-5.10.1.jar:lib/io.fabric8.kubernetes-model-storageclass-5.10.1.jar:lib/org.lz4.lz4-java-1.7.1.jar:lib/io.fabric8.openshift-model-operator-5.10.1.jar:lib/dk.brics.automaton.automaton-1.11-8.jar:lib/io.netty.netty-codec-socks-4.1.71.Final.jar:lib/io.netty.netty-handler-proxy-4.1.71.Final.jar:lib/io.netty.netty-resolver-4.1.71.Final.jar:lib/io.fabric8.openshift-model-clusterautoscaling-5.10.1.jar:lib/io.strimzi.certificate-manager-0.27.0.jar:lib/io.fabric8.kubernetes-model-flowcontrol-5.10.1.jar:lib/io.fabric8.zjsonpatch-0.3.0.jar:lib/io.fabric8.kubernetes-model-coordination-5.10.1.jar:lib/org.apache.logging.log4j.log4j-api-2.17.0.jar:lib/io.vertx.vertx-micrometer-metrics-4.2.1.jar:lib/io.fabric8.openshift-model-tuned-5.10.1.jar:lib/com.squareup.okhttp3.okhttp-3.12.6.jar:lib/io.strimzi.api-0.27.0.jar:lib/org.apache.logging.log4j.log4j-slf4j-impl-2.17.0.jar:lib/io.fabric8.kubernetes-model-batch-5.10.1.jar:lib/io.netty.netty-codec-dns-4.1.71.Final.jar:lib/io.fabric8.kubernetes-model-autoscaling-5.10.1.jar:lib/io.fabric8.openshift-client-5.10.1.jar:lib/io.fabric8.openshift-model-console-5.10.1.jar:lib/io.fabric8.openshift-model-machine-5.10.1.jar:lib/org.latencyutils.LatencyUtils-2.0.3.jar:lib/org.slf4j.slf4j-api-1.7.25.jar:lib/com.github.luben.zstd-jni-1.5.0-2.jar:lib/com.fasterxml.jackson.datatype.jackson-datatype-jsr310-2.13.0.jar:lib/io.netty.netty-tcnative-classes-2.0.46.Final.jar:lib/io.netty.netty-transport-4.1.71.Final.jar:lib/io.vertx.vertx-core-4.2.1.jar:lib/io.micrometer.micrometer-registry-prometheus-1.3.1.jar:lib/io.fabric8.kubernetes-model-rbac-5.10.1.jar:lib/io.fabric8.kubernetes-model-policy-5.10.1.jar
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9093_TCP_PORT: 9093
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_SERVICE_PORT_TCP_CLIENTS: 2181
	STRIMZI_CLUSTER_CA_CERT_SECRET_NAME: elsec-feeds-cluster-ca-cert
	TINI_VERSION: v0.19.0
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_SERVICE_HOST: 10.20.129.150
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9091_TCP: tcp://10.20.129.150:9091
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_SERVICE_PORT_TCP_CLIENTS: 9092
	KUBERNETES_PORT_443_TCP_ADDR: 10.20.128.1
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9091_TCP_PROTO: tcp
	KUBERNETES_PORT_443_TCP_PROTO: tcp
	STRIMZI_CA_CERT_NAME: elsec-feeds-clients-ca-cert
	STRIMZI_CA_NAMESPACE: kafka
	KUBERNETES_SERVICE_PORT: 443
	TINI_SHA256_ARM64: 07952557df20bfd2a95f9bef198b445e006171969499a1d361bd9e6f8e5e0e81
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_PORT: tcp://10.20.130.255:2181
	TINI_SHA256_PPC64LE: 3f658420974768e40810001a038c29d003728c5fe86da211cff5059e48cfdfde
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_SERVICE_HOST: 10.20.130.255
	HOSTNAME: elsec-feeds-entity-operator-7d59599897-5x7pl
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT_9092_TCP: tcp://10.20.129.150:9092
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_PORT_2181_TCP_ADDR: 10.20.130.255
	STRIMZI_CA_KEY_NAME: elsec-feeds-clients-ca
	ELSEC_FEEDS_ZOOKEEPER_CLIENT_PORT_2181_TCP_PROTO: tcp
	KUBERNETES_PORT_443_TCP_PORT: 443
	HOME: /home/strimzi
	ELSEC_FEEDS_KAFKA_BOOTSTRAP_PORT: tcp://10.20.129.150:9091
	MALLOC_ARENA_MAX: 2
	STRIMZI_ACLS_ADMIN_API_SUPPORTED: false

2022-01-03 10:48:09 INFO  AdminClientConfig:376 - AdminClientConfig values: 
	bootstrap.servers = [elsec-feeds-kafka-bootstrap:9091]
	client.dns.lookup = use_all_dns_ips
	client.id = 
	connections.max.idle.ms = 300000
	default.api.timeout.ms = 40000
	metadata.max.age.ms = 30000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 10000
	retries = 3
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = null
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = GSSAPI
	security.protocol = SSL
	security.providers = null
	send.buffer.bytes = 131072
	socket.connection.setup.timeout.max.ms = 30000
	socket.connection.setup.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
	ssl.endpoint.identification.algorithm = https
	ssl.engine.factory.class = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.certificate.chain = [hidden]
	ssl.keystore.key = [hidden]
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = PEM
	ssl.protocol = TLSv1.3
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.certificates = [hidden]
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = PEM
<< exited on its own here from follow >>

$ kc logs deployment/elsec-feeds-entity-operator -n kafka -c user-operator -f
unable to retrieve container logs for containerd://7a07622229139bbda3af64f14e0be918c15c32f900480da5347b582a2a4df53e

3 replies

scholzj Jan 3, 2022
Maintainer

The logs are good, but they do not show anything suspicious. It just seems that something terminates / deletes the pod after a very short time. Do the Kubernetes logs / events say what did it, or why?

ThorbenJ Jan 3, 2022
Author

It appears to be the Strimzi operator:

2022-01-03 14:52:48 DEBUG Reflector:125 - Event received ADDED Pod resourceVersion 11114059
2022-01-03 14:52:48 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was ADDED
2022-01-03 14:52:48 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka which was ADDED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:48 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114061
2022-01-03 14:52:48 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was MODIFIED
2022-01-03 14:52:48 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:49 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114073
2022-01-03 14:52:49 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was MODIFIED
2022-01-03 14:52:49 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:49 DEBUG Reflector:125 - Event received DELETED Pod resourceVersion 11114074
2022-01-03 14:52:49 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was DELETED
2022-01-03 14:52:49 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka which was DELETED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:50 DEBUG Reflector:125 - Event received ADDED Pod resourceVersion 11114097
2022-01-03 14:52:50 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was ADDED
2022-01-03 14:52:50 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka which was ADDED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:50 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114099
2022-01-03 14:52:50 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was MODIFIED
2022-01-03 14:52:50 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:51 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114113
2022-01-03 14:52:51 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was MODIFIED
2022-01-03 14:52:51 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:51 DEBUG Reflector:125 - Event received DELETED Pod resourceVersion 11114114
2022-01-03 14:52:51 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was DELETED
2022-01-03 14:52:51 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka which was DELETED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:52 DEBUG Reflector:125 - Event received ADDED Pod resourceVersion 11114130
2022-01-03 14:52:52 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka was ADDED
2022-01-03 14:52:52 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka which was ADDED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:52 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114132
2022-01-03 14:52:52 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka was MODIFIED
2022-01-03 14:52:52 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:53 INFO  AbstractOperator:373 - Reconciliation #28(watch) Kafka(kafka/elsec-feeds): Reconciliation is in progress
2022-01-03 14:52:53 DEBUG Reflector:125 - Event received MODIFIED Pod resourceVersion 11114141
2022-01-03 14:52:53 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka was MODIFIED
2022-01-03 14:52:53 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka which was MODIFIED does not seem to be controlled by any StrimziPodSet and will be ignored
2022-01-03 14:52:53 DEBUG Reflector:125 - Event received DELETED Pod resourceVersion 11114144
2022-01-03 14:52:53 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka was DELETED
2022-01-03 14:52:53 DEBUG StrimziPodSetController:182 - Pod elsec-feeds-entity-operator-64bf6d5984-5g75p in namespace kafka which was DELETED does not seem to be controlled by any StrimziPodSet and will be ignored

I am trying to find out what is annoying it - if I am able..

scholzj Jan 3, 2022
Maintainer

Nothing in this log suggests it is the cluster operator that deletes it. These are just received events which are ignored. But they confirm that someone / something deletes the pod.
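To put a number on "deleted after a very short time", the ADDED and DELETED events in the DEBUG lines above can be paired per pod. The following is only a sketch: it assumes the log line format stays exactly as pasted in this thread, and the function name pod_lifetimes is made up for illustration.

```python
import re
from datetime import datetime

# Matches the StrimziPodSetController:165 lines as they appear in the pasted log.
# The format is taken from this thread only; other Strimzi versions may differ.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DEBUG StrimziPodSetController:165 - "
    r"Pod (\S+) in namespace (\S+) was (ADDED|MODIFIED|DELETED)$"
)


def pod_lifetimes(log_text):
    """Return {pod_name: seconds between its ADDED and DELETED events}."""
    added = {}
    lifetimes = {}
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        stamp, pod, _namespace, event = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "ADDED":
            added[pod] = when
        elif event == "DELETED" and pod in added:
            lifetimes[pod] = (when - added[pod]).total_seconds()
    return lifetimes


# Four lines copied from the operator log above.
SAMPLE = """\
2022-01-03 14:52:48 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was ADDED
2022-01-03 14:52:49 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-f4cb7f554-pc6b7 in namespace kafka was DELETED
2022-01-03 14:52:50 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was ADDED
2022-01-03 14:52:51 DEBUG StrimziPodSetController:165 - Pod elsec-feeds-entity-operator-7b79f87fc7-2htqt in namespace kafka was DELETED
"""

if __name__ == "__main__":
    for pod, seconds in pod_lifetimes(SAMPLE).items():
        print(f"{pod}: {seconds:.0f}s")
```

The log only has one-second resolution, so the result is approximate. It also says nothing about who issued the delete, which is the open question here.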

ThorbenJ · 2022-01-03T15:01:34Z

ThorbenJ
Jan 3, 2022
Author

Could it be this annotation?

- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"topic-operator"},{"name":"user-operator"},{"name":"tls-sidecar"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"topic-operator"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"user-operator"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"tls-sidecar"}]},"modified":true}'
      deployment.kubernetes.io/revision: "382"
    creationTimestamp: "2022-01-03T14:43:52Z"

16 replies

scholzj Jan 4, 2022
Maintainer

Well, I suspect the probes do not matter. Based on the logs, the pod is deleted before it has the chance to get ready. You may also want to run your command against the Deployment rather than directly against the ReplicaSet. If you do it with the ReplicaSet, the Deployment will be yet another party interfering with it.

WilliamDenniss Jan 11, 2022

Yes, that could be some kind of hint I guess. Scaling the Deployment to 0 would delete the pod. So that is consistent with what you are seeing. But I guess the question still remains who scales it down -> I guess it is some GKE component which thinks it is saving resources? But without knowledge of what Google does there, it is hard to say what makes it scale down the entity operator and not the cluster operator for example.

It's not Autopilot that's scaling down the Deployment. The division of responsibilities for Autopilot is that the Pods and Deployments are yours (the operators') to manage, while the Nodes are managed by Autopilot. Autopilot will scale Nodes to zero if you don't have any Pods running, but that's not what's happening here.

It's possible that from time to time individual Pods may be terminated, for example due to node upgrade events (which is why we'd always recommend using a Deployment or other higher level workload construct), but that's not the issue here.

WilliamDenniss Jan 11, 2022

Based on the logs, the pod is deleted before it has the chance to get ready.

I have seen this kind of problem before (with Knative). One difference with Autopilot is that it is not unusual for a Pod to take 80-90s to move from the Pending to the ContainerCreating status. During that time, Autopilot is provisioning the node to serve the Pod.

If an Operator expects Pods to move out of the Pending status in a short amount of time, this could be a problem. In that case, the Operator is a bit overzealous.

WilliamDenniss Jan 11, 2022

Could it be this annotation?

 autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"topic-operator"},{"name":"user-operator"},{"name":"tls-sidecar"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"topic-operator"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"user-operator"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"tls-sidecar"}]},"modified":true}'

Unlikely. This annotation is simply Autopilot recording a mutation that happened. It indicates that the pod resources were modified to fit within Autopilot's constraints. I wouldn't expect this to cause any problems.

Autopilot does two things here:

increases resources if needed, to meet requirements
sets limits equal to requests ("guaranteed" QoS class)

It's possible that a Pod which requires more resources than it requests (i.e. relies on bursting) may not work well on Autopilot. In that case, the resource requests need to be increased. This would actually be a user bug on any Kubernetes cluster, but it may not be obvious if there is always spare capacity on the node.
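The two effects described above can be read straight out of the annotation value. Here is a small sketch that parses the autopilot.gke.io/resource-adjustment JSON quoted in this thread and checks, per container, whether limits equal requests. The helper name summarize_adjustment is hypothetical, and the field layout is assumed from this one example rather than from any documented schema.

```python
import json

# Annotation value as pasted in this thread, split across lines for readability.
ANNOTATION = (
    '{"input":{"containers":[{"name":"topic-operator"},{"name":"user-operator"},'
    '{"name":"tls-sidecar"}]},"output":{"containers":['
    '{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"name":"topic-operator"},'
    '{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"name":"user-operator"},'
    '{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},'
    '"name":"tls-sidecar"}]},"modified":true}'
)


def summarize_adjustment(annotation_value):
    """Return (modified, {container_name: limits_equal_requests})."""
    data = json.loads(annotation_value)
    per_container = {}
    for container in data["output"]["containers"]:
        # Limits equal to requests is what the comment above calls "guaranteed" QoS.
        per_container[container["name"]] = container["limits"] == container["requests"]
    return data["modified"], per_container


if __name__ == "__main__":
    modified, per_container = summarize_adjustment(ANNOTATION)
    print("modified by Autopilot:", modified)
    for name, equal in per_container.items():
        print(f"{name}: limits == requests -> {equal}")
```

For the value in this thread, all three containers end up with identical limits and requests, which matches the explanation that the annotation only records a resource mutation.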

scholzj Jan 11, 2022
Maintainer

If an Operator expects Pods to move out of the Pending status in a short amount of time, this could be a problem. In that case, the Operator is a bit overzealous.

Strimzi will by default wait for 5 minutes before calling it an error. But even if it calls it an error, it will not delete the pod; it will keep waiting in the next reconciliation, since we do not manage the pod directly. We manage only the Deployment. So I do not think this is the operator being overzealous here.

ThorbenJ · 2022-01-04T15:06:48Z

ThorbenJ
Jan 4, 2022
Author

The health checks on the two operators have:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthy
            port: healthcheck
            scheme: HTTP
...
        name: user-operator
...
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: healthcheck
            scheme: HTTP

However in GKE these are being interpreted as:

I doubt you wanted port 0, but rather localhost:80 no? There does not seem to be a config option to change these to localhost:80/ready (or localhost:80/healthy)

1 reply

scholzj Jan 4, 2022
Maintainer

That looks like a UI issue in whatever UI you are using. They refer to the port named healthcheck:

port: healthcheck

which, if you check the full Deployment YAML, refers to ports 8080 and 8081 respectively.
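As a rough illustration of why a named probe port is not "port 0": Kubernetes resolves the name against the container's declared ports. The sketch below mimics that lookup on a plain dictionary. The container shown is a cut-down, hypothetical version of the user-operator spec; pairing it with 8081 (rather than 8080) is an assumption, and resolve_probe_port is an illustrative helper, not a Kubernetes or Strimzi API.

```python
def resolve_probe_port(container, probe_name):
    """Resolve an httpGet probe port (number or name) to a numeric container port."""
    port = container[probe_name]["httpGet"]["port"]
    if isinstance(port, int):
        return port
    # A string port refers to an entry in the container's `ports` list by name.
    for declared in container.get("ports", []):
        if declared.get("name") == port:
            return declared["containerPort"]
    raise LookupError(f"no container port named {port!r}")


# Hypothetical excerpt: only the probe fields are from this thread,
# the ports entry is assumed for the example.
USER_OPERATOR = {
    "name": "user-operator",
    "ports": [{"name": "healthcheck", "containerPort": 8081}],
    "livenessProbe": {
        "failureThreshold": 3,
        "httpGet": {"path": "/healthy", "port": "healthcheck", "scheme": "HTTP"},
    },
    "readinessProbe": {
        "failureThreshold": 3,
        "httpGet": {"path": "/ready", "port": "healthcheck", "scheme": "HTTP"},
    },
}

if __name__ == "__main__":
    for probe in ("livenessProbe", "readinessProbe"):
        print(probe, "->", resolve_probe_port(USER_OPERATOR, probe))
```

A UI that does not perform this lookup may well display something misleading for a named port, which would fit the explanation above.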

ThorbenJ · 2022-01-06T13:28:58Z

ThorbenJ
Jan 6, 2022
Author

So I forwarded this discussion thread to Google Cloud support to get their assistance with the EntityOperator failing to deploy.

This is their response:

I’ve been discussing with the specialist and we arrived at some conclusions and found some interesting logs that could hint us to a possible cause of this issue.

We’ve encountered some logs at container level for the tls-sidecar for the workload experiencing the issue.

With this insertID [1] you have a terminated event. With this one [2] we can see a Java exception, Connection refused [3], and lastly with this insertID [4] we can see some exec errors [5].

Thus our conclusion is that this set up is conflicting with some restriction/limitation of the Autopilot Clusters, as we don’t know how the strimzi, kafka and the zookeeper are behaving we can’t really tell which is exactly the part of this set up conflicting with which limitation.

Our guesses are that the container isolation [6] could be the reason here as it is possible that the containers may be using CAP_NET_RAW permission or that Linux is doing some actions that are not supported by Autopilot [7].

One step in order to be out of doubt is asking the strimzi support team if some of these limitations [8] can be conflicting with their set up.

The other suggestion that I’ll kindly tell you is to test this set up with a Standard GKE cluster in order to see if there are any issues or similar issues that you’re encountering with the Autopilot set up.

Let me know if this information is useful for you and don’t hesitate to ask any more questions regarding this matter.

Have a nice day!

Kind Regards,
Google Cloud Platform Support
---------------------------------------
[1]
insertId: "xboz3kkf8oc32qpj"
Terminated Event log
[2]:
insertId: "yq911s8sr80yn1px"
[3]:
textPayload: "java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[org.apache.zookeeper.zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper.zookeeper-3.6.3.jar:3.6.3]"
timestamp: "2022-01-05T16:16:50.858786985Z"

textPayload: "+ exec /usr/bin/tini -w -e 143 -- /usr/bin/stunnel /tmp/stunnel.conf"
[6]:
https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#container_isolation
[7]:
https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#linux_workload_limitations
[8]:
https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#introduction

Autopilot Container Isolation

[https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#container_isolation]

Autopilot enforces a hardened configuration for your Pods that provides enhanced security isolation and helps limit the impact of container escape vulnerabilities on your cluster:

The container runtime default seccomp profile is applied, by default, to all Pods in your cluster.

The CAP_NET_RAW container permission is dropped for all containers. The CAP_NET_RAW permission is not typically used and was the subject of multiple container escape vulnerabilities. The lack of CAP_NET_RAW might cause the use of ping to fail inside your container.

Workload Identity is enforced and prevents Pod access to the underlying Compute Engine service account and other sensitive node metadata.

Services with spec.ExternalIPs set are blocked to protect against CVE-2020-8554. These services are rarely used.

The following StorageTypes are allowed. Other StorageTypes are blocked because they require privileges over the node: "configMap", "csi", "downwardAPI", "emptyDir", "gcePersistentDisk", "hostPath", "nfs", "persistentVolumeClaim", "projected", "secret"

I think we already see this with the template changes I had to make for the zk and kf sets to stop the delete loop.

Is the tls-sidecar container running afoul of any of this, e.g. by requiring CAP_NET_RAW?

CAP_NET_RAW: Any kind of packet can be forged, which includes faking senders, sending malformed packets, etc., this also allows to bind to any address (associated to the ability to fake a sender this allows to impersonate a device, legitimately used for "transparent proxying" as per the manpage but from an attacker point-of-view this term is a synonym for Man-in-The-Middle).
[https://unix.stackexchange.com/questions/447886/what-does-cap-net-raw-do]

I tried to search for 'stunnel NET_PCAP_RAW' and could not find anything conclusive, other than some patches to allow certain modes of operation without requiring root.

Side question: Why is the tls-sidecar needed when (AFAIK) both Zookeeper and Kafka now support TLS natively? (I know that wasn't true some years ago).

Linux Capability Limitations

[https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#linux_workload_limitations]

Autopilot supports only the following Linux capabilities for workloads:
"SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER",
"FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE", "SYS_CHROOT", "SETFCAP"
In GKE version 1.21 and later, the "SYS_PTRACE" capability is also supported for workloads.

Does tls-sidecar require other capabilities?
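One way to answer that question for a given pod spec is to compare the capabilities a container adds against the allowlist quoted above. This is only a sketch: the allowlist is copied from the documentation excerpt in this post and may have changed since, the tls-sidecar dictionaries are invented for the example, and disallowed_capabilities is a hypothetical helper.

```python
# Allowlist as quoted from the Autopilot documentation in this post.
AUTOPILOT_ALLOWED = frozenset({
    "SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER",
    "FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE", "SYS_CHROOT",
    "SETFCAP",
})
# Additionally supported in GKE 1.21 and later, per the same excerpt.
GKE_1_21_EXTRA = frozenset({"SYS_PTRACE"})


def disallowed_capabilities(container, gke_1_21_or_later=True):
    """Return the sorted capabilities a container adds that are not on the allowlist."""
    allowed = set(AUTOPILOT_ALLOWED)
    if gke_1_21_or_later:
        allowed |= GKE_1_21_EXTRA
    added = (
        container.get("securityContext", {})
        .get("capabilities", {})
        .get("add", [])
    )
    rejected = set()
    for cap in added:
        # Accept both "NET_RAW" and "CAP_NET_RAW" spellings.
        name = cap[4:] if cap.startswith("CAP_") else cap
        if name not in allowed:
            rejected.add(name)
    return sorted(rejected)


if __name__ == "__main__":
    # Hypothetical specs, NOT the real Strimzi tls-sidecar definition.
    plain_sidecar = {"name": "tls-sidecar"}
    raw_sidecar = {
        "name": "tls-sidecar",
        "securityContext": {"capabilities": {"add": ["NET_RAW", "CHOWN"]}},
    }
    print(disallowed_capabilities(plain_sidecar))
    print(disallowed_capabilities(raw_sidecar))
```

A container that adds nothing, as in the first example, trivially passes. The check only covers explicit `add` entries, not what the process inside the container actually tries to do at runtime.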

Ignoring GKE Autopilot, it seems there are many security issues and CVEs (e.g. CVE-2020-12401) associated with CAP_NET_RAW in the container workload space. If Strimzi (tls-sidecar) does in fact require it, that might be worth reviewing purely in the interest of security.

I am aware that in a complex system such as the components that make up Strimzi and the workloads it manages, these two questions may not be immediately answerable. However, I wanted to make this post for posterity.

I do plan to try and find time (sometime) to try Strimzi on GKE Standard (maybe even toggle CAP_NET_RAW); not as a long term solution but rather just as a validation test.
The whole point of trying strimzi on gke autopilot was to be able to spend time on other things, and to avoid managing Kafka, Zookeeper, and the infra that they were to run on.

5 replies

scholzj Jan 6, 2022
Maintainer

Strimzi works fine when all capabilities are dropped. So AFAIK it does not care about the CAP_NET_RAW capability and doesn't need it. So I do not think that is the issue. I'm not sure what the insertId things they are talking about are. But I think that the logs are ok => they just show start and shutdown, which, given these things are independent (and startup / shutdown can happen asynchronously), can cause temporary errors such as [3].

I think none of this explains the Pod DELETE events / scaling down of the Deployment. If the entity operator did not work for any of the reasons mentioned there, the result would be container restarts, not pod deletion.

ThorbenJ Jan 6, 2022
Author

I wasn't convinced either, but it's great to have your thoughts on this too.
insertId is the log event Id when looking in Google Operations (their Logs and Monitoring tool).

ThorbenJ Jan 7, 2022
Author

So I deployed to a new GKE Standard cluster using all the same as above. All worked first time, no issue.
Such a shame, I like the concept of a PaaS / provider-managed k8s cluster that only charges you for the pods you run and not your nodes. I will try to get Google to look into why it didn't work. I have a feeling it's not Autopilot itself, but rather other GKE features that Autopilot turns on.
Thank you for all the time and support - If I have an update I'll post it here.

WilliamDenniss Jan 11, 2022

Autopilot isn't the one scaling down the Deployment. Autopilot's job here is to add and remove nodes in response to Pods that are added/removed by the user. It can mutate deployments to modify resource requests, or enforce requirements (like no privileged pods), but it doesn't change the replica count.

Since it works on GKE Standard, my guess would be that there is some failure (possibly due to an incompatibility or odd interaction with Autopilot) that the operator is reacting to by scaling the Deployment. I'd focus the investigation on root-causing why the scale-down is initiated in the first place.

scholzj Jan 11, 2022
Maintainer

Well, I do not have access to the cluster. But scaling replicas might mean scaling deployment. It might also mean modifying the Deployment and scaling down one replicaset to scale up another replicaset. I guess @ThorbenJ would need to check that.
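The two cases can be told apart from the ReplicaSet listing. A Deployment creates a new ReplicaSet whenever its pod template changes, and the name suffix is the pod template hash, so many ReplicaSets at DESIRED 0 with different suffixes point to repeated template modification rather than a plain scale to zero. Below is a minimal sketch that parses `kubectl get replicasets` style output like the listing at the top of this discussion. The helper names are made up, and the parsing assumes the default five-column table.

```python
def parse_replicasets(table_text):
    """Parse NAME/DESIRED/CURRENT/READY/AGE rows, skipping headers and other lines."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, desired, current, ready, age = fields
        # Skip anything that is not a data row, e.g. "<< 20 lines cut >>".
        if not (desired.isdigit() and current.isdigit() and ready.isdigit()):
            continue
        rows.append({
            "name": name,
            "desired": int(desired),
            "current": int(current),
            "ready": int(ready),
            "age": age,
        })
    return rows


def template_hashes(rows, deployment_name):
    """Return the distinct pod-template-hash suffixes seen for one Deployment."""
    prefix = deployment_name + "-"
    return {
        row["name"][len(prefix):]
        for row in rows
        if row["name"].startswith(prefix)
    }


# First rows of the listing from the opening post, including a cut marker.
SAMPLE = """\
NAME                                     DESIRED   CURRENT   READY   AGE
abc-entity-operator-546677b6cd   0         0         0       8m9s
abc-entity-operator-57c7f7dc46   0         0         0       4m19s
abc-entity-operator-586c79b6d5   0         0         0       9m9s
<< 20 lines cut >>
"""

if __name__ == "__main__":
    rows = parse_replicasets(SAMPLE)
    hashes = template_hashes(rows, "abc-entity-operator")
    print(f"{len(hashes)} distinct pod templates, "
          f"{sum(r['desired'] for r in rows)} desired replicas in total")
```

An ever-growing set of hashes, together with the deployment.kubernetes.io/revision value of "382" seen in the annotation dump earlier, would be consistent with something rewriting the Deployment's pod template in a loop. The sketch cannot say which party is doing that.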

ThorbenJ · 2022-01-13T15:19:15Z

ThorbenJ
Jan 13, 2022
Author

FYI: https://issuetracker.google.com/issues/214356345 - Created by google in response to this case.

0 replies

moritz89 · 2023-09-25T07:41:55Z

moritz89
Sep 25, 2023

This seems to have been solved. Documented my experience in #6922

2 replies

ThorbenJ Oct 27, 2023
Author

This is specifically about GKE Autopilot. If I am not mistaken, your last comment in #6922 says that you (like me) reverted to using GKE Standard. So is this really solved in GKE Autopilot?

moritz89 Oct 27, 2023

Nope, it hasn't been solved. Or better said, the crash currently behaves differently than it did when I initially tested it ~1 year ago. I think back then the Kafka brokers themselves were crashing, and now it's the operator.

Replies: 12 comments · 30 replies