Upgrading from 0.28 to 0.32 #7895
-
I guess you are missing the RBAC for the Lease resource used for leader election. But as I do not use Helm, I have no idea why.
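For reference, a minimal sketch of the kind of rule that would let the operator's service account work with its leader-election Lease. The Role name `leader-election-sketch` is hypothetical and the verb list is an assumption; the namespace comes from the error message in the original post. It is meant for comparison against the RBAC files shipped with the Strimzi release being installed, not for applying as-is.

```yaml
# Sketch only: the Role name and verb list are assumptions,
# not a copy of Strimzi's shipped manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: leader-election-sketch   # hypothetical name
  namespace: kafka               # namespace from the error message
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
```

A rule like this only takes effect once a RoleBinding in the same namespace ties it to the operator's service account.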
-
Interestingly, I can see that this has been done as well; I can see this in GKE. roleRef:
-
I got it, the namespace is wrong.
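A sketch of what a corrected binding might look like, assuming the operator's service account lives in the `kafka` namespace, as the error message in the original post suggests. The binding name is hypothetical; the role and service account names are taken from the snippet in the original post. Presumably the mismatch was the subject namespace (`myproject`), which has to match the namespace of the service account that the operator actually runs as.

```yaml
# Sketch only: the binding name is hypothetical; adjust namespaces to your install.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: strimzi-cluster-operator-sketch   # hypothetical name
  namespace: kafka                        # namespace where access is granted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: strimzi-cluster-operator-namespaced
subjects:
  - kind: ServiceAccount
    name: strimzi-cluster-operator
    namespace: kafka                      # "myproject" in the original post
```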
-
@scholzj - in one of the Kafka clusters I can see this error and the upgrade fails, with ZooKeeper failing to come back up: 2023-01-29 13:28:14.771 GMT. Scenario: run the Kafka manifest update. The Strimzi log says the above thing repeatedly. ZK-0 vanishes, and ZK-1 and ZK-2 go into an NPE error, unable to contact ZK-0. Any idea what is happening? I am updating from 0.28 to 0.32.
-
@scholzj I will upload all the logs inside the kafka namespace as soon as I get compliance approval to upload them. In the meantime, I have a question about the clusters that I could upgrade successfully. On 0.28, I can see Kafka and ZK each running as a StatefulSet that manages three pods (as per my replication needs). But on 0.32, I am not able to see any StatefulSet, although I can see three Kafka pods and ZK running. Is this behaviour correct? From what I saw in one of your work item discussions, I thought you were not removing StatefulSets until 0.35, hence I thought to clarify.
$ kubectl get statefulset -n dbca-kafka
$ kubectl get pods -n dbca-kafka
Version of PODS
-
@scholzj I finally got compliance approval to upload the log. It has been uploaded as a CSV; please filter on the column "resource.labels.container_name" to see the logs from the different components. Sorry, I could not find a better way to extract the logs, as the pod was recreated after the failed upgrade and I am taking this log from the GKE log rather than the K8s log.
-
@scholzj I made one more attempt today, where I upgraded the operator to 0.32, then uninstalled Kafka 3.3.1 which is not supported on 0.32, then tried an upgrade. The Kafka services are created in the 240.xxx range on 9092, but there are no pods associated with those services. Attaching the log. Again it is a CSV, but all I have is the operator log with a connection error during reconcile.
-
I waited for almost 40 minutes. No pod started; there is only the service in the namespace.
-
These are the only two errors I can see on K8s: gcp-controller-manager failed to be retrieved, and certificatesigningrequests failed to be read.
-
Attaching the debug log as requested. My setup: GKE private cluster, 1.23.14-gke.1800.
-
Not at my desk right now, so I cannot share the link. But if you look for the
UseStrimziPodSet feature gate in the docs, you will find it.
That said, StatefulSets will not be supported forever, so it is just a
temporary workaround.
On Thu, Feb 9, 2023, 11:59 sree wrote:
Understood. I can see two webhooks, and the pod creation is of course timing
out. In the past you said we can enable StatefulSet instead of StrimziPodSet if
we change a config. Would you be kind enough to point me to that page,
please?
I am trying to see whether those two webhooks are causing the issue. One webhook
is associated with NewRelic and one with spark-operator. My money is on
spark-operator causing some funny stuff so that pod creation fails.
I am doing multiple installs, and thought that if I knew how to change to
StatefulSet, I could try to see whether that goes through or not.
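A minimal sketch of the workaround discussed here, assuming (as I understand the Strimzi documentation) that feature gates are set through the `STRIMZI_FEATURE_GATES` environment variable on the Cluster Operator Deployment, and that a leading `-` disables a gate. Only the fragment of the container spec is shown; check the feature gates page for the exact behaviour in your Strimzi version.

```yaml
# Fragment of the Cluster Operator Deployment's container spec (sketch only).
# Assumption: a leading "-" disables the named feature gate.
env:
  - name: STRIMZI_FEATURE_GATES
    value: "-UseStrimziPodSet"   # fall back to StatefulSets (temporary workaround)
```

As noted in the reply, this is only a stopgap, since StatefulSet support is not expected to stay.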
-
@scholzj We use GKE, and we use a NewRelic workload for logging and monitoring (so as to coexist between on-prem and cloud). The issue is that when Strimzi or any other workload requests to create a pod, a mutating webhook gets triggered which adds some environment-specific variables to the newly created pod. In my case, while installing Strimzi 0.32 with StrimziPodSet, this is where the connection timeout is happening. This mutating webhook only fails when we install with StrimziPodSet, and not when I install with the normal K8s StatefulSet by overriding the operator/crd/manifest using UseStrimziPodSet.
We cannot promote 0.32 to UAT or PRD unless we know what is causing the webhook to time out while using the default UseStrimziPodSet. To debug further, would it be possible for me to see the code behind UseStrimziPodSet and how it differs from a normal K8s StatefulSet? I may struggle to understand it, but I will get Google support to help if needed. I am not sure whether there is any restriction in StrimziPodSet, such as webhooks not being allowed or something along those lines, which is what Google is also inferring. Any help would be much appreciated.
-
To support only one version of Kafka, 3.3.1, in the kafka-versions.yaml file, do we need to remove the other versions, like 3.2.0, 3.2.1, 3.2.3? Actually, when I remove them, I get an error.
-
Hi, I have a working cluster which runs on 0.28, and I am trying to upgrade to 0.32 (and Kafka 3.3.1).
I am using a Helm-based install. The operator upgrade has gone through well, and I can see the operator running 0.32.0,
but I can see this error in the operator log:
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://240.2.134.193/apis/coordination.k8s.io/v1/namespaces/kafka/leases/strimzi-cluster-operator. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. leases.coordination.k8s.io "strimzi-cluster-operator" is forbidden: User "system:serviceaccount:kafka:strimzi-cluster-operator" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kafka".
and
textPayload: "2023-01-11 16:12:41 ERROR LeaderElector:156 - Exception occurred while acquiring lock 'LeaseLock: kafka - strimzi-cluster-operator (strimzi-cluster-operator-6b9bddd689-4lssq)'"
timestamp: "2023-01-11T16:12:41.068330167Z"
I checked that the SA (strimzi-cluster-operator) is present, with the role binding done as below:
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: strimzi-cluster-operator-namespaced
subjects:
  - name: strimzi-cluster-operator
    namespace: myproject
The ClusterRole strimzi-cluster-operator-namespaced is also present; I checked it, and it seems to match what is in your repo (which was expected anyway).
What am I missing? As far as the documentation goes, 0.28 -> 0.32 should be a straightforward upgrade, shouldn't it?