KafkaConnect Pending State/CrashLoopBackOffs #8608
-
Hi, I'm having issues running KafkaConnect with Strimzi. Please refer to the timeline below.
ISSUE 1: Frequent restarts at the pod level of KafkaConnect.
ISSUE 2: Because of the increased replicas, started seeing CrashLoopBackOffs / OOMKilled, etc.
ISSUE 3:
FIX: Restarted the Strimzi Operator, and then the Kafka cluster.
ISSUE 4: Now the Kafka clusters have been in Pending state for the last 12 hours.
Kafka Connect Config with Strimzi -
Kubernetes Events -
Error Logs -
-
Please format the YAML as code to make it readable. You will also need to share the full logs (at least from the Strimzi Cluster Operator and the Connect pods) - without them, nobody can tell what is going on in your cluster. If you have pods in the Pending state, then you should probably also share the Kubernetes events, which will explain why they are pending - but in general, that is most often caused by infrastructure/environment issues. You should probably also pick one state and focus on it, not 4 different ones.
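For reference (this is not the poster's actual manifest, which was not captured in this thread), a minimal KafkaConnect resource formatted as a fenced code block would look roughly like the sketch below; the name, version, and bootstrap address are placeholders:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster                             # placeholder name
spec:
  version: 3.4.0                                       # placeholder; match your Kafka version
  replicas: 3
  bootstrapServers: my-cluster-kafka-bootstrap:9092    # placeholder bootstrap address
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
```

Fencing it this way preserves the indentation, which is what makes the replica and resource settings readable for anyone trying to help.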
-
Reduced the replicas from 3 to 1, deleted the cluster deployments, and re-installed them via apply -f.
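For clarity, that change amounts to roughly this in the KafkaConnect spec (a sketch, not the actual manifest), followed by deleting and re-applying the resource:

```yaml
spec:
  replicas: 1   # reduced from 3
```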
Events -
Logs -
-
Just got my pods running, woohoo. Maybe I was allocating too much before, so reducing it to a reasonable range did the trick? Or perhaps adding a few more nodes helped?
-
Just sharing with this thread what's happening in real time, so I can keep folks here updated on the issue. The pods have started restarting again, and I want to understand why it happens and how to control it.
Events -
Operator Logs -
-
Your Pod is pending because you don't have resources for it:
0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
You need to make sure your cluster has enough resources to accommodate the Pod, which does not seem to be the case.
Setting a different memory request and limit is also quite a bad practice for something like Java. You should ideally use the same value for both.
You did not share any logs from the Kafka Connect pod when it is running, so it is not clear what is going on there.
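A sketch of what equal memory request and limit could look like in the KafkaConnect resources section (the 2Gi / 1 CPU figures are illustrative, not a recommendation; they have to fit within what your nodes can actually allocate):

```yaml
spec:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi     # request == limit gives the JVM a stable memory envelope
    limits:
      cpu: "1"
      memory: 2Gi
```

With requests equal to limits the pod gets the Guaranteed QoS class, so the scheduler only places it on a node where the full amount is available and the JVM is not squeezed between a low request and a high limit.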
-
Hello @scholzj, thanks for looking into this. Below are the logs from a pod that kept restarting -
-
I do not see any shutdown in this log. Is it really complete?
-
@scholzj
-
Current state of pod
Current Events
Operator logs
Lots of reconciliation errors in the operator logs, or websocket issues.