Kafka broker fails with OutOfMemoryError but container keeps running #6581
Unanswered
jakub-klapka
asked this question in
Q&A
I have never seen this kind of error. I'm not sure Strimzi can do much about how Kafka handles OOM exceptions; that is up to Kafka. One thing I noticed in your StatefulSet: you do not have any resources configured. I think it would make sense to configure them so that the pods have a set amount of memory available. I'm not sure how much that helps with this issue, but it should at least make it clearer what is happening.
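As a rough illustration of the suggestion above, here is a minimal sketch of how memory requests, limits and the JVM heap could be set in the Strimzi `Kafka` custom resource. The cluster name `my-cluster` and all sizes are placeholders, not values from this thread, and the `jvmOptions` part is an addition that the reply itself does not mention. It is included only because the question refers to `KAFKA_HEAP_OPTS`. The fragment omits listeners, storage and the other required sections.

```yaml
# Fragment of a Strimzi Kafka custom resource (not a complete manifest).
# All names and sizes below are illustrative assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # hypothetical cluster name
spec:
  kafka:
    resources:
      requests:
        memory: 2Gi         # memory the scheduler reserves for each broker pod
      limits:
        memory: 2Gi         # hard limit enforced on the broker container
    jvmOptions:
      "-Xms": "1g"          # initial JVM heap size
      "-Xmx": "1g"          # maximum JVM heap size, kept below the container limit
```

The heap limit and the container limit are separate: the JVM can run out of heap and throw `OutOfMemoryError` while the container is still under its memory limit, so Kubernetes does not kill the container in that case.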
Describe the bug
Hello, one of our Kafka brokers failed with an OOM error (which could of course be due to insufficient memory), but the container did not fail. Even though Kafka was not working, its process was still present and the liveness probe was still returning success. As a result, k8s had no idea that Kafka was down and did not restart the instance.
As far as I recall, k8s reported memory usage at 80% of the memory limit.
Here is the tail of the broker log:
Since the container was still running, I was able to shell into it and try /opt/kafka/kafka_liveness.sh, which returned 0.
Here is the output of ps:
And netstat:
To Reproduce
Unfortunately, I don't yet know how to reproduce this consistently. It happened once during our production use, and after a manual restart of that container it didn't happen again.
My hope is that you can see some "operation path" by which this can happen. I don't have much knowledge of the Java runtime and its memory management practices.
Expected behavior
We can probably solve the OOM issue by increasing KAFKA_HEAP_OPTS: -Xms128M, but I think that in the described case the container should fail somehow, either by having its processes exit or by returning non-zero from the liveness probe.
Environment (please complete the following information):
YAML files and logs
YAML of the broker StatefulSet, so you can see the startup parameters and memory limits:
Thank you, Jakub