Reconciliation error after pods crashed [ Could not roll pod - due to Error getting broker config ] #7485
-
The case2 brokers are crashing due to a disk issue, "No space left on device". When I extended the PVC size, the operator fails to resize the filesystem of the brokers.
logs of a crashed pod
Manually extended the PVC size, but the operator fails to restart the pods to resize the file system:
Mounted By:  strimzi-kafka-cluster-oci-preprod-kafka-1
Conditions:
  Type                     Status  LastProbeTime                    LastTransitionTime               Reason  Message
  ----                     ------  -----------------                ------------------               ------  -------
  FileSystemResizePending  True    Mon, 01 Jan 0001 00:00:00 +0000  Sat, 15 Oct 2022 00:51:38 +0200          Waiting for user to (re-)start a pod to finish file system resize of volume on node.
Events:
  Type    Reason                    Age                      From                                              Message
  ----    ------                    ----                     ----                                              -------
  Normal  Resizing                  6m36s (x1754 over 2d4h)  external-resizer blockvolume.csi.oraclecloud.com  External resizer is resizing volume csi-c652ce27-1381-4e5a-acdf-06d14ae65024
  Normal  FileSystemResizeRequired  6m35s (x1752 over 2d4h)  external-resizer blockvolume.csi.oraclecloud.com  Require file system resize of volume on node
operator logs
To Reproduce
Expected behavior
Environment (please complete the following information):
Replies: 4 comments 1 reply
-
worked by modifying the PVC resources manually, but not by modifying the volume size in the
-
We didn't face the same issue on AWS, which confirms that it's an Oracle Cloud issue.
-
We've just run into something similar on GCP using Strimzi 0.29.0 where our Kafka cluster had run out of space. I believe the ability to resize PVCs via the Kafka CR was added in 0.12, so I would've expected the CO to reconcile the PVC storage requests fairly quickly, regardless of the state of the Kafka pods. I'd like to understand whether this is expected or not. I can confirm that we have volume expansion enabled, as I was able to resize the PVCs myself by editing the requests.
CO Logs
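For anyone wanting to try the manual route mentioned above (editing the storage request on the PVC directly), here is a minimal sketch that builds the JSON merge patch and the matching `kubectl patch` command line. The PVC name, namespace and size are hypothetical placeholders, not values from this thread, and the sketch only prints the command rather than running it. It also assumes the StorageClass allows volume expansion.

```python
import json
import shlex


def storage_patch(new_size):
    """Return a JSON merge patch that raises the PVC storage request."""
    patch = {"spec": {"resources": {"requests": {"storage": new_size}}}}
    return json.dumps(patch)


def kubectl_patch_args(pvc_name, namespace, new_size):
    """Build the argument list for a `kubectl patch pvc` call."""
    return [
        "kubectl", "patch", "pvc", pvc_name,
        "-n", namespace,
        "--type", "merge",
        "-p", storage_patch(new_size),
    ]


if __name__ == "__main__":
    # Hypothetical names and size, for illustration only.
    args = kubectl_patch_args("data-my-cluster-kafka-0", "kafka", "200Gi")
    print(shlex.join(args))
```

This only changes the requested size. Whether the volume and the filesystem actually grow afterwards still depends on the CSI driver and the platform.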
-
Strimzi does not resize the disks. Strimzi only changes the requested size in the PVC and waits. And the rest is basically up to your infrastructure. On most platforms, FileSystemResizeRequired means that a restart of the pod is required to resize the filesystem (the filesystem resize is not done by Strimzi but by the platform). So Strimzi rolls the pods. But what is required actually depends on your platform, and a simple restart is not always enough. TBH, I have no idea how Oracle OKE works and what it needs. So you will need to find out what it actually needs to resize the filesystem and do it.
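To see which stage a resize is stuck in, the PVC status can be inspected directly. The sketch below is only an illustration: it takes the JSON you would get from `kubectl get pvc <name> -o json` and classifies the resize state from the standard `status.conditions`, `spec.resources.requests.storage` and `status.capacity.storage` fields. The sizes in the sample are made up, the returned labels are my own naming, and the size comparison is a plain string comparison.

```python
import json


def resize_state(pvc):
    """Classify a PVC resize from its spec and status (rough sketch)."""
    status = pvc.get("status", {})
    conditions = {c["type"]: c.get("status") for c in status.get("conditions", [])}
    requested = pvc["spec"]["resources"]["requests"]["storage"]
    actual = status.get("capacity", {}).get("storage")

    if conditions.get("FileSystemResizePending") == "True":
        # Volume is already bigger; the filesystem grows on the next pod (re)start.
        return "pod-restart-needed"
    if conditions.get("Resizing") == "True":
        # The resizer is still expanding the underlying volume.
        return "volume-resize-in-progress"
    if requested != actual:
        # Naive string comparison; equivalent quantities written differently will not match.
        return "waiting-for-resizer"
    return "done"


# Hypothetical PVC in the same state as the one in the question.
sample = json.loads("""
{
  "spec": {"resources": {"requests": {"storage": "100Gi"}}},
  "status": {
    "capacity": {"storage": "50Gi"},
    "conditions": [{"type": "FileSystemResizePending", "status": "True"}]
  }
}
""")

if __name__ == "__main__":
    print(resize_state(sample))  # pod-restart-needed
```

If this reports that a pod restart is needed and the pod has already been restarted without effect, the remaining step is platform-specific, as described above.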