Description
I'm running into a strange but reliably reproducible issue on my local dev system. When running demo/setup.sh, kubectl reports podmonitor.monitoring.coreos.com/pg-eu created, but the PodMonitor does not actually exist in the cluster after the script completes. There appears to be a timing-related bug or race condition: the first kubectl apply for the PodMonitors during the setup script reports success (HTTP 201 Created from the API server), yet the resource does not persist. It's a silent failure that only occurs during script execution - applying the same manifest manually after the script completes works perfectly.
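
For anyone trying to reproduce, this is roughly how I confirmed the symptom. The namespace and manifest path below are guesses on my part and may not match the repo layout:

```sh
# Run the setup script; the apply inside it prints:
#   podmonitor.monitoring.coreos.com/pg-eu created
bash demo/setup.sh

# ...but afterwards the resource is gone (namespace is an assumption):
kubectl get podmonitor pg-eu -n monitoring
# Error from server (NotFound): podmonitors.monitoring.coreos.com "pg-eu" not found

# Applying the same manifest manually at this point works and persists
# (manifest path is a placeholder):
kubectl apply -f demo/podmonitor-pg-eu.yaml
kubectl get podmonitor pg-eu -n monitoring
```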
The exact root cause is unclear. After fairly extensive debugging, my best theory is that the PodMonitors disappear during the kubectl wait command that waits for the PostgreSQL cluster to become ready. I don't know why this happens, but it could be related to:
- A race condition or timing issue with kubectl operations or other cluster operations
- Possible caching or context switching issues in kubectl or the API server
- Some interaction between multiple kubectl commands in rapid succession
- Some form of resource cleanup happening immediately after creation
I was not able to fully root-cause what is happening inside Kubernetes after quite a bit of debugging (I didn't go as far as tracing Kubernetes internals), but I found a straightforward workaround that seems to address it: moving the PodMonitor creation to the end of the script and giving the cluster a few seconds to stabilize before creating the PodMonitors (see the sketch below). I'll put together a PR for this - hope that's ok.
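
For reference, the change I have in mind looks roughly like this. Manifest paths, resource names, the namespace, and the sleep duration are placeholders based on my local setup, not the script's actual contents:

```sh
# Rough sketch of the reordered tail of demo/setup.sh (names/paths are placeholders).

# 1. Apply the PostgreSQL cluster and run the existing readiness wait first.
kubectl apply -f demo/pg-cluster.yaml
kubectl wait --for=condition=Ready cluster/pg-eu --timeout=300s

# 2. Give the cluster a few seconds to settle before creating monitoring
#    resources; this is the part that makes the PodMonitors stick for me.
sleep 10

# 3. Create the PodMonitors last and verify they actually persisted.
kubectl apply -f demo/podmonitors/
kubectl get podmonitors -A | grep pg-eu
```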