[BUG] Secret object gets mistakenly deleted if operator reads stale cluster.Spec.TLS.Enabled #35
Description
Describe the bug
After restarting from a crash, the operator can mistakenly delete the secret objects if it reads stale state of cluster.Spec.TLS.Enabled.
Consider the following situation: there are two apiservers, apiserver1 and apiserver2, and the operator is initially communicating with apiserver1. The field cluster.Spec.TLS.Enabled is initially set to false and is then changed to true by the user. The operator reconciles and creates the Secret object accordingly. After the Secret object is created, the operator crashes, restarts, and starts communicating with apiserver2. At that moment apiserver2 is stale and still reports cluster.Spec.TLS.Enabled as false. The operator cannot tell whether the data is stale, so it deletes the Secret object right away.
To Reproduce
Steps to reproduce the behavior:
- Create a YBCluster with cluster.Spec.TLS.Enabled set to false.
- Change cluster.Spec.TLS.Enabled to true. The operator will reconcile and create the Secret objects. Meanwhile, apiserver2 is straggling and still holds cluster.Spec.TLS.Enabled as false.
- The operator crashes, restarts, and communicates with apiserver2. It then reconciles and deletes the Secret objects, since cluster.Spec.TLS.Enabled is false on apiserver2.
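The steps above can be replayed with a small self-contained model. This is only a sketch, not the operator's actual code: the types APIServer and Backend and the function Reconcile are hypothetical names, and the model assumes that both apiservers share one storage backend for Secrets while each serves its own (possibly stale) copy of the cluster spec.

```go
package main

import "fmt"

// ClusterSpec keeps only the field relevant to this bug.
type ClusterSpec struct {
	TLSEnabled bool
}

// APIServer is a hypothetical stand-in for one apiserver's view of the spec.
// A stale apiserver simply holds an older ClusterSpec.
type APIServer struct {
	Name string
	Spec ClusterSpec
}

// Backend models the shared storage that holds the Secrets (assumption:
// deletes always reach this authoritative state, whichever apiserver is used).
type Backend struct {
	Secrets map[string]bool
}

// Reconcile mimics the unguarded logic described in the report: it trusts
// whatever spec the apiserver returns and creates or deletes the Secret.
func Reconcile(api *APIServer, b *Backend, secretName string) string {
	if api.Spec.TLSEnabled {
		b.Secrets[secretName] = true
		return "created"
	}
	if b.Secrets[secretName] {
		delete(b.Secrets, secretName)
		return "deleted"
	}
	return "noop"
}

func main() {
	backend := &Backend{Secrets: map[string]bool{}}
	apiserver1 := &APIServer{Name: "apiserver1", Spec: ClusterSpec{TLSEnabled: true}}
	// apiserver2 lags behind and still reports the old value.
	apiserver2 := &APIServer{Name: "apiserver2", Spec: ClusterSpec{TLSEnabled: false}}

	// Before the crash the operator talks to apiserver1.
	fmt.Println(apiserver1.Name, Reconcile(apiserver1, backend, "tls-secret"))
	// After the restart it talks to apiserver2 and acts on the stale spec.
	fmt.Println(apiserver2.Name, Reconcile(apiserver2, backend, "tls-secret"))
}
```

In this model the second call removes the Secret even though the up-to-date spec still has TLS enabled, which is the behavior reported here.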
Fix
We are willing to send a PR to fix this problem.
A potential fix is to pass the Secret object's UID as a precondition on deletion. If the operator's view of the Secret object is stale, the UID will not match the one stored in etcd, and the deletion will be rejected.
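The precondition idea can be illustrated with a minimal in-memory sketch. It is not the proposed PR: Store, Secret, ErrNotFound, and ErrUIDMismatch are hypothetical names, and the stale UID here stands for an earlier incarnation of the Secret that the operator might still see through a lagging apiserver. In client-go, the equivalent would presumably be setting the UID in the Preconditions field of metav1.DeleteOptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	ErrNotFound    = errors.New("object not found")
	ErrUIDMismatch = errors.New("precondition failed: UID mismatch")
)

// Secret is a stand-in for a Kubernetes Secret with only the fields needed here.
type Secret struct {
	Name string
	UID  string
}

// Store models the authoritative state that the precondition is checked against.
type Store struct {
	secrets map[string]Secret
	nextUID int
}

func NewStore() *Store {
	return &Store{secrets: map[string]Secret{}}
}

// Create stores a Secret under name and assigns it a fresh UID.
func (s *Store) Create(name string) Secret {
	s.nextUID++
	sec := Secret{Name: name, UID: fmt.Sprintf("uid-%d", s.nextUID)}
	s.secrets[name] = sec
	return sec
}

// Exists reports whether a Secret with this name is currently stored.
func (s *Store) Exists(name string) bool {
	_, ok := s.secrets[name]
	return ok
}

// Delete removes the Secret only if its current UID equals wantUID.
func (s *Store) Delete(name, wantUID string) error {
	cur, ok := s.secrets[name]
	if !ok {
		return ErrNotFound
	}
	if cur.UID != wantUID {
		return ErrUIDMismatch
	}
	delete(s.secrets, name)
	return nil
}

func main() {
	store := NewStore()

	// An earlier incarnation of the Secret, later deleted and recreated.
	old := store.Create("tls-secret")
	_ = store.Delete("tls-secret", old.UID)
	cur := store.Create("tls-secret")

	// A delete issued from a stale view carries the old UID and is rejected.
	fmt.Println("delete with stale UID:", store.Delete("tls-secret", old.UID))
	fmt.Println("secret still exists:", store.Exists("tls-secret"))

	// A delete that carries the current UID goes through.
	fmt.Println("delete with current UID:", store.Delete("tls-secret", cur.UID))
}
```

One consequence of this design is that the operator has to read the Secret before deleting it. If its stale view contains no Secret at all, it has no UID to send and would skip the deletion rather than delete by name.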