
[BUG] Secret object gets mistakenly deleted if operator reads stale cluster.Spec.TLS.Enabled #35

@srteam2020


Describe the bug

After restarting from a crash, the operator can mistakenly delete the Secret objects if it reads a stale value of cluster.Spec.TLS.Enabled.

Consider the following situation. There are two apiservers, apiserver1 and apiserver2, and the operator is initially communicating with apiserver1. The field cluster.Spec.TLS.Enabled is initially set to false, and the user then changes it to true. The operator reconciles and creates the Secret object accordingly. After the Secret object is created, the operator crashes, restarts, and starts communicating with apiserver2. At that moment apiserver2 is stale and still reports cluster.Spec.TLS.Enabled as false. The operator cannot tell whether the data it reads is stale, so it deletes the Secret object right away.

To Reproduce

Steps to reproduce the behavior:

  1. Create a YBCluster with cluster.Spec.TLS.Enabled set to false.
  2. Change cluster.Spec.TLS.Enabled to true. The operator reconciles and creates the Secret objects. Meanwhile, apiserver2 is lagging behind and still reports cluster.Spec.TLS.Enabled as false.
  3. The operator crashes, restarts, and connects to apiserver2. It then reconciles and deletes the Secret objects, because cluster.Spec.TLS.Enabled is false on apiserver2.
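
The steps above can be replayed with a small self-contained model. This is only a sketch of the failure mode, not the operator's actual code: `specView`, `reconcile`, `runScenario`, and the Secret name `yb-tls-secret` are hypothetical names introduced for illustration, and the map stands in for the Secret objects that exist in the cluster.

```go
package main

import "fmt"

// specView is what one apiserver reports for cluster.Spec.TLS.Enabled.
// A lagging apiserver can report an out-of-date value.
type specView struct {
	server     string
	tlsEnabled bool
}

// reconcile is a simplified stand-in for the operator's logic: it trusts
// whatever spec it reads and creates or deletes the Secret to match.
func reconcile(view specView, secrets map[string]bool, name string) string {
	switch {
	case view.tlsEnabled && !secrets[name]:
		secrets[name] = true
		return "created"
	case !view.tlsEnabled && secrets[name]:
		delete(secrets, name)
		return "deleted"
	default:
		return "unchanged"
	}
}

// runScenario replays the reproduction steps and returns the action taken
// in each reconcile pass.
func runScenario() []string {
	secrets := map[string]bool{}  // Secret objects that currently exist
	const name = "yb-tls-secret" // hypothetical Secret name

	fresh := specView{server: "apiserver1", tlsEnabled: true}  // up to date
	stale := specView{server: "apiserver2", tlsEnabled: false} // lagging

	// Step 2: the operator talks to apiserver1 and creates the Secret.
	actions := []string{reconcile(fresh, secrets, name)}
	// Step 3: after the crash and restart it talks to apiserver2.
	actions = append(actions, reconcile(stale, secrets, name))
	return actions
}

func main() {
	fmt.Println(runScenario())
}
```

In this model the second pass deletes a Secret that the up-to-date spec still requires, which is the behavior reported here.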

Fix

We are willing to send a PR to fix this problem.
A potential fix is to pass the Secret object's UID as a precondition on deletion. If the Secret object the operator observed is stale, the UID will not match the object stored in etcd and the deletion will be rejected.
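
A minimal sketch of how a UID precondition behaves, assuming the operator observed a Secret with an older UID than the one currently stored. The `store` type, `deleteWithUID`, the error values, and the UIDs are hypothetical and only model the check; in Kubernetes itself the corresponding option is the `Preconditions.UID` field of `metav1.DeleteOptions`.

```go
package main

import (
	"errors"
	"fmt"
)

// secret models the two fields that matter here: the name and the UID
// assigned when the object was created.
type secret struct {
	name string
	uid  string
}

var (
	errNotFound    = errors.New("secret not found")
	errUIDMismatch = errors.New("precondition failed: UID mismatch")
)

// store models the authoritative state (what etcd holds).
type store struct {
	secrets map[string]secret
}

// deleteWithUID deletes the named Secret only if the stored object's UID
// equals the UID the caller observed. A caller holding a stale object
// gets errUIDMismatch and nothing is deleted.
func (s *store) deleteWithUID(name, observedUID string) error {
	cur, ok := s.secrets[name]
	if !ok {
		return errNotFound
	}
	if cur.uid != observedUID {
		return errUIDMismatch
	}
	delete(s.secrets, name)
	return nil
}

func main() {
	s := &store{secrets: map[string]secret{
		"yb-tls-secret": {name: "yb-tls-secret", uid: "uid-new"},
	}}

	// The operator observed an older instance of the Secret.
	err := s.deleteWithUID("yb-tls-secret", "uid-old")
	fmt.Println(errors.Is(err, errUIDMismatch), len(s.secrets))

	// With the current UID the deletion goes through.
	err = s.deleteWithUID("yb-tls-secret", "uid-new")
	fmt.Println(err == nil, len(s.secrets))
}
```

Note that the check only guards against deleting an object other than the one the operator observed. Whether that covers every stale-read path in the operator is something the PR would need to confirm.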
