# Multi-Cluster Leader Election

## 1. Overview

This library provides a robust, decentralized, and Kubernetes-native leader election mechanism that allows a single replica of a controller to be elected as leader from a pool of candidates running across multiple Kubernetes clusters.

## 2. Core Principles & Tenets

This design adheres to the following non-negotiable principles:

* **Client-Side Cloud Agnosticism:** Client controllers (e.g., KCC) that consume this library should not need to compile in any cloud-provider-specific SDKs or dependencies. The integration library should be cloud-neutral.
* **Decentralized Control Plane:** The election logic is managed by a controller running in each participating cluster, eliminating a central control plane as a single point of failure.
* **Seamless Integration:** The solution must be easily consumable by any controller built with `client-go` and `controller-runtime`, leveraging the standard `resourcelock.Interface` for a native-feeling integration.

## 3. High-Level Architecture

The system consists of three primary components:

* **Client-Side Library (`MultiClusterLeaseLock`):** A lightweight, cloud-agnostic Go package that client controllers import. It implements the standard `resourcelock.Interface`.
* **Decentralized Election Controller:** A controller that runs in each participating cluster. It is the only component that interacts with the global lock backend.
* **Global Lock Backend:** An external, highly available storage system that supports atomic compare-and-swap operations (e.g., a dedicated etcd cluster, GCS, DynamoDB).

The interaction flow is as follows:

1. A Client Controller replica uses the Client-Side Library to create or update a `MultiClusterLease` CR in its local cluster. This serves as its candidacy declaration and liveness heartbeat.
2. The Decentralized Election Controller in that same cluster observes this local CR.
3. The Election Controller then contends for a lock on the Global Lock Backend on behalf of its local candidate.
4. Based on the outcome of the global contention, the Election Controller updates the status of the local `MultiClusterLease` CR.
5. The Client Controller learns it has become the leader by observing the change in the status of its local CR.

## 4. API Contract: MultiClusterLease CRD

The `MultiClusterLease` CRD is the central API contract. It cleanly separates the concerns between the client candidate and the election controller.

* **`spec`** (Written by Client Controller): Represents the desired state of a candidate.
    * `holderIdentity` (string): The unique ID of the candidate pod.
    * `leaseDurationSeconds` (int): The duration for which the lease is considered valid.
    * `renewTime` (`metav1.MicroTime`): The timestamp of the last heartbeat from the candidate. This is the primary liveness signal.

* **`status`** (Written by Election Controller): Represents the observed state of the global election. This field is protected by the `/status` subresource, making it read-only for the client controller.
    * `leader` (string): The `holderIdentity` of the confirmed global leader.
    * `acquireTime` (`metav1.MicroTime`): Timestamp of when the global lock was acquired.
    * `conditions` (`[]metav1.Condition`): Standard Kubernetes conditions for detailed, machine-readable status.