Commit 3627b5a

Merge pull request #4909 from jingyih/multiclusterlease_md
doc: add a GEMINI.md file for the multi-cluster leader election project
2 parents cc2e478 + 565cc6f commit 3627b5a

1 file changed: 43 additions & 0 deletions

@@ -0,0 +1,43 @@
# Multi-Cluster Leader Election

## 1. Overview

This library provides a robust, decentralized, and Kubernetes-native leader election mechanism that allows a single replica of a controller to be elected as a leader from a pool of candidates running across multiple Kubernetes clusters.

## 2. Core Principles & Tenets

This design adheres to the following non-negotiable principles:

* **Client-Side Cloud Agnosticism:** Client controllers (e.g., KCC) that consume this library should not need to compile in any cloud-provider-specific SDKs or dependencies. The integration library should be cloud-neutral.
* **Decentralized Control Plane:** The election logic will be managed by a controller running in each participating cluster, eliminating a central control plane as a single point of failure.
* **Seamless Integration:** The solution must be easily consumable by any controller built with `client-go` and `controller-runtime`, leveraging the standard `resourcelock.Interface` for a native-feeling integration.

## 3. High-Level Architecture

The system consists of three primary components:

* **Client-Side Library (`MultiClusterLeaseLock`):** A lightweight, cloud-agnostic Go package that client controllers import. It implements the standard `resourcelock.Interface`.
* **Decentralized Election Controller:** A controller that runs in each participating cluster. It is the only component that interacts with the global lock backend.
* **Global Lock Backend:** An external, highly-available storage system that supports atomic compare-and-swap operations (e.g., a dedicated etcd cluster, GCS, DynamoDB).

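The compare-and-swap requirement on the Global Lock Backend can be illustrated with a small in-memory stand-in. The `Backend` interface, `memoryBackend`, `ErrConflict`, and the holder names below are hypothetical and chosen only for this sketch; the document does not define the real backend API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict is returned when the stored version no longer matches.
var ErrConflict = errors.New("version conflict")

// Backend is a hypothetical abstraction over the global lock store.
type Backend interface {
	// Read returns the current holder and version; version 0 means
	// nothing has been written yet.
	Read() (holder string, version int64)
	// CompareAndSwap writes holder only if the stored version still
	// equals expected, and returns the version after the call.
	CompareAndSwap(expected int64, holder string) (int64, error)
}

// memoryBackend is an in-memory Backend used only for illustration.
type memoryBackend struct {
	mu      sync.Mutex
	holder  string
	version int64
}

func (b *memoryBackend) Read() (string, int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.holder, b.version
}

func (b *memoryBackend) CompareAndSwap(expected int64, holder string) (int64, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.version != expected {
		return b.version, ErrConflict // someone else wrote first
	}
	b.holder = holder
	b.version++
	return b.version, nil
}

func main() {
	var b Backend = &memoryBackend{}

	// Two election controllers read the same version, then race to write.
	_, seen := b.Read()
	_, errA := b.CompareAndSwap(seen, "cluster-a/pod-1")
	_, errB := b.CompareAndSwap(seen, "cluster-b/pod-7")

	holder, _ := b.Read()
	fmt.Println(holder, errA, errB)
	// prints: cluster-a/pod-1 <nil> version conflict
}
```

Only one of the two writers that observed the same version can succeed, which is the property the election relies on. Real stores expose comparable primitives in their own terms, for example revision comparisons in etcd transactions, generation preconditions in GCS, and conditional writes in DynamoDB.
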
The interaction flow is as follows:

1. A Client Controller replica uses the Client-Side Library to create/update a `MultiClusterLease` CR in its local cluster. This serves as its candidacy declaration and liveness heartbeat.
2. The Decentralized Election Controller in that same cluster observes this local CR.
3. The Election Controller then contends for a lock on the Global Lock Backend on behalf of its local candidate.
4. Based on the outcome of the global contention, the Election Controller updates the status of the local `MultiClusterLease` CR.
5. The Client Controller learns it has become the leader by observing the change in the status of its local CR.

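The five steps above can be sketched as a small single-process simulation. Everything here is an assumption made for illustration: `Lease` is a pared-down stand-in for the `MultiClusterLease` CR, `GlobalLock` stands in for the Global Lock Backend, and `Cluster` plays both the client and the election controller of one cluster. Lease expiry and renewal are deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Lease is a pared-down, hypothetical stand-in for the MultiClusterLease CR.
type Lease struct {
	SpecHolder   string // spec.holderIdentity, written by the client
	StatusLeader string // status.leader, written by the election controller
}

// GlobalLock is a hypothetical in-memory global lock backend.
type GlobalLock struct {
	mu     sync.Mutex
	holder string
}

// TryAcquire hands the lock to candidate if it is free and returns
// whoever holds the lock after the attempt.
func (g *GlobalLock) TryAcquire(candidate string) string {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.holder == "" {
		g.holder = candidate
	}
	return g.holder
}

// Cluster bundles one cluster's local lease.
type Cluster struct {
	Name  string
	Lease *Lease
}

// declareCandidacy is step 1: the client writes its identity into the
// local lease.
func (c *Cluster) declareCandidacy(pod string) {
	c.Lease = &Lease{SpecHolder: c.Name + "/" + pod}
}

// reconcile covers steps 2 to 4: the election controller sees the local
// lease, contends for the global lock, and records the result in status.
func (c *Cluster) reconcile(g *GlobalLock) {
	if c.Lease == nil {
		return // no local candidate, nothing to contend for
	}
	c.Lease.StatusLeader = g.TryAcquire(c.Lease.SpecHolder)
}

// isLeader is step 5: the client compares status.leader with its own ID.
func (c *Cluster) isLeader() bool {
	return c.Lease != nil && c.Lease.StatusLeader == c.Lease.SpecHolder
}

func main() {
	global := &GlobalLock{}
	a := &Cluster{Name: "cluster-a"}
	b := &Cluster{Name: "cluster-b"}

	a.declareCandidacy("pod-1")
	b.declareCandidacy("pod-7")
	a.reconcile(global)
	b.reconcile(global)

	fmt.Println(a.isLeader(), b.isLeader(), b.Lease.StatusLeader)
	// prints: true false cluster-a/pod-1
}
```

Note that the losing cluster still learns who the global leader is through its own local lease status, without ever talking to the backend directly.
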
## 4. API Contract: MultiClusterLease CRD

The `MultiClusterLease` CRD is the central API contract. It cleanly separates the concerns between the client candidate and the election controller.

* **`spec`** (Written by Client Controller): Represents the desired state of a candidate.
    * `holderIdentity` (string): The unique ID of the candidate pod.
    * `leaseDurationSeconds` (int): The duration, in seconds, for which the lease is considered valid.
    * `renewTime` (`metav1.MicroTime`): The timestamp of the last heartbeat from the candidate. This is the primary liveness signal.

* **`status`** (Written by Election Controller): Represents the observed state of the global election. This field is served through the `/status` subresource, so updates to the main resource cannot change it; provided the client controller is not granted write access to `/status`, it is read-only for the client.
    * `leader` (string): The `holderIdentity` of the confirmed global leader.
    * `acquireTime` (`metav1.MicroTime`): Timestamp of when the global lock was acquired.
    * `conditions` (`[]metav1.Condition`): Standard Kubernetes conditions for detailed, machine-readable status.

