- [Consideration for Stale Priorities](#consideration-for-stale-priorities)
- [Enabling on a component](#enabling-on-a-component)
- [Migrations](#migrations)
- [API](#api)
@@ -440,6 +451,90 @@ set of candidates and selected strategy is the same as before.
The obvious drawback is the need for a consensus protocol and extra information
in the `LeaseCandidate` object that may be unnecessary.

### Priority-based Coordinated Leader Election
To enhance control over leader assignment beyond existing CLE strategies like OldestEmulatedVersion, we propose adding an optional `Priority` field (unset by default; a higher value means higher priority) to `LeaseCandidateSpec`.

This field allows operators to explicitly designate a preferred leader.
The CLE system will select the candidate with the highest non-zero Priority. If multiple candidates share the same highest priority, the existing `v1.CoordinatedLeaseStrategy` will act as a tie-breaker. If no candidates have a priority set, the system defaults to the existing `v1.CoordinatedLeaseStrategy`.

This provides granular, temporary control without replacing the primary CLE mechanism.

#### LeaseCandidateSpec Update
A new field called `Priority` is added to `LeaseCandidateSpec`:
```go
// LeaseCandidateSpec is a specification of a Lease.
type LeaseCandidateSpec struct {
	// ...

	// Priority is a new, optional field. A higher value means higher priority.
	// When set, the value must be > 0.
	Priority int32 `json:"priority,omitempty" protobuf:"varint,7,opt,name=priority"`
}
```

#### Behavior of the Priority Field
- Priority Value: The `Priority` field is an int32 and is unset by default. A higher numerical value indicates a higher priority. When set, the value must be greater than 0.
- Selection Logic:
  - If one or more candidates have a Priority > 0: The candidate with the numerically highest Priority value will be selected as the leader.
  - Tie-Breaking for Equal Highest Priority: If multiple candidates share the same highest non-zero Priority value, the selection among these equally prioritized candidates will be resolved using their existing `v1.CoordinatedLeaseStrategy` (e.g., OldestEmulatedVersion).
  - If no candidates have a Priority set, the leader selection will proceed based purely on the existing `v1.CoordinatedLeaseStrategy`.
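
The selection rules above can be sketched in Go as follows. This is a minimal illustration, not the proposed implementation: the `candidate` type, `pickLeader`, and `pickByStrategy` are hypothetical names, versions are reduced to plain integers, and `pickByStrategy` is only a simplified stand-in for OldestEmulatedVersion (lowest emulation version, then lowest binary version, then name). The sample values in `main` are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, trimmed-down stand-in for a LeaseCandidate.
type candidate struct {
	Name             string
	Priority         int32 // 0 means "unset" in this sketch
	EmulationVersion int   // simplified: real versions are not plain integers
	BinaryVersion    int
}

// pickByStrategy is a simplified placeholder for the existing strategy.
// It assumes OldestEmulatedVersion-like ordering: lowest emulation version,
// then lowest binary version, then name for determinism.
// It must be called with a non-empty slice.
func pickByStrategy(cs []candidate) candidate {
	sorted := append([]candidate(nil), cs...) // copy; do not reorder the input
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.EmulationVersion != b.EmulationVersion {
			return a.EmulationVersion < b.EmulationVersion
		}
		if a.BinaryVersion != b.BinaryVersion {
			return a.BinaryVersion < b.BinaryVersion
		}
		return a.Name < b.Name
	})
	return sorted[0]
}

// pickLeader applies the priority rules, then falls back to the strategy.
// It returns false when there are no candidates.
func pickLeader(cs []candidate) (candidate, bool) {
	if len(cs) == 0 {
		return candidate{}, false
	}
	// Find the highest priority that is actually set (> 0).
	var highest int32
	for _, c := range cs {
		if c.Priority > highest {
			highest = c.Priority
		}
	}
	if highest == 0 {
		// No candidate has a priority: use the existing strategy only.
		return pickByStrategy(cs), true
	}
	// Keep only the candidates sharing the highest priority; the existing
	// strategy breaks ties among them.
	var top []candidate
	for _, c := range cs {
		if c.Priority == highest {
			top = append(top, c)
		}
	}
	return pickByStrategy(top), true
}

func main() {
	cs := []candidate{
		{Name: "C1", Priority: 10, EmulationVersion: 2, BinaryVersion: 2},
		{Name: "C2", EmulationVersion: 1, BinaryVersion: 2},
		{Name: "C3", EmulationVersion: 2, BinaryVersion: 2},
	}
	leader, _ := pickLeader(cs)
	fmt.Println("with priority set on C1:", leader.Name)

	cs[0].Priority = 0 // unset the priority
	leader, _ = pickLeader(cs)
	fmt.Println("after unsetting priority:", leader.Name)
}
```

Filtering to the highest-priority group first and then reusing the strategy keeps the priority check as a thin layer in front of the existing selection, which matches the intent that priority does not replace the primary CLE mechanism.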

#### Scenario Breakdown for Priority-based Coordinated Leader Election
Here is a step-by-step breakdown of scenarios that illustrate priority-based leader election during upgrades.

##### 1. Initial State
At the beginning, all components (C1, C2, and C3) are running Binary Version 1 and are emulating Version 1.

| Component | Binary Version | Emulation Version | Leader |
Should an issue arise with C1 that requires a rollback, we can unset its priority. CLE will then select C2, which has the oldest emulated version.

| Component | Binary Version | Emulation Version | Priority | Leader |
Unless the cluster administrator resets the priority, C1 will remain the leader. When a component is upgraded or downgraded, it may create a new `LeaseCandidate`, which resets the priority.

#### Consideration for Stale Priorities
A concern with the priority field is the potential for "stale priorities" – a priority set temporarily and not subsequently cleared. This could prevent the Coordinated Leader Election (CLE) system from selecting a more appropriate leader.

We considered exposing a Time-To-Live (TTL) for priority in the `LeaseCandidateSpec`, where the CLE system would ignore a priority once its TTL expired. While this directly addresses the "temporary" nature of many priority assignments, we've decided not to include it in this initial phase due to several complexities:

- Implementation and Semantics: Defining the precise data type and behavior for a TTL (e.g., `time.Duration` vs. `time.Time`, resetting logic) adds significant complexity.
- User Rationalization: Adding a third field (ttl) to an already multi-faceted leader election logic (strategy + priority) greatly increases the cognitive load for users to understand and manage leader selection effectively.

Therefore, in this initial iteration, managing priority lifecycles will be an operational responsibility, requiring manual clearance or updates. We may revisit TTL or similar automated mechanisms in future iterations after gaining more experience with the priority field.

### Enabling on a component
Components with a `--leader-elect-resource-lock` flag (kube-controller-manager,
@@ -865,6 +960,7 @@ in back-to-back releases.
- Load test Coordinated Leader Election
- Feature is enabled by default
- A tested solution for stale priorities is implemented, working through either improved user validation to prevent them, or an automated system to correct them.