Commit 82e8543

Merge pull request #4814 from Jefftree/cle-update
KEP-4355: Update Coordinated Leader Election KEP with details after API Review
2 parents 5235f58 + 6d79005 commit 82e8543

File tree

1 file changed

+79
-51
lines changed
  • keps/sig-api-machinery/4355-coordinated-leader-election

keps/sig-api-machinery/4355-coordinated-leader-election/README.md

Lines changed: 79 additions & 51 deletions
@@ -263,19 +263,18 @@ See the API section for the full API.
263263
apiVersion: coordination.k8s.io/v1
264264
kind: LeaseCandidate
265265
metadata:
266-
labels:
267-
binary-version: "1.29"
268-
compatibility-version: "1.29"
269266
name: some-custom-controller-0001A
270267
namespace: kube-system
271268
spec:
272-
canLeadLease: some-custom-controller
269+
leaseName: some-custom-controller
270+
binary-version: "1.29"
271+
compatibility-version: "1.29"
273272
leaseDurationSeconds: 300
274273
renewTime: "2023-12-05T02:33:08.685777Z"
275274
```
276275
277276
A component "lease candidate" announces candidacy for leadership by specifying
278-
`spec.canLeadLease` in its lease candidate lease. If the LeaseCandidate object expires, the
277+
`spec.leaseName` in its lease candidate lease. If the LeaseCandidate object expires, the
279278
component is considered unavailable for leader election purposes. "Expires" is defined more clearly in the Renewal Interval section.
280279

281280
### Coordinated Election Controller
@@ -289,10 +288,8 @@ Coordinated Election Controller reconciliation loop:
289288
- If no leader lease exists for a component:
290289
- Elect leader from candidates by preparing a freshly renewed `Lease` with:
291290
- `spec.holderIdentity` set to the identity of the elected leader
292-
- `coordination.k8s.io/elected-by: leader-election-controller` (to make
293-
lease types easy to disambiguate)
294291
- If there is a better candidate than current leader:
295-
- Sets `endofterm: true` on the leader `Lease`, signaling
292+
- Sets `preferredHolder` on the leader `Lease` to the name of the next leader, signaling
296293
that the leader should stop renewing the lease and yield leadership
297294

298295
```mermaid
@@ -311,7 +308,6 @@ apiVersion: coordination.k8s.io/v1
311308
kind: Lease
312309
metadata:
313310
annotations:
314-
coordination.k8s.io/elected-by: coordinated-election-controller
315311
name: some-custom-controller
316312
namespace: kube-system
317313
spec:
@@ -335,29 +331,30 @@ option.
335331

336332
### Coordinated Lease Lock
337333

338-
A new `resourceLock` type of `coordinatedleases`, and `CoordinatedLeaseLock`
339-
implementation of `resourcelock.Interface` will be added to client-go that:
334+
A new controller `tools/leaderelection/leasecandidate` will be added to client-go that:
340335

341336
- Creates LeaseCandidate Lease when ready to be Leader
342337
- Renews LeaseCandidate lease infrequently (once every 300 seconds)
343-
- Watches its LeaseCandidate lease for the `coordination.k8s.io/pending-ack` annotation and updates to remove it. When the annotation is removed, the `renewTime` is subsequently updated.
344-
338+
- Watches its LeaseCandidate lease for updates to the `pingTime` field. If
339+
the `pingTime` field is later than `renewTime`, it signals that the
340+
`LeaseCandidate` should be renewed and the `renewTime` is subsequently
341+
updated.
345342
- Watches Leader Lease, waiting to be elected leader by the Coordinated Election
346343
Controller
347344
- When it becomes leader:
348345
- Perform role of active component instance
349346
- Renew leader lease periodically
350-
- Stop renewing if lease is marked `spec.endOfTerm: true`
347+
- Stop renewing if lease field `spec.preferredHolder` is non-nil
351348
- If leader lease expires:
352-
- Shutdown (yielding leadership) and restart as a candidate component instance
349+
- Yield leadership and return to acting as a candidate component instance. For certain components, this may involve shutting down and restarting.
353350

354351
```mermaid
355352
flowchart TD
356353
A[Started] -->|Create LeaseCandidate Lease| B
357354
B[Candidate] --> |Elected| C[Leader]
358355
C --> |Renew Leader Lease| C
359-
C -->|End of Term / Leader Lease Expired| D[Shutdown]
360-
D[Shutdown] -.-> |Restart| A
356+
C -->|Better Candidate Available / Leader Lease Expired| D[Yield Leadership]
357+
D[Yield Leadership] -.-> |Shutdown/Restart if necessary| A
361358
```
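
The lifecycle in the list and flowchart above can be read as a small state machine. The following is a minimal Go sketch under stated assumptions, not the client-go implementation: the `state` constants, the `leaderLease` struct, its `Expired` flag, and the `nextState` and `shouldKeepRenewing` helpers are all hypothetical names introduced for illustration. Only the rule itself comes from the text: a leader keeps renewing until `spec.preferredHolder` is non-nil or the leader lease expires, then yields and returns to being a candidate.

```go
package main

// state mirrors the nodes of the flowchart; the names are illustrative only.
type state string

const (
	started   state = "Started"
	candidate state = "Candidate"
	leader    state = "Leader"
	yielding  state = "YieldLeadership"
)

// leaderLease is a hypothetical, trimmed-down view of the leader Lease.
// Expired stands in for "leaseDurationSeconds elapsed since renewTime".
type leaderLease struct {
	HolderIdentity  *string
	PreferredHolder *string
	Expired         bool
}

// holds reports whether self is the current holder of the lease.
func holds(l leaderLease, self string) bool {
	return l.HolderIdentity != nil && *l.HolderIdentity == self
}

// shouldKeepRenewing is true only while self holds an unexpired lease
// and the election controller has not named a preferred holder.
func shouldKeepRenewing(l leaderLease, self string) bool {
	return holds(l, self) && !l.Expired && l.PreferredHolder == nil
}

// nextState advances the component by one step of the flowchart.
func nextState(s state, l leaderLease, self string) state {
	switch s {
	case started:
		// Creating the LeaseCandidate makes the component a candidate.
		return candidate
	case candidate:
		if holds(l, self) && !l.Expired {
			return leader // elected by the Coordinated Election Controller
		}
		return candidate
	case leader:
		if shouldKeepRenewing(l, self) {
			return leader
		}
		return yielding
	case yielding:
		// Some components may need to shut down and restart here.
		return started
	}
	return s
}
```

For example, a leader whose lease has `PreferredHolder` set to another candidate moves to `yielding` on the next step, while a candidate that is not the holder stays in `candidate`.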
362359

363360
### Renewal Interval and Performance
@@ -366,10 +363,12 @@ The leader lease will have renewal interval and duration (2s and 15s). This is s
366363
For component leases, keeping a short renewal interval will add many unnecessary writes to the apiserver.
367364
The component leases renewal interval will default to 5 mins.
368365

369-
When the leader lease is marked as end of term or available, the coordinated leader election controller will
370-
add an annotation to all component lease candidate objects (`coordination.k8s.io/pending-ack`) and wait up to 5 seconds.
371-
During that time, components must update their component lease to remove the annotation.
372-
The leader election controller will then pick the leader based on its criteria from the set of component leases that have ack'd the request.
366+
When the leader lease is marked as end of term or available, the coordinated
367+
leader election controller will update the `pingTime` field of all component
368+
lease candidate objects and wait up to 5 seconds. During that time, components
369+
will update their component lease `renewTime`. The leader election controller
370+
will then pick the leader based on its criteria from the set of component leases
371+
that have ack'd the request.
373372

374373
### Strategy
375374

@@ -484,27 +483,18 @@ type CoordinatedLeaseStrategy string
484483
485484
// CoordinatedLeaseStrategy defines the strategy for picking the leader for coordinated leader election.
486485
const (
487-
OldestCompatibilityVersion CoordinatedStrategy = "OldestCompatibilityVersion"
488-
NoCoordination CoordinatedStrategy = "NoCoordination"
486+
OldestEmulationVersion CoordinatedLeaseStrategy = "OldestEmulationVersion"
489487
)
490488
489+
// LeaseSpec is a specification of a Lease.
491490
type LeaseSpec struct {
492-
// Strategy indicates the strategy for picking the leader for coordinated leader election
493-
// This is filled in from LeaseCandidate.Spec.Strategy or defaulted to NoCoordinationStrategy
494-
// if the leader was not elected by the CLE controller.
495-
Strategy CoordinatedLeaseStrategy `json:"strategy,omitempty" protobuf:"string,6,opt,name=strategy"`
496-
497-
// EndofTerm signals to a lease holder that the lease should not be
498-
// renewed because a better candidate is available.
499-
EndOfTerm bool `json:"endOfTerm,omitempty" protobuf:"boolean,7,opt,name=endOfTerm"`
500-
501-
// EXISTING FIELDS BELOW
502-
503491
// holderIdentity contains the identity of the holder of a current lease.
492+
// If Coordinated Leader Election is used, the holder identity must be
493+
// equal to the elected LeaseCandidate.metadata.name field.
504494
// +optional
505495
HolderIdentity *string `json:"holderIdentity,omitempty" protobuf:"bytes,1,opt,name=holderIdentity"`
506496
// leaseDurationSeconds is a duration that candidates for a lease need
507-
// to wait to force acquire it. This is measure against time of last
497+
// to wait to force acquire it. This is measured against the time of last
508498
// observed renewTime.
509499
// +optional
510500
LeaseDurationSeconds *int32 `json:"leaseDurationSeconds,omitempty" protobuf:"varint,2,opt,name=leaseDurationSeconds"`
@@ -519,29 +509,67 @@ type LeaseSpec struct {
519509
// holders.
520510
// +optional
521511
LeaseTransitions *int32 `json:"leaseTransitions,omitempty" protobuf:"varint,5,opt,name=leaseTransitions"`
512+
// Strategy indicates the strategy for picking the leader for coordinated leader election.
513+
// If the field is not specified, there is no active coordination for this lease.
514+
// (Alpha) Using this field requires the CoordinatedLeaderElection feature gate to be enabled.
515+
// +featureGate=CoordinatedLeaderElection
516+
// +optional
517+
Strategy *CoordinatedLeaseStrategy `json:"strategy,omitempty" protobuf:"bytes,6,opt,name=strategy"`
518+
// PreferredHolder signals to a lease holder that the lease has a
519+
// more optimal holder and should be given up.
520+
// This field can only be set if Strategy is also set.
521+
// +featureGate=CoordinatedLeaderElection
522+
// +optional
523+
PreferredHolder *string `json:"preferredHolder,omitempty" protobuf:"bytes,7,opt,name=preferredHolder"`
522524
}
523525
```
524526

525527
For the LeaseCandidate leases, a new lease will be created
526528

527529
```go
530+
// LeaseCandidateSpec is a specification of a Lease.
528531
type LeaseCandidateSpec struct {
529-
// The fields BinaryVersion and CompatibilityVersion will be mandatory labels instead of fields in the spec
530-
531-
// CanLeadLease indicates the name of the lease that the candidate may lead
532-
CanLeadLease string
533-
534-
// FIELDS DUPLICATED FROM LEASE
535-
536-
// leaseDurationSeconds is a duration that candidates for a lease need
537-
// to wait to force acquire it. This is measure against time of last
538-
// observed renewTime.
532+
// LeaseName is the name of the lease for which this candidate is contending.
533+
// This field is immutable.
534+
// +required
535+
LeaseName string `json:"leaseName" protobuf:"bytes,1,name=leaseName"`
536+
// PingTime is the last time that the server has requested the LeaseCandidate
537+
// to renew. It is only done during leader election to check if any
538+
// LeaseCandidates have become ineligible. When PingTime is updated, the
539+
// LeaseCandidate will respond by updating RenewTime.
539540
// +optional
540-
LeaseDurationSeconds *int32 `json:"leaseDurationSeconds,omitempty" protobuf:"varint,2,opt,name=leaseDurationSeconds"`
541-
// renewTime is a time when the current holder of a lease has last
542-
// updated the lease.
541+
PingTime *metav1.MicroTime `json:"pingTime,omitempty" protobuf:"bytes,2,opt,name=pingTime"`
542+
// RenewTime is the time that the LeaseCandidate was last updated.
543+
// Any time a Lease needs to do leader election, the PingTime field
544+
// is updated to signal to the LeaseCandidate that they should update
545+
// the RenewTime.
546+
// Old LeaseCandidate objects are also garbage collected if it has been hours
547+
// since the last renew. The PingTime field is updated regularly to prevent
548+
// garbage collection for still active LeaseCandidates.
543549
// +optional
544-
RenewTime *metav1.MicroTime `json:"renewTime,omitempty" protobuf:"bytes,4,opt,name=renewTime"`
550+
RenewTime *metav1.MicroTime `json:"renewTime,omitempty" protobuf:"bytes,3,opt,name=renewTime"`
551+
// BinaryVersion is the binary version. It must be in a semver format without leading `v`.
552+
// This field is required when strategy is "OldestEmulationVersion"
553+
// +optional
554+
BinaryVersion string `json:"binaryVersion,omitempty" protobuf:"bytes,4,opt,name=binaryVersion"`
555+
// EmulationVersion is the emulation version. It must be in a semver format without leading `v`.
556+
// EmulationVersion must be less than or equal to BinaryVersion.
557+
// This field is required when strategy is "OldestEmulationVersion"
558+
// +optional
559+
EmulationVersion string `json:"emulationVersion,omitempty" protobuf:"bytes,5,opt,name=emulationVersion"`
560+
// PreferredStrategies indicates the list of strategies for picking the leader for coordinated leader election.
561+
// The list is ordered, and the first strategy supersedes all other strategies. The list is used by coordinated
562+
// leader election to make a decision about the final election strategy. This follows as
563+
// - If all clients have strategy X as the first element in this list, strategy X will be used.
564+
// - If a candidate has strategy [X] and another candidate has strategy [Y, X], Y supersedes X and strategy Y
565+
// will be used.
566+
// - If a candidate has strategy [X, Y] and another candidate has strategy [Y, X], this is a user error and leader
567+
// election will not operate the Lease until resolved.
568+
// (Alpha) Using this field requires the CoordinatedLeaderElection feature gate to be enabled.
569+
// +featureGate=CoordinatedLeaderElection
570+
// +listType=atomic
571+
// +required
572+
PreferredStrategies []v1.CoordinatedLeaseStrategy `json:"preferredStrategies,omitempty" protobuf:"bytes,6,opt,name=preferredStrategies"`
545573
}
546574
```
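
The three rules in the `PreferredStrategies` comment can be illustrated with a short Go sketch. This is one possible reading, not the election controller's actual algorithm: the `Strategy` type and the `resolveStrategy` function are hypothetical, and returning an error when two candidates list unrelated strategies (for example `[X]` and `[Y]`) is an assumption, since the comment does not cover that case.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy stands in for CoordinatedLeaseStrategy in this sketch.
type Strategy string

// resolveStrategy picks one strategy from the candidates' ordered lists.
// A strategy listed before another in any list supersedes it; two lists
// that order the same pair differently are reported as an error.
func resolveStrategy(prefs [][]Strategy) (Strategy, error) {
	// supersedes[a][b] is true when some list places a before b.
	supersedes := map[Strategy]map[Strategy]bool{}
	seenFirst := map[Strategy]bool{}
	var firsts []Strategy // distinct first elements, in input order

	for _, list := range prefs {
		if len(list) == 0 {
			return "", errors.New("candidate has an empty strategy list")
		}
		if !seenFirst[list[0]] {
			seenFirst[list[0]] = true
			firsts = append(firsts, list[0])
		}
		for i, a := range list {
			for _, b := range list[i+1:] {
				if a == b {
					continue
				}
				if supersedes[b][a] {
					return "", fmt.Errorf("conflicting order for %q and %q", a, b)
				}
				if supersedes[a] == nil {
					supersedes[a] = map[Strategy]bool{}
				}
				supersedes[a][b] = true
			}
		}
	}

	// The winner is the only first choice that no other first choice supersedes.
	var winners []Strategy
	for _, f := range firsts {
		beaten := false
		for _, g := range firsts {
			if supersedes[g][f] {
				beaten = true
				break
			}
		}
		if !beaten {
			winners = append(winners, f)
		}
	}
	if len(winners) != 1 {
		return "", fmt.Errorf("candidates do not agree on a strategy: %v", winners)
	}
	return winners[0], nil
}
```

With this reading, `[X]` and `[Y, X]` resolve to `Y`, while `[X, Y]` and `[Y, X]` produce an error, matching the second and third bullets of the comment.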
547575

@@ -556,7 +584,7 @@ a separate LeaseCandidate lease will be required for each lock.
556584
| Claimed by | Component instance | Election Coordinator. (Lease is claimed for the elected component instance) |
557585
| Renewed by | Component instance | Component instance |
558586
| Leader Criteria | First component to claim lease | Best leader from available candidates at time of election |
559-
| Preemptable | No | Yes, Collaboratively. (Coordinator marks lease as "end of term". Component instance voluntarily stops renewing) |
587+
| Preemptable | No | Yes, collaboratively. (Coordinator sets the lease's `preferredHolder` to the next leader. Component instance voluntarily stops renewing) |
560588

561589
### User Stories (Optional)
562590

@@ -614,7 +642,7 @@ component.
614642
Example:
615643

616644
- HA cluster with 3 control plane nodes
617-
- 3 elected components (kube-controller-manager, schedule,
645+
- 3 elected components (kube-controller-manager, scheduler,
618646
cloud-controller-manager) per control plane node
619647
- 9 LeaseCandidate leases are created and renewed by the components
620648
