Commit 6a5e337

Leader Migration to GA.
1 parent 2a556db commit 6a5e337

2 files changed: 23 additions and 17 deletions


keps/sig-cloud-provider/2436-controller-manager-leader-migration/README.md

Lines changed: 19 additions & 13 deletions
@@ -48,7 +48,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] (R) Graduation criteria is in place
 - [X] (R) Production readiness review completed
 - [X] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [X] "Implementation History" section is up-to-date for milestone
 - [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

@@ -235,7 +235,7 @@ The default LeaderMigrationConfiguration can be represented as follows:

 ```yaml
 kind: LeaderMigrationConfiguration
-apiVersion: controllermanager.config.k8s.io/v1alpha1
+apiVersion: controllermanager.config.k8s.io/v1
 leaderName: cloud-provider-extraction-migration
 resourceLock: leases
 controllerLeaders:
@@ -292,9 +292,10 @@ unsetting the `--enable-leader-migration` flag.
 - test resource registration, parsing, and validation against the Schema APIs
 - test interactions with the leader election APIs
 - E2E Testing
-- In a single-node control plane with leader election setting, test control plane upgrade, assert controller managers
+- In a replicated control plane, test control plane upgrade, assert controller managers
 become health and ready after upgrade
-- In a multi-node control plane setting, test control plane upgrade, assert availability throughout the upgrade
+- In a replicated control plane, test control plane upgrade, assert no controllers
+become active in both controller managers.

 ### Graduation Criteria

@@ -305,7 +306,10 @@ The default migration configuration is implemented and tested.

 ##### Beta -> GA Graduation

-Leader migration configuration works on all in-tree cloud providers.
+- Leader Migration works on all in-tree cloud providers that require migration.
+- Leader Migration has an automated upgrade test on a replicated control plane, with Leader Migration enabled, of the following cases
+- Upgrade from KCM only to KCM + CCM
+- Rollback from KCM + CCM to KCM only

 ### Upgrade / Downgrade Strategy

@@ -350,7 +354,8 @@ disabled.

 ###### How can a rollout or rollback fail? Can it impact already running workloads?

-The rollout may fail if the configuration file does not represent correct controller-to-manager.
+The rollout may fail if the configuration file does not represent correct controller-to-manager assignment
+or configurations mismatch between controller managers.
 This can cause controllers referred in the configuration file to either be unavailable or run in multiple instances.

 The rollback may fail if the leader election of the controller manager is not properly configured.
@@ -383,6 +388,8 @@ N/A. This feature is never used by any user workloads.
 - The `Lease` resource used in the migration can be watched for transition of leadership and timing information.
 - logs and metrics can directly indicate the status of migration.

+Note that this feature is intended for cluster administrators, who should have access to metrics during the upgrade.
+
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

 Leader Migration is designed to ensure availability of controller managers during upgrade,
@@ -391,14 +398,11 @@ and this feature will not affect SLOs of controller managers.
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 - [X] Metrics
-- leader_active
-- other per-controller availability metrics.
+- per-controller health checks in both controller managers.
+- Components exposing the metric: kube-controller-manager, cloud-controller-manager

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-It would help if every controller that the controller manager hosts expose metrics about their availability.
-However, per-controller metrics are out of scope of this KEP.
-
 Status of the migration lease, provided by the API server, can help observe the transition of holders
 if exposed as resource metrics.

@@ -422,7 +426,7 @@ If the service accounts are not granted access to the lease resources, the RBAC

 ###### Will enabling / using this feature result in introducing new API types?

-Type: `controllermanager.config.k8s.io/v1alpha1.LeaderMigrationConfiguration`
+Type: `controllermanager.config.k8s.io/v1.LeaderMigrationConfiguration`
 This resource is only for configuration file parsing. The resource should never reach the API server.

 ###### Will enabling / using this feature result in any new calls to the cloud provider?
@@ -466,7 +470,9 @@ N/A.
 - 12-28-2020 Parsing and validation merged as #96226
 - 03-10-2021 Implementation for alpha state completed, released in 1.21.
 - 03-30-2021 User guide published as kubernetes/website#26970
-- 05-11-2021 KEP updated to target beta.
+- 05-11-2021 KEP updated to target beta.
+- 01-21-2022 KEP updated to target GA.
+- 01-25-2022 Testing and monitoring revised for GA.

 ## Drawbacks
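
Reviewer note: the revised rollout-failure text says a rollout can fail when configurations mismatch between controller managers, which can leave a controller unavailable or running in multiple instances. A minimal sketch of a pre-rollout check is shown below. It assumes each `LeaderMigrationConfiguration` file has already been parsed into a dictionary and that every `controllerLeaders` entry carries `name` and `component` keys, which this diff does not show. The helper names `leader_assignments` and `find_mismatches`, and the sample data, are hypothetical and not part of the KEP or of Kubernetes.

```python
# Hypothetical pre-rollout check; not part of the KEP or of Kubernetes itself.
# Assumption: each controllerLeaders entry has "name" and "component" keys.

def leader_assignments(config):
    """Map controller name -> owning component for one parsed configuration."""
    return {
        entry["name"]: entry["component"]
        for entry in config.get("controllerLeaders", [])
    }


def find_mismatches(kcm_config, ccm_config):
    """Return human-readable differences between the two managers' configurations."""
    problems = []
    # Both managers must agree on the migration lease they contend for.
    for field in ("leaderName", "resourceLock"):
        if kcm_config.get(field) != ccm_config.get(field):
            problems.append(
                f"{field}: {kcm_config.get(field)!r} != {ccm_config.get(field)!r}"
            )
    # Both managers must agree on which component owns each controller.
    kcm = leader_assignments(kcm_config)
    ccm = leader_assignments(ccm_config)
    for name in sorted(set(kcm) | set(ccm)):
        if kcm.get(name) != ccm.get(name):
            problems.append(
                f"controller {name}: {kcm.get(name)!r} != {ccm.get(name)!r}"
            )
    return problems


# Sample data for illustration only: the two files disagree about "route".
KCM_CONFIG = {
    "leaderName": "cloud-provider-extraction-migration",
    "resourceLock": "leases",
    "controllerLeaders": [
        {"name": "route", "component": "kube-controller-manager"},
        {"name": "service", "component": "kube-controller-manager"},
    ],
}
CCM_CONFIG = {
    "leaderName": "cloud-provider-extraction-migration",
    "resourceLock": "leases",
    "controllerLeaders": [
        {"name": "route", "component": "cloud-controller-manager"},
        {"name": "service", "component": "kube-controller-manager"},
    ],
}

if __name__ == "__main__":
    for problem in find_mismatches(KCM_CONFIG, CCM_CONFIG):
        print(problem)
```

An empty result means the two files agree on the lease and on every controller assignment. It does not prove that the assignment itself is the intended one, so it complements the upgrade tests listed in the graduation criteria and does not replace them.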

keps/sig-cloud-provider/2436-controller-manager-leader-migration/kep.yaml

Lines changed: 4 additions & 4 deletions
@@ -20,18 +20,18 @@ see-also:
 - "/keps/sig-cloud-provider/20180530-cloud-controller-manager.md"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: beta
+stage: stable

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.24"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.21"
   beta: "v1.22"
-  stable: "v1.23"
+  stable: "v1.24"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -44,4 +44,4 @@ disable-supported: true

 # The following PRR answers are required at beta release
 metrics:
-- leader_active
+- per controller health checks
