
Commit 8d9379b

Merge pull request kubernetes#2083 from jiahuif/kep-cloud-controller-manager-migration-lease-api
cloud controller manager migration: default values and lease types update.
2 parents e9b84b1 + 521f597 commit 8d9379b


2 files changed, +111 -43 lines changed


keps/sig-cloud-provider/991-cloud-controller-migration/README.md

Lines changed: 110 additions & 42 deletions
@@ -11,13 +11,14 @@
 - [Proposal](#proposal)
 - [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional)
 - [Migration Configuration](#migration-configuration)
+- [Default LeaderMigrationConfiguration](#default-leadermigrationconfiguration)
 - [Component Flags](#component-flags)
-- [Example Walkthrough of Controller Migration](#example-walkthrough-of-controller-migration)
+- [Example Walkthrough of Controller Migration with Default Configuration](#example-walkthrough-of-controller-migration-with-default-configuration)
 - [Enable Leader Migration on Components](#enable-leader-migration-on-components)
-- [Deploy the CCM](#deploy-the-ccm)
-- [Update Leader Migration Config on Upgrade](#update-leader-migration-config-on-upgrade)
+- [Upgrade the Control Plane](#upgrade-the-control-plane)
 - [Disable Leader Migration](#disable-leader-migration)
 - [Risks and Mitigations](#risks-and-mitigations)
+- [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha -> Beta Graduation](#alpha---beta-graduation)
 - [Beta -> GA Graduation](#beta---ga-graduation)
@@ -35,12 +36,12 @@ For enhancements that make changes to code or processes/procedures in core Kuber

 Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.

-- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
-- [ ] KEP approvers have set the KEP status to `implementable`
-- [ ] Design details are appropriately documented
-- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] Graduation criteria is in place
-- [ ] "Implementation History" section is up-to-date for milestone
+- [X] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
+- [X] KEP approvers have set the KEP status to `implementable`
+- [X] Design details are appropriately documented
+- [X] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] Graduation criteria is in place
+- [X] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

@@ -75,7 +76,7 @@ the respective out-of-tree cloud-controller-manager.

 ### Goals

-* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechnaism, to their out-of-tree equivalents.
+* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechanism to their out-of-tree equivalents.

 ### Non-Goals

@@ -102,14 +103,14 @@ _primary_ and N configurable _secondary_ (a.k.a migration) leader election locks
 The primary lock represents the current leader election resource lock in the KCM and the CCM. The set of
 secondary locks are defined by the cloud provider and run in parallel to the primary locks. For a migration
 lock defined by the cloud provider, the cloud provider also determines the set of controllers run within the
-migration lock and the controller manager it should run in - either the CCM or the KCM.
+migration lock and the controller manager it will run in - either the CCM or the KCM.

 The properties of the migration lock are:
 * must have a unique name
 * the set of controllers in the lock is immutable.
 * no two migration locks should have overlapping controllers
 * the controller manager where the lock runs can change across releases.
-* for a minor release it should run exclusively in one type of controller manager - KCM or CCM.
+* for a minor release it must run exclusively in one type of controller manager - KCM or CCM.

 During migration, either the KCM or CCM may have multiple migration locks, though for performance reasons no more than 2 locks is recommended.

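The per-release lock properties above can be checked when a configuration is loaded. The following is a minimal Go sketch, not code from the KEP. The `migrationLock` type and the `validateLocks` function are hypothetical. The sketch enforces unique lock names, non-overlapping controllers, and a single controller manager per lock. Immutability and changes across releases cannot be checked from one configuration alone.

```go
package main

import "fmt"

// migrationLock is a hypothetical, simplified model of one migration lock:
// its unique name, the controllers it guards, and the single controller
// manager that runs them for the current minor release.
type migrationLock struct {
	Name        string
	Controllers []string
	Component   string // "kube-controller-manager" or "cloud-controller-manager"
}

// validateLocks checks the properties that are visible within a single
// configuration: unique names, no controller claimed by two locks, and
// exactly one known controller manager per lock.
func validateLocks(locks []migrationLock) error {
	names := map[string]bool{}
	owner := map[string]string{} // controller name -> lock name
	for _, l := range locks {
		if names[l.Name] {
			return fmt.Errorf("duplicate migration lock name %q", l.Name)
		}
		names[l.Name] = true
		if l.Component != "kube-controller-manager" && l.Component != "cloud-controller-manager" {
			return fmt.Errorf("lock %q: unknown component %q", l.Name, l.Component)
		}
		for _, c := range l.Controllers {
			if prev, ok := owner[c]; ok {
				return fmt.Errorf("controller %q appears in locks %q and %q", c, prev, l.Name)
			}
			owner[c] = l.Name
		}
	}
	return nil
}

func main() {
	locks := []migrationLock{{
		Name:        "cloud-provider-extraction-migration",
		Controllers: []string{"route-controller", "service-controller"},
		Component:   "kube-controller-manager",
	}}
	fmt.Println(validateLocks(locks)) // <nil>
}
```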
@@ -150,42 +151,84 @@ The migration lock will be configured by defining new API types that will then b
 type LeaderMigrationConfiguration struct {
     metav1.TypeMeta `json:",inline"`

-    // LeaderName is the name of the resource under which the controllers should be run.
+    // LeaderName is the name of the resource under which the controllers will be run.
     LeaderName string `json:"leaderName"`

-    // ControllerLeaders contains a list of migrating leader lock configurations
-    ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
+    // ResourceLock indicates the resource object type that will be used to lock.
+    // Must be either "leases" or "endpoints"; defaults to "leases".
+    // No other types (e.g. "endpointsleases" or "configmapsleases") are allowed.
+    ResourceLock string `json:"resourceLock"`
+
+    // ControllerLeaders contains a list of migrating leader lock configurations
+    ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
 }

 // ControllerLeaderConfiguration provides the configuration for a migrating leader lock.
 type ControllerLeaderConfiguration struct {
-    // Name is the name of the controller being migrated
-    // E.g. service-controller, route-controller, cloud-node-controller, etc
-    Name string `json:"name"`
+    // Name is the name of the controller being migrated
+    // E.g. service-controller, route-controller, cloud-node-controller, etc
+    Name string `json:"name"`

-    // Component is the name of the component in which the controller should be running.
-    // E.g. kube-controller-manager, cloud-controller-manager, etc
-    Component string `json:"component"`
+    // Component is the name of the component in which the controller will be running.
+    // E.g. kube-controller-manager, cloud-controller-manager, etc
+    Component string `json:"component"`
 }
 ```

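The `ResourceLock` comment describes a default and a closed set of allowed values. The Go sketch below shows one way to apply them. It is a self-contained approximation, not the KEP's implementation. `metav1.TypeMeta` is dropped so only the standard library is needed. JSON stands in for the YAML file. `setDefaults` and `validateResourceLock` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified copies of the KEP types (TypeMeta omitted).
type ControllerLeaderConfiguration struct {
	Name      string `json:"name"`
	Component string `json:"component"`
}

type LeaderMigrationConfiguration struct {
	LeaderName        string                          `json:"leaderName"`
	ResourceLock      string                          `json:"resourceLock"`
	ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
}

// setDefaults fills in ResourceLock with "leases" when it is left empty.
func setDefaults(c *LeaderMigrationConfiguration) {
	if c.ResourceLock == "" {
		c.ResourceLock = "leases"
	}
}

// validateResourceLock accepts only "leases" or "endpoints"; multi-lock
// types such as "endpointsleases" are rejected.
func validateResourceLock(c *LeaderMigrationConfiguration) error {
	switch c.ResourceLock {
	case "leases", "endpoints":
		return nil
	default:
		return fmt.Errorf("resourceLock %q is not allowed; must be \"leases\" or \"endpoints\"", c.ResourceLock)
	}
}

func main() {
	raw := `{"leaderName": "cloud-provider-extraction-migration",
	         "controllerLeaders": [{"name": "route-controller", "component": "kube-controller-manager"}]}`
	var cfg LeaderMigrationConfiguration
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	setDefaults(&cfg)
	fmt.Println(cfg.ResourceLock, validateResourceLock(&cfg)) // leases <nil>
}
```

Defaulting runs before validation, so an omitted `resourceLock` passes while an explicit disallowed value fails.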
+#### Default LeaderMigrationConfiguration
+
+The `staging/controller-manager` package will provide each of `kube-controller-manager` and `cloud-controller-manager`
+a default `LeaderMigrationConfiguration` that represents the situation where the controller manager is running with
+the default assignment of controllers and the default lock type.
+
+Please refer to [a walkthrough](#example-walkthrough-of-controller-migration-with-default-configuration)
+of an example migration of cloud controllers from KCM to CCM that uses the default configuration.
+
+The default values must only be used when no configuration file is specified. If a custom configuration file is
+specified to either controller manager, the specified configuration will completely replace the default value for the
+corresponding controller manager.
+
 #### Component Flags

-The LeaderMigrationConfiguration type will be read by the `kube-controller-manager` and the `cloud-controller-manager` via a new flag `--cloud-migration-config` which
-accepts a path to a file containing the LeaderMigrationConfiguration type in yaml.
+Both `kube-controller-manager` and `cloud-controller-manager` will support the following two flags for Leader
+Migration. First, `--enable-leader-migration` is a boolean flag, defaulting to `false`, that indicates whether Leader
+Migration is enabled. Second, `--leader-migration-config` is an optional flag that accepts a path to a file containing
+the `LeaderMigrationConfiguration` type serialized in YAML.
+
+If `--enable-leader-migration` is `true` but the `--leader-migration-config` flag is empty or not set, the
+default `LeaderMigrationConfiguration` for the corresponding controller manager will be used.
+
+If `--enable-leader-migration` is not set or is set to `false`, but `--leader-migration-config` is set and not empty, the
+controller manager will log an error at `FATAL` level and exit immediately. Additionally,
+if `--leader-migration-config` is set but the configuration file cannot be read or parsed, the controller manager will
+log the failure at `FATAL` level and exit immediately.

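The default-configuration rule and the two flags combine into a small decision table. The Go sketch below encodes that table. It is an illustration, not the controller managers' code. The `configSource` values and `resolveLeaderMigration` are hypothetical. Actual flag parsing, and reading and parsing the file, are left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// configSource says where the effective LeaderMigrationConfiguration comes from.
type configSource int

const (
	migrationDisabled configSource = iota // Leader Migration is off
	defaultConfig                         // built-in default for this controller manager
	fileConfig                            // file passed via --leader-migration-config
)

// resolveLeaderMigration applies the flag rules described above. A non-nil
// error marks the case where the controller manager should log at FATAL
// level and exit.
func resolveLeaderMigration(enabled bool, configPath string) (configSource, error) {
	switch {
	case !enabled && configPath != "":
		return migrationDisabled, errors.New("--leader-migration-config is set but --enable-leader-migration is not")
	case !enabled:
		return migrationDisabled, nil
	case configPath == "":
		return defaultConfig, nil
	default:
		return fileConfig, nil
	}
}

func main() {
	src, err := resolveLeaderMigration(true, "")
	fmt.Println(src == defaultConfig, err) // true <nil>
}
```

A `fileConfig` result means the specified file completely replaces the default. It is never merged with it.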
-#### Example Walkthrough of Controller Migration
+#### Example Walkthrough of Controller Migration with Default Configuration

-This is an example of how you would migrate all cloud controllers from the CCM to the KCM during a typical cluster version upgrade.
+This is an example of migrating a KCM-only Kubernetes 1.21 control plane to KCM + CCM 1.22.
+
+After the upgrade, all cloud controllers will be moved from the KCM to the CCM. We assume KCM and CCM are running with
+default controller assignments, namely, in 1.21, KCM runs `route-controller`, `service-controller`,
+`cloud-node-controller`, and `cloud-nodelifecycle-controller`, and in 1.22, CCM will instead run all 4 of these
+controllers.
+
+If KCM and CCM are not running with the default controller assignments, a custom configuration file can be specified
+with `--leader-migration-config`. However, this example only covers the simple case of using the default configuration.
+
+At the beginning, KCM should not have `--enable-leader-migration` or `--leader-migration-config` set, but it should
+have `--cloud-provider` already set to an existing cloud provider (e.g. `--cloud-provider=gce`). At this point, KCM
+runs `route-controller`, `service-controller`, `cloud-node-controller`, and `cloud-nodelifecycle-controller`. CCM is not
+yet deployed.

 ##### Enable Leader Migration on Components

-First, define a LeaderMigrationConfiguration resource in a yaml file containing all known cloud controllers. The component name for each controller should be set to
-the component where the controllers are currently running. Almost always this is the `kube-controller-manager`. The configuration file should look something like this:
+The provided default configuration will be equivalent to the following:
+
 ```yaml
 kind: LeaderMigrationConfiguration
 apiVersion: v1alpha1
-leaderName: cloud-controllers-migration
+leaderName: cloud-provider-extraction-migration
+resourceLock: leases
 controllerLeaders:
 - name: route-controller
   component: kube-controller-manager
@@ -197,23 +240,32 @@ controllerLeaders:
   component: kube-controller-manager
 ```

-Save the leader migration configuration file somewhere, for this example we'll use `/etc/kubernetes/cloud-controller-migration.yaml`.
-Now update the kube-controller-manager to set `--cloud-migration-config /etc/kubernetes/cloud-controller-migration.yaml`.
+First, within the 1.21 control plane, update `kube-controller-manager` to set `--enable-leader-migration` but
+not `--leader-migration-config`. This enables Leader Migration with the default configuration, which prepares KCM to
+participate in the migration.

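With the 1.21 default configuration, every migrated controller is assigned to `kube-controller-manager`. A participating KCM would therefore start all four controllers, and a CCM reading the same configuration would start none. The Go sketch below makes that filtering explicit. `controllersFor` and `defaultFor121` are hypothetical helpers, and the controller list is taken from the walkthrough text.

```go
package main

import "fmt"

// ControllerLeaderConfiguration mirrors the KCM type's name/component pair.
type ControllerLeaderConfiguration struct {
	Name      string
	Component string
}

// controllersFor returns the migrated controllers that the given component
// should start, according to cfg, in configuration order.
func controllersFor(cfg []ControllerLeaderConfiguration, component string) []string {
	var out []string
	for _, c := range cfg {
		if c.Component == component {
			out = append(out, c.Name)
		}
	}
	return out
}

// defaultFor121 mirrors the 1.21 default configuration described above:
// every cloud controller is assigned to kube-controller-manager.
func defaultFor121() []ControllerLeaderConfiguration {
	names := []string{"route-controller", "service-controller", "cloud-node-controller", "cloud-nodelifecycle-controller"}
	cfg := make([]ControllerLeaderConfiguration, 0, len(names))
	for _, n := range names {
		cfg = append(cfg, ControllerLeaderConfiguration{Name: n, Component: "kube-controller-manager"})
	}
	return cfg
}

func main() {
	cfg := defaultFor121()
	fmt.Println(len(controllersFor(cfg, "kube-controller-manager")))  // 4
	fmt.Println(len(controllersFor(cfg, "cloud-controller-manager"))) // 0
}
```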
-##### Deploy the CCM
+##### Upgrade the Control Plane

-Now deploy the CCM on your cluster but ensure it also has the `--cloud-migration-config` flag set, using the same config file you used for the KCM above.
+Upgrade each node of the control plane to 1.22 with the following updates:

-How the CCM is deployed is out of scope for this KEP, refer to the cloud provider's documentation on how to do this.
+- KCM has neither `--enable-leader-migration` nor `--leader-migration-config` set
+- KCM has no cloud provider enabled (`--cloud-provider=`)
+- CCM is deployed with `--enable-leader-migration`
+- CCM has its `--cloud-provider` set to the correct cloud provider

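The upgrade relies on mixed-version nodes contending for the same migration lock. The Go sketch below is our reading of the primary/secondary lock design described earlier in the KEP, not code from it. Whichever participant holds the lock starts only the controllers that its own configuration assigns to its own component. A 1.21 KCM and a 1.22 CCM therefore each run all four controllers while holding the lock, and never both at once. The `participant` type and the `assignAll` and `running` helpers are hypothetical.

```go
package main

import "fmt"

// participant is a hypothetical model of a controller manager process that
// has --enable-leader-migration set.
type participant struct {
	Node      string
	Component string            // "kube-controller-manager" or "cloud-controller-manager"
	Config    map[string]string // controller name -> component, from its own configuration
}

var cloudControllers = []string{"route-controller", "service-controller", "cloud-node-controller", "cloud-nodelifecycle-controller"}

// assignAll builds a configuration that assigns every cloud controller to component.
func assignAll(component string) map[string]string {
	m := map[string]string{}
	for _, c := range cloudControllers {
		m[c] = component
	}
	return m
}

// running returns the controllers the lock holder starts: only those its own
// configuration assigns to its own component. Non-holders start none of them.
func running(holder participant) []string {
	var out []string
	for _, c := range cloudControllers {
		if holder.Config[c] == holder.Component {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	oldKCM := participant{"node-a", "kube-controller-manager", assignAll("kube-controller-manager")}   // still on 1.21
	newCCM := participant{"node-b", "cloud-controller-manager", assignAll("cloud-controller-manager")} // upgraded to 1.22
	// Whichever process holds the shared lock runs all four controllers.
	for _, holder := range []participant{oldKCM, newCCM} {
		fmt.Println(holder.Node, holder.Component, len(running(holder)))
	}
}
```

The same model shows the risk listed under Risks and Mitigations. A CCM given the 1.21 assignment would hold the lock but run nothing.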
-##### Update Leader Migration Config on Upgrade
+After upgrade, CCM will run `route-controller`, `service-controller`, `cloud-node-controller`,
+and `cloud-nodelifecycle-controller`. The Leader Migration support will ensure that CCM cleanly takes over these
+controllers during the control plane upgrade.
+
+As a reference, the provided default configuration in version 1.22 should have the `component` field of all affected
+controllers changed to `cloud-controller-manager`. The resulting default configuration should be equivalent to the
+following:

-To migrate controllers from the KCM to the CCM, update the component field from `kube-controller-manager` to `cloud-controller-manager` on every control plane node prior to
-upgrading the node. If you are replacing nodes on upgrade, ensure new nodes set the `component` field to `cloud-controller-manager`. The new config file should look like this:
 ```yaml
 kind: LeaderMigrationConfiguration
 apiVersion: v1alpha1
-leaderName: cloud-controllers-migration
+leaderName: cloud-provider-extraction-migration
+resourceLock: leases
 controllerLeaders:
 - name: route-controller
   component: cloud-controller-manager
@@ -225,19 +277,32 @@ controllerLeaders:
   component: cloud-controller-manager
 ```

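The Test Plan lists "calculation of leader differences". The version skew section also notes that skew is handled as long as the leader name is consistent across components. The Go sketch below combines both checks on the 1.21 and 1.22 defaults. It requires the same `leaderName` and `resourceLock`, then lists controllers whose component changed. The `leaderConfig` type and `leaderDifferences` are hypothetical. Controllers present in only one configuration are ignored here.

```go
package main

import (
	"fmt"
	"sort"
)

// leaderConfig is a simplified stand-in for LeaderMigrationConfiguration.
type leaderConfig struct {
	LeaderName   string
	ResourceLock string
	Components   map[string]string // controller name -> component
}

// leaderDifferences lists controllers whose component changes between two
// configurations, as "name: from -> to". It returns an error if the two
// configurations would not contend for the same migration lock.
func leaderDifferences(before, after leaderConfig) ([]string, error) {
	if before.LeaderName != after.LeaderName || before.ResourceLock != after.ResourceLock {
		return nil, fmt.Errorf("configs use different locks: %s/%s vs %s/%s",
			before.ResourceLock, before.LeaderName, after.ResourceLock, after.LeaderName)
	}
	var diffs []string
	for name, from := range before.Components {
		if to, ok := after.Components[name]; ok && to != from {
			diffs = append(diffs, fmt.Sprintf("%s: %s -> %s", name, from, to))
		}
	}
	sort.Strings(diffs) // map iteration order is random; sort for stable output
	return diffs, nil
}

func main() {
	v121 := leaderConfig{"cloud-provider-extraction-migration", "leases", map[string]string{}}
	v122 := leaderConfig{"cloud-provider-extraction-migration", "leases", map[string]string{}}
	for _, n := range []string{"route-controller", "service-controller", "cloud-node-controller", "cloud-nodelifecycle-controller"} {
		v121.Components[n] = "kube-controller-manager"
		v122.Components[n] = "cloud-controller-manager"
	}
	diffs, err := leaderDifferences(v121, v122)
	if err != nil {
		panic(err)
	}
	for _, d := range diffs {
		fmt.Println(d) // one line per migrated controller
	}
}
```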
-NOTE: During upgrade, it is acceptable for control plane nodes to specify different component names for each controller as long as the `leaderName` field is the same across nodes.
+Note how the component name for each controller differs between the two versions.

 ##### Disable Leader Migration

-Once all controllers are migrated to the desired component:
-* disable the cloud provider in the `kube-controller-manager` (set `--cloud-provider=external`)
-* disable leader migration on the `kube-controller-manager` and `cloud-controller-manager` by unsetting the `--cloud-migration-config` field.
+Once all nodes in the control plane are upgraded to 1.22, disable leader migration on the `cloud-controller-manager` by
+unsetting the `--enable-leader-migration` flag.

 ### Risks and Mitigations

 * Increased apiserver load due to new leader election resource per migration configuration.
 * User error could result in cloud controllers not running in any component at all.

+### Test Plan
+
+- Unit Testing:
+  - test resource reading, parsing, and validation
+  - test calculation of leader differences
+  - test all helpers
+- Integration Testing:
+  - test resource registration, parsing, and validation against the Schema APIs
+  - test interactions with the leader election APIs
+- E2E Testing:
+  - In a single-node control plane with leader election enabled, test control plane upgrade and assert that controller
+    managers become healthy and ready after the upgrade
+  - In a multi-node control plane setting, test control plane upgrade and assert availability throughout the upgrade
+
 ### Graduation Criteria

 ##### Alpha -> Beta Graduation
@@ -261,4 +326,7 @@ Version skew is handled as long as the leader name is consistent across all cont
 ## Implementation History

 - 07-25-2019 `Summary` and `Motivation` sections were merged signaling SIG acceptance
-- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
+- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
+- 09-30-2020 `LeaderMigrationConfiguration` and `ControllerLeaderConfiguration` schemas merged as #94205.
+- 11-04-2020 Registration of both types merged as #96133.
+- 12-28-2020 Parsing and validation merged as #96226.

keps/sig-cloud-provider/991-cloud-controller-migration/kep.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ approvers:
   - "@lavalamp"
 editor: TBD
 creation-date: 2019-04-22
-last-updated: 2019-04-22
+last-updated: 2021-02-08
 status: implementable
 see-also:
   - "/keps/sig-cloud-provider/20180530-cloud-controller-manager.md"
