- [Beta -> GA Graduation](#beta---ga-graduation)
@@ -35,12 +36,12 @@ For enhancements that make changes to code or processes/procedures in core Kuber
 
 Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.
 
-- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
-- [ ] KEP approvers have set the KEP status to `implementable`
-- [ ] Design details are appropriately documented
-- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] Graduation criteria is in place
-- [ ] "Implementation History" section is up-to-date for milestone
+- [X] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
+- [X] KEP approvers have set the KEP status to `implementable`
+- [X] Design details are appropriately documented
+- [X] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] Graduation criteria is in place
+- [X] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
@@ -75,7 +76,7 @@ the respective out-of-tree cloud-controller-manager.
 
 ### Goals
 
-* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechnaism, to their out-of-tree equivalents.
+* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechanism, to their out-of-tree equivalents.
 
 ### Non-Goals
 
@@ -102,14 +103,14 @@ _primary_ and N configurable _secondary_ (a.k.a migration) leader election locks
 The primary lock represents the current leader election resource lock in the KCM and the CCM. The set of
 secondary locks are defined by the cloud provider and run in parallel to the primary locks. For a migration
 lock defined by the cloud provider, the cloud provider also determines the set of controllers run within the
-migration lock and the controller manager it should run in - either the CCM or the KCM.
+migration lock and the controller manager it will run in - either the CCM or the KCM.
 
 The properties of the migration lock are:
 * must have a unique name
 * the set of controllers in the lock is immutable.
 * no two migration locks should have overlapping controllers
 * the controller manager where the lock runs can change across releases.
-* for a minor release it should run exclusively in one type of controller manager - KCM or CCM.
+* for a minor release it must run exclusively in one type of controller manager - KCM or CCM.
 
 During migration, either the KCM or CCM may have multiple migration locks, though for performance reasons no more than 2 locks is recommended.
 
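The migration lock properties listed above could be checked with something like the following minimal sketch. The `MigrationLock` type and `ValidateLocks` function are hypothetical names made up for illustration and are not part of the KEP's API. Only the properties that can be verified from a single snapshot are covered (unique name, no overlapping controllers, exactly one controller manager per lock); immutability across releases is out of scope for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// MigrationLock is a hypothetical illustration type, not an API type from the KEP.
type MigrationLock struct {
	Name        string
	Component   string // "kube-controller-manager" or "cloud-controller-manager"
	Controllers []string
}

// ValidateLocks checks the single-snapshot properties of a set of migration locks:
// every lock has a unique, non-empty name, runs in exactly one known controller
// manager, and no controller is listed more than once across all locks.
func ValidateLocks(locks []MigrationLock) error {
	seenNames := map[string]bool{}
	owner := map[string]string{} // controller name -> name of the lock that lists it

	for _, l := range locks {
		if l.Name == "" {
			return errors.New("migration lock must have a name")
		}
		if seenNames[l.Name] {
			return fmt.Errorf("duplicate migration lock name %q", l.Name)
		}
		seenNames[l.Name] = true

		if l.Component != "kube-controller-manager" && l.Component != "cloud-controller-manager" {
			return fmt.Errorf("lock %q: unknown component %q", l.Name, l.Component)
		}
		for _, c := range l.Controllers {
			if prev, ok := owner[c]; ok {
				return fmt.Errorf("controller %q is listed in lock %q and again in lock %q", c, prev, l.Name)
			}
			owner[c] = l.Name
		}
	}
	return nil
}

func main() {
	locks := []MigrationLock{
		{Name: "lock-a", Component: "kube-controller-manager", Controllers: []string{"route-controller", "service-controller"}},
		{Name: "lock-b", Component: "kube-controller-manager", Controllers: []string{"service-controller"}},
	}
	// service-controller is listed under both locks, so validation reports an error.
	fmt.Println(ValidateLocks(locks))
}
```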
@@ -150,42 +151,84 @@ The migration lock will be configured by defining new API types that will then b
 type LeaderMigrationConfiguration struct {
     metav1.TypeMeta `json:",inline"`
 
-    // LeaderName is the name of the resource under which the controllers should be run.
+    // LeaderName is the name of the resource under which the controllers will be run.
     LeaderName string `json:"leaderName"`
 
-    // ControllerLeaders contains a list of migrating leader lock configurations
 // ControllerLeaderConfiguration provides the configuration for a migrating leader lock.
 type ControllerLeaderConfiguration struct {
-    // Name is the name of the controller being migrated
-    // E.g. service-controller, route-controller, cloud-node-controller, etc
-    Name string `json:"name"`
+    // Name is the name of the controller being migrated
+    // E.g. service-controller, route-controller, cloud-node-controller, etc
+    Name string `json:"name"`
 
-    // Component is the name of the component in which the controller should be running.
-    // E.g. kube-controller-manager, cloud-controller-manager, etc
-    Component string `json:"component"`
+    // Component is the name of the component in which the controller will be running.
+    // E.g. kube-controller-manager, cloud-controller-manager, etc
+    Component string `json:"component"`
 }
 ```
 
+#### Default LeaderMigrationConfiguration
+
+The `staging/controller-manager` package will provide `kube-controller-manager` and `cloud-controller-manager`
+each with a default `LeaderMigrationConfiguration` that represents the situation where the controller manager is running with
+default assignments of controllers and lock type selection.
+
+Please refer to [a walkthrough](#example-walkthrough-of-controller-migration-with-default-configuration)
+of an example cloud controller migration from KCM to CCM that uses the default configuration.
+
+The default values must be used only when no configuration file is specified. If a custom configuration file is
+specified to either controller manager, the specified configuration will completely replace the default value for the
+corresponding controller manager.
+
 #### Component Flags
 
-The LeaderMigrationConfiguration type will be read by the `kube-controller-manager` and the `cloud-controller-manager` via a new flag `--cloud-migration-config` which
-accepts a path to a file containing the LeaderMigrationConfiguration type in yaml.
+Both `kube-controller-manager` and `cloud-controller-manager` will support the following two flags for Leader
+Migration. First, `--enable-leader-migration` is a boolean flag, defaulting to `false`, that indicates whether Leader
+Migration is enabled. Second, `--leader-migration-config` is an optional flag that accepts a path to a file containing
+the `LeaderMigrationConfiguration` type serialized in YAML.
+
+If `--enable-leader-migration` is `true` but the `--leader-migration-config` flag is empty or not set, the
+default `LeaderMigrationConfiguration` for the corresponding controller manager will be used.
+
+If `--enable-leader-migration` is not set or set to `false`, but `--leader-migration-config` is set and not empty, the
+controller manager will log an error at `FATAL` level and exit immediately. Additionally,
+if `--leader-migration-config` is set but the configuration file cannot be read or parsed, the controller manager will
+log the failure at `FATAL` level and exit immediately.
 
-#### Example Walkthrough of Controller Migration
+#### Example Walkthrough of Controller Migration with Default Configuration
 
-This is an example of how you would migrate all cloud controllers from the CCM to the KCM during a typical cluster version upgrade.
+This is an example of migrating a KCM-only Kubernetes 1.21 control plane to KCM + CCM 1.22.
+
+After the upgrade, all cloud controllers will be moved from the KCM to the CCM. We assume KCM and CCM are running with
+default controller assignments: in 1.21, KCM runs `route-controller`, `service-controller`,
+`cloud-node-controller`, and `cloud-nodelifecycle-controller`; in 1.22, CCM will instead run all four
+controllers.
+
+If KCM and CCM are not running with the default controller assignments, a custom configuration file can be specified
+with `--leader-migration-config`. However, this example only covers the simple case of using the default configuration.
+
+At the beginning, KCM should not have `--enable-leader-migration` or `--leader-migration-config` set, but it should
+have `--cloud-provider` already set to an existing cloud provider (e.g. `--cloud-provider=gce`). At this point, KCM
+runs `route-controller`, `service-controller`, `cloud-node-controller`, and `cloud-nodelifecycle-controller`. CCM is not
+yet deployed.
 
 ##### Enable Leader Migration on Components
 
-First, define a LeaderMigrationConfiguration resource in a yaml file containing all known cloud controllers. The component name for each controller should be set to
-the component where the controllers are currently running. Almost always this is the `kube-controller-manager`. The configuration file should look something like this:
+The provided default configuration will be equivalent to the following:
+
 ```yaml
 kind: LeaderMigrationConfiguration
 apiVersion: v1alpha1
-leaderName: cloud-controllers-migration
+leaderName: cloud-provider-extraction-migration
+resourceLock: leases
 controllerLeaders:
   - name: route-controller
     component: kube-controller-manager
@@ -197,23 +240,32 @@ controllerLeaders:
     component: kube-controller-manager
 ```
 
-Save the leader migration configuration file somewhere, for this example we'll use `/etc/kubernetes/cloud-controller-migration.yaml`.
-Now update the kube-controller-manager to set `--cloud-migration-config /etc/kubernetes/cloud-controller-migration.yaml`.
+First, within the 1.21 control plane, update the `kube-controller-manager` to set `--enable-leader-migration` but
+not `--leader-migration-config`. This enables Leader Migration with the default configuration, which prepares KCM to
+participate in the migration.
 
-##### Deploy the CCM
+##### Upgrade the Control Plane
 
-Now deploy the CCM on your cluster but ensure it also has the `--cloud-migration-config` flag set, using the same config file you used for the KCM above.
+Upgrade each node of the control plane to 1.22 with the following updates:
 
-How the CCM is deployed is out of scope for this KEP, refer to the cloud provider's documentation on how to do this.
+- KCM has neither `--enable-leader-migration` nor `--leader-migration-config` set
+- KCM has no cloud provider enabled, with `--cloud-provider=`
+- CCM is deployed with `--enable-leader-migration`
+- CCM has its `--cloud-provider` set to the correct cloud provider
 
-##### Update Leader Migration Config on Upgrade
+After the upgrade, CCM will run `route-controller`, `service-controller`, `cloud-node-controller`,
+and `cloud-nodelifecycle-controller`. The Leader Migration support will ensure that CCM cleanly takes over these controllers
+during the control plane upgrade.
+
+As a reference, the provided default configuration in version 1.22 should have the `component` field of all affected
+controllers changed to `cloud-controller-manager`. The resulting default configuration should be equivalent to the
+following:
 
-To migrate controllers from the KCM to the CCM, update the component field from `kube-controller-manager` to `cloud-controller-manager` on every control plane node prior to
-upgrading the node. If you are replacing nodes on upgrade, ensure new nodes set the `component` field to `cloud-controller-manager`. The new config file should look like this:
 ```yaml
 kind: LeaderMigrationConfiguration
 apiVersion: v1alpha1
-leaderName: cloud-controllers-migration
+leaderName: cloud-provider-extraction-migration
+resourceLock: leases
 controllerLeaders:
   - name: route-controller
     component: cloud-controller-manager
@@ -225,19 +277,32 @@ controllerLeaders:
     component: cloud-controller-manager
 ```
 
-NOTE: During upgrade, it is acceptable for control plane nodes to specify different component names for each controller as long as the `leaderName` field is the same across nodes.
+Note how the component name of each controller differs between the two versions.
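To make the difference between the two default configurations concrete, the sketch below compares the controller-to-component assignments of the 1.21 and 1.22 defaults and lists the controllers whose component changed. It assumes the four controllers named in the walkthrough. `ChangedComponents` is a hypothetical helper, and the struct is a simplified local copy of `ControllerLeaderConfiguration` without the JSON tags.

```go
package main

import (
	"fmt"
	"sort"
)

// ControllerLeaderConfiguration is a simplified local copy of the type from
// the KEP, kept here so that the sketch is self-contained.
type ControllerLeaderConfiguration struct {
	Name      string
	Component string
}

// ChangedComponents returns the sorted names of controllers that appear in
// both lists but are assigned to a different component in "to" than in "from".
// Hypothetical helper, not part of the KEP.
func ChangedComponents(from, to []ControllerLeaderConfiguration) []string {
	before := make(map[string]string, len(from))
	for _, c := range from {
		before[c.Name] = c.Component
	}
	var changed []string
	for _, c := range to {
		if prev, ok := before[c.Name]; ok && prev != c.Component {
			changed = append(changed, c.Name)
		}
	}
	sort.Strings(changed)
	return changed
}

// defaultsFor builds a configuration list that assigns every cloud controller
// named in the walkthrough to the given component.
func defaultsFor(component string) []ControllerLeaderConfiguration {
	names := []string{
		"route-controller",
		"service-controller",
		"cloud-node-controller",
		"cloud-nodelifecycle-controller",
	}
	out := make([]ControllerLeaderConfiguration, 0, len(names))
	for _, n := range names {
		out = append(out, ControllerLeaderConfiguration{Name: n, Component: component})
	}
	return out
}

func main() {
	v121 := defaultsFor("kube-controller-manager")
	v122 := defaultsFor("cloud-controller-manager")
	// All four controllers move from KCM to CCM between the two defaults.
	fmt.Println(ChangedComponents(v121, v122))
}
```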
 
 ##### Disable Leader Migration
 
-Once all controllers are migrated to the desired component:
-* disable the cloud provider in the `kube-controller-manager` (set `--cloud-provider=external`)
-* disable leader migration on the `kube-controller-manager` and `cloud-controller-manager` by unsetting the `--cloud-migration-config` field.
+Once all nodes in the control plane are upgraded to 1.22, disable leader migration on the `cloud-controller-manager` by
+unsetting the `--enable-leader-migration` flag.
 
 ### Risks and Mitigations
 
 * Increased apiserver load due to new leader election resource per migration configuration.
 * User error could result in cloud controllers not running in any component at all.
 
+### Test Plan
+
+- Unit Testing:
+  - test resource reading, parsing, validation
+  - test calculation of leader differences.
+  - test all helpers
+- Integration Testing
+  - test resource registration, parsing, and validation against the Schema APIs
+  - test interactions with the leader election APIs
+- E2E Testing
+  - In a single-node control plane setting with leader election, test control plane upgrade, assert controller managers
+    become healthy and ready after upgrade
+  - In a multi-node control plane setting, test control plane upgrade, assert availability throughout the upgrade
+
 ### Graduation Criteria
 
 ##### Alpha -> Beta Graduation
@@ -261,4 +326,7 @@ Version skew is handled as long as the leader name is consistent across all cont
 ## Implementation History
 
 - 07-25-2019 `Summary` and `Motivation` sections were merged signaling SIG acceptance
-- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
+- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
+- 09-30-2020 `LeaderMigrationConfiguration` and `ControllerLeaderConfiguration` schemas merged as #94205.
+- 11-04-2020 Registration of both types merged as #96133
+- 12-28-2020 Parsing and validation merged as #96226