Commit 5cc5bf8

update text to reflect ongoing discussions.
1 parent 6c2e020 commit 5cc5bf8

1 file changed

+96
-42
lines changed
keps/sig-cloud-provider/991-cloud-controller-migration/README.md

Lines changed: 96 additions & 42 deletions
@@ -11,13 +11,14 @@
1111
- [Proposal](#proposal)
1212
- [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional)
1313
- [Migration Configuration](#migration-configuration)
14+
- [Default LeaderMigrationConfiguration](#default-leadermigrationconfiguration)
1415
- [Component Flags](#component-flags)
15-
- [Example Walkthrough of Controller Migration](#example-walkthrough-of-controller-migration)
16+
- [Example Walkthrough of Controller Migration with Default Configuration](#example-walkthrough-of-controller-migration-with-default-configuration)
1617
- [Enable Leader Migration on Components](#enable-leader-migration-on-components)
17-
- [Deploy the CCM](#deploy-the-ccm)
18-
- [Update Leader Migration Config on Upgrade](#update-leader-migration-config-on-upgrade)
18+
- [Upgrade the Control Plane](#upgrade-the-control-plane)
1919
- [Disable Leader Migration](#disable-leader-migration)
2020
- [Risks and Mitigations](#risks-and-mitigations)
21+
- [Test Plan](#test-plan)
2122
- [Graduation Criteria](#graduation-criteria)
2223
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
2324
- [Beta -> GA Graduation](#beta---ga-graduation)
@@ -35,12 +36,12 @@ For enhancements that make changes to code or processes/procedures in core Kuber
3536

3637
Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.
3738

38-
- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
39-
- [ ] KEP approvers have set the KEP status to `implementable`
40-
- [ ] Design details are appropriately documented
41-
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
42-
- [ ] Graduation criteria is in place
43-
- [ ] "Implementation History" section is up-to-date for milestone
39+
- [X] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
40+
- [X] KEP approvers have set the KEP status to `implementable`
41+
- [X] Design details are appropriately documented
42+
- [X] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
43+
- [X] Graduation criteria is in place
44+
- [X] "Implementation History" section is up-to-date for milestone
4445
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
4546
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
4647

@@ -75,7 +76,7 @@ the respective out-of-tree cloud-controller-manager.
7576

7677
### Goals
7778

78-
* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechnaism, to their out-of-tree equivalents.
79+
* Define migration process for large scale, highly available clusters to migrate from the in-tree cloud provider mechanism, to their out-of-tree equivalents.
7980

8081
### Non-Goals
8182

@@ -102,14 +103,14 @@ _primary_ and N configurable _secondary_ (a.k.a migration) leader election locks
102103
The primary lock represents the current leader election resource lock in the KCM and the CCM. The set of
103104
secondary locks are defined by the cloud provider and run in parallel to the primary locks. For a migration
104105
lock defined by the cloud provider, the cloud provider also determines the set of controllers run within the
105-
migration lock and the controller manager it should run in - either the CCM or the KCM.
106+
migration lock and the controller manager it will run in - either the CCM or the KCM.
106107

107108
The properties of the migration lock are:
108109
* must have a unique name
109110
* the set of controllers in the lock is immutable.
110111
* no two migration locks should have overlapping controllers
111112
* the controller manager where the lock runs can change across releases.
112-
* for a minor release it should run exclusively in one type of controller manager - KCM or CCM.
113+
* for a minor release it must run exclusively in one type of controller manager - KCM or CCM.
113114

114115
During migration, either the KCM or CCM may have multiple migration locks, though for performance reasons no more than 2 locks is recommended.
115116

@@ -150,42 +151,84 @@ The migration lock will be configured by defining new API types that will then b
150151
type LeaderMigrationConfiguration struct {
151152
metav1.TypeMeta `json:",inline"`
152153

153-
// LeaderName is the name of the resource under which the controllers should be run.
154+
// LeaderName is the name of the resource under which the controllers will be run.
154155
LeaderName string `json:"leaderName"`
155156

156-
// ControllerLeaders contains a list of migrating leader lock configurations
157-
ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
157+
// ResourceLock indicates the resource object type that will be used to lock
158+
// Must be either "leases" or "endpoints", defaults to 'leases'
159+
// No other types (e.g. "endpointsleases" or "configmapsleases") are allowed
160+
ResourceLock string `json:"resourceLock"`
161+
162+
// ControllerLeaders contains a list of migrating leader lock configurations
163+
ControllerLeaders []ControllerLeaderConfiguration `json:"controllerLeaders"`
158164
}
159165

160166
// ControllerLeaderConfiguration provides the configuration for a migrating leader lock.
161167
type ControllerLeaderConfiguration struct {
162-
// Name is the name of the controller being migrated
163-
// E.g. service-controller, route-controller, cloud-node-controller, etc
164-
Name string `json:"name"`
168+
// Name is the name of the controller being migrated
169+
// E.g. service-controller, route-controller, cloud-node-controller, etc
170+
Name string `json:"name"`
165171

166-
// Component is the name of the component in which the controller should be running.
167-
// E.g. kube-controller-manager, cloud-controller-manager, etc
168-
Component string `json:"component"`
172+
// Component is the name of the component in which the controller will be running.
173+
// E.g. kube-controller-manager, cloud-controller-manager, etc
174+
Component string `json:"component"`
169175
}
170176
```
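The constraints stated in the `ResourceLock` comment can be checked mechanically. The following is a minimal sketch, not the merged validation code: the helpers `setDefaults` and `validate` are hypothetical names, the types are local copies without `metav1.TypeMeta` so the example compiles on its own, and the duplicate-controller check is an assumption that goes beyond what the comments above require.

```go
package main

import "fmt"

// Local copies of the types above, trimmed so the sketch is self-contained.
type ControllerLeaderConfiguration struct {
	Name      string
	Component string
}

type LeaderMigrationConfiguration struct {
	LeaderName        string
	ResourceLock      string
	ControllerLeaders []ControllerLeaderConfiguration
}

// setDefaults (hypothetical) applies the documented default of "leases"
// when no resource lock type is given.
func setDefaults(cfg *LeaderMigrationConfiguration) {
	if cfg.ResourceLock == "" {
		cfg.ResourceLock = "leases"
	}
}

// validate (hypothetical) returns one error per violated rule.
func validate(cfg *LeaderMigrationConfiguration) []error {
	var errs []error

	// Documented rule: only "leases" or "endpoints" are allowed.
	switch cfg.ResourceLock {
	case "leases", "endpoints":
	default:
		errs = append(errs, fmt.Errorf("resourceLock %q is not allowed", cfg.ResourceLock))
	}

	// Assumption: a controller should not be listed twice in one configuration.
	seen := map[string]bool{}
	for _, c := range cfg.ControllerLeaders {
		if seen[c.Name] {
			errs = append(errs, fmt.Errorf("controller %q is listed more than once", c.Name))
		}
		seen[c.Name] = true
	}
	return errs
}

func main() {
	cfg := &LeaderMigrationConfiguration{
		LeaderName:   "cloud-provider-extraction-migration",
		ResourceLock: "configmapsleases", // rejected by the documented rule
		ControllerLeaders: []ControllerLeaderConfiguration{
			{Name: "route-controller", Component: "kube-controller-manager"},
		},
	}
	setDefaults(cfg)
	for _, err := range validate(cfg) {
		fmt.Println("invalid configuration:", err)
	}
}
```

Keeping defaulting separate from validation means an empty `resourceLock` is filled in first and never reported as an error.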
171177

178+
#### Default LeaderMigrationConfiguration
179+
180+
The `staging/controller-manager` package will provide `kube-controller-manager` and `cloud-controller-manager`
181+
each a default `LeaderMigrationConfiguration` that represents the situation where the controller manager is running with
182+
default assignments of controllers and lock type selection.
183+
184+
Please refer to [a walkthrough](#example-walkthrough-of-controller-migration-with-default-configuration)
185+
of an example migration of cloud controllers from KCM to CCM that uses the default configuration.
186+
187+
The default values must be used only when no configuration file is specified. If a custom configuration file is
188+
specified to either controller manager, the specified configuration will completely replace the default values for the
189+
corresponding controller manager.
190+
172191
#### Component Flags
173192

174-
The LeaderMigrationConfiguration type will be read by the `kube-controller-manager` and the `cloud-controller-manager` via a new flag `--cloud-migration-config` which
175-
accepts a path to a file containing the LeaderMigrationConfiguration type in yaml.
193+
Both `kube-controller-manager` and `cloud-controller-manager` will get support for the following two flags for Leader
194+
Migration. First, `--enable-leader-migration` is a boolean flag which defaults to `false` that indicates whether Leader
195+
Migration is enabled. Second, `--leader-migration-config` is an optional flag that accepts a path to a file containing
196+
the `LeaderMigrationConfiguration` type serialized in yaml.
197+
198+
If `--enable-leader-migration` is `true` but the `--leader-migration-config` flag is empty or not set, the
199+
default `LeaderMigrationConfiguration` for the corresponding controller manager will be used.
176200

177-
#### Example Walkthrough of Controller Migration
201+
If `--enable-leader-migration` is not set or set to `false`, but `--leader-migration-config` is set and not empty, the
202+
controller manager will print an error at `FATAL` level and exit immediately. Additionally,
203+
if `--leader-migration-config` is set but the configuration file cannot be read or parsed, the controller manager will
204+
log the failure at `FATAL` level and exit immediately.
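The flag rules above amount to a small decision table. Below is a minimal sketch of that logic, not the actual controller manager code: `MigrationConfig`, `loadFunc`, and `resolveMigrationConfig` are hypothetical names, the file path is made up, and the loader is injected so the example needs no real file. A returned error stands in for logging at `FATAL` level and exiting.

```go
package main

import (
	"errors"
	"fmt"
)

// MigrationConfig is a trimmed, hypothetical stand-in for LeaderMigrationConfiguration.
type MigrationConfig struct {
	LeaderName   string
	ResourceLock string
}

// loadFunc reads and parses a configuration file (injected for the sketch).
type loadFunc func(path string) (*MigrationConfig, error)

// resolveMigrationConfig applies the flag rules described above.
// A nil config with a nil error means Leader Migration is disabled.
func resolveMigrationConfig(enabled bool, configPath string, defaults *MigrationConfig, load loadFunc) (*MigrationConfig, error) {
	if !enabled {
		if configPath != "" {
			// Config given without enabling the feature: fatal.
			return nil, errors.New("--leader-migration-config requires --enable-leader-migration")
		}
		return nil, nil
	}
	if configPath == "" {
		// Enabled without a file: fall back to the component's default.
		return defaults, nil
	}
	cfg, err := load(configPath)
	if err != nil {
		// Unreadable or unparsable file: fatal.
		return nil, fmt.Errorf("cannot read or parse %q: %w", configPath, err)
	}
	return cfg, nil
}

func main() {
	defaults := &MigrationConfig{LeaderName: "cloud-provider-extraction-migration", ResourceLock: "leases"}
	failingLoad := func(string) (*MigrationConfig, error) { return nil, errors.New("file not found") }

	cfg, err := resolveMigrationConfig(true, "", defaults, failingLoad)
	fmt.Println(cfg.LeaderName, err)

	// Hypothetical path, used only to trigger the flag mismatch error.
	_, err = resolveMigrationConfig(false, "/etc/kubernetes/migration.yaml", defaults, failingLoad)
	fmt.Println(err)
}
```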
178205

179-
This is an example of how you would migrate all cloud controllers from the CCM to the KCM during a typical cluster version upgrade.
206+
#### Example Walkthrough of Controller Migration with Default Configuration
207+
208+
This is an example of migrating a KCM-only Kubernetes 1.21 control plane to KCM + CCM 1.22.
209+
210+
After the upgrade, all cloud controllers will be moved from the KCM to the CCM. We assume KCM and CCM are running with
211+
default controller assignments: in 1.21, KCM runs `route-controller`, `service-controller`,
212+
`cloud-node-controller`, and `cloud-nodelifecycle-controller`; in 1.22, CCM runs all 4
213+
controllers instead.
214+
215+
If KCM and CCM are not running with the default controller assignments, a custom configuration file can be specified
216+
with `--leader-migration-config`. However, this example only covers the simple case of using default configuration.
217+
218+
At the beginning, KCM should not have `--enable-leader-migration` or `--leader-migration-config` set, but it should
219+
have `--cloud-provider` already set to an existing cloud provider (e.g. `--cloud-provider=gce`). At this point, KCM
220+
runs `route-controller`, `service-controller`, `cloud-node-controller`, and `cloud-nodelifecycle-controller`. CCM is not
221+
yet deployed.
180222

181223
##### Enable Leader Migration on Components
182224

183-
First, define a LeaderMigrationConfiguration resource in a yaml file containing all known cloud controllers. The component name for each controller should be set to
184-
the component where the controllers are currently running. Almost always this is the `kube-controller-manager`. The configuration file should look something like this:
225+
The provided default configuration will be equivalent to the following:
226+
185227
```yaml
186228
kind: LeaderMigrationConfiguration
187229
apiVersion: v1alpha1
188-
leaderName: cloud-controllers-migration
230+
leaderName: cloud-provider-extraction-migration
231+
resourceLock: leases
189232
controllerLeaders:
190233
- name: route-controller
191234
component: kube-controller-manager
@@ -197,23 +240,32 @@ controllerLeaders:
197240
component: kube-controller-manager
198241
```
199242
200-
Save the leader migration configuration file somewhere, for this example we'll use `/etc/kubernetes/cloud-controller-migration.yaml`.
201-
Now update the kube-controller-manager to set `--cloud-migration-config /etc/kubernetes/cloud-controller-migration.yaml`.
243+
First, within the 1.21 control plane, update the `kube-controller-manager` to set `--enable-leader-migration` but
244+
not `--leader-migration-config`. This enables Leader Migration with the default configuration, which prepares KCM to
245+
participate in the migration.
246+
247+
##### Upgrade the Control Plane
202248

203-
##### Deploy the CCM
249+
Upgrade each node of the control plane to 1.22 with the following updates:
204250

205-
Now deploy the CCM on your cluster but ensure it also has the `--cloud-migration-config` flag set, using the same config file you used for the KCM above.
251+
- KCM has neither `--enable-leader-migration` nor `--leader-migration-config`
252+
- KCM has no cloud provider enabled with `--cloud-provider=`
253+
- CCM deployed with `--enable-leader-migration`
254+
- CCM has its `--cloud-provider` set to the correct cloud provider
206255

207-
How the CCM is deployed is out of scope for this KEP, refer to the cloud provider's documentation on how to do this.
256+
After the upgrade, CCM will run `route-controller`, `service-controller`, `cloud-node-controller`,
257+
and `cloud-nodelifecycle-controller`. Leader Migration ensures that CCM cleanly takes over these controllers
258+
during the control plane upgrade.
208259

209-
##### Update Leader Migration Config on Upgrade
260+
As a reference, the provided default configuration in version 1.22 should have the `component` field of all affected
261+
controllers changed to `cloud-controller-manager`. The resulting default configuration should be equivalent to the
262+
following:
210263

211-
To migrate controllers from the KCM to the CCM, update the component field from `kube-controller-manager` to `cloud-controller-manager` on every control plane node prior to
212-
upgrading the node. If you are replacing nodes on upgrade, ensure new nodes set the `component` field to `cloud-controller-manager`. The new config file should look like this:
213264
```yaml
214265
kind: LeaderMigrationConfiguration
215266
apiVersion: v1alpha1
216-
leaderName: cloud-controllers-migration
267+
leaderName: cloud-provider-extraction-migration
268+
resourceLock: leases
217269
controllerLeaders:
218270
- name: route-controller
219271
component: cloud-controller-manager
@@ -225,13 +277,12 @@ controllerLeaders:
225277
component: cloud-controller-manager
226278
```
227279

228-
NOTE: During upgrade, it is acceptable for control plane nodes to specify different component names for each controller as long as the `leaderName` field is the same across nodes.
280+
Note how the component name of each controller differs between the two versions.
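The difference between the two default configurations can be made concrete by listing which controllers each component is assigned. This is a rough sketch under assumptions: `defaultLeaders` and `controllersFor` are hypothetical helpers, and the four controller names are taken from the walkthrough text rather than from any shipped default.

```go
package main

import "fmt"

// ControllerLeaderConfiguration mirrors the type above, without JSON tags.
type ControllerLeaderConfiguration struct {
	Name      string
	Component string
}

// defaultLeaders (hypothetical) assigns every cloud controller named in the
// walkthrough to a single component.
func defaultLeaders(component string) []ControllerLeaderConfiguration {
	names := []string{
		"route-controller",
		"service-controller",
		"cloud-node-controller",
		"cloud-nodelifecycle-controller",
	}
	leaders := make([]ControllerLeaderConfiguration, 0, len(names))
	for _, n := range names {
		leaders = append(leaders, ControllerLeaderConfiguration{Name: n, Component: n[:0] + component})
	}
	return leaders
}

// controllersFor (hypothetical) returns the controllers assigned to a component.
func controllersFor(leaders []ControllerLeaderConfiguration, component string) []string {
	var out []string
	for _, l := range leaders {
		if l.Component == component {
			out = append(out, l.Name)
		}
	}
	return out
}

func main() {
	v121 := defaultLeaders("kube-controller-manager")  // default in 1.21
	v122 := defaultLeaders("cloud-controller-manager") // default in 1.22

	fmt.Println("1.21 KCM:", controllersFor(v121, "kube-controller-manager"))
	fmt.Println("1.22 KCM:", controllersFor(v122, "kube-controller-manager"))
	fmt.Println("1.22 CCM:", controllersFor(v122, "cloud-controller-manager"))
}
```

Under the 1.22 default, the KCM list is empty and all four controllers appear under the CCM, which is the handover that the shared migration lock coordinates.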
229281

230282
##### Disable Leader Migration
231283

232-
Once all controllers are migrated to the desired component:
233-
* disable the cloud provider in the `kube-controller-manager` (set `--cloud-provider=external`)
234-
* disable leader migration on the `kube-controller-manager` and `cloud-controller-manager` by unsetting the `--cloud-migration-config` field.
284+
Once all nodes in the control plane are upgraded to 1.22, disable leader migration on the `cloud-controller-manager` by
285+
unsetting the `--enable-leader-migration` flag.
235286

236287
### Risks and Mitigations
237288

@@ -261,4 +312,7 @@ Version skew is handled as long as the leader name is consistent across all cont
261312
## Implementation History
262313

263314
- 07-25-2019 `Summary` and `Motivation` sections were merged signaling SIG acceptance
264-
- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
315+
- 01-21-2019 Implementation details are proposed to move KEP to `implementable` state.
316+
- 09-30-2020 `LeaderMigrationConfiguration` and `ControllerLeaderConfiguration` schemas merged as #94205.
317+
- 11-04-2020 Registration of both types merged as #96133
318+
- 12-28-2020 Parsing and validation merged as #96226
