Commit b1cb976

Merge pull request #311 from sbueringer/pr-chained-upgrade-proposal-review
2 parents 614643d + 554f310 commit b1cb976

4 files changed, +26 -32 lines changed

docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ Following recommendations are especially relevant:

 ## Definitions

-For additional details fo additional details about the OpenAPI spec of the lifecycle hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}})
+For additional details about the OpenAPI spec of the lifecycle hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}})
 file and then open it from the [Swagger UI](https://editor.swagger.io/).

 ### BeforeClusterCreate

docs/proposals/20210526-cluster-class-and-managed-topologies.md

Lines changed: 1 addition & 1 deletion
@@ -600,7 +600,7 @@ type definitions, like +MapType or +MapTypeKey, see [merge strategy](https://kub

 Note: in order to allow the topology controller to execute templates rotation only when strictly necessary, it is necessary
 to implement specific handling of dry run operations in the templates webhooks as described in the Cluster API contract, see
-e.g. [InfraMachineTemplate: support for SSA dry runInfraMachineTemplate: support for SSA dry run](https://cluster-api.sigs.k8s.io/developer/providers/contracts/infra-machine#inframachinetemplate-support-for-ssa-dry-run).
+e.g. [InfraMachineTemplate: support for SSA dry run](https://cluster-api.sigs.k8s.io/developer/providers/contracts/infra-machine#inframachinetemplate-support-for-ssa-dry-run).

 ### Risks and Mitigations

docs/proposals/20220414-lifecycle-hooks.md

Lines changed: 6 additions & 12 deletions
@@ -131,7 +131,7 @@ Below is a description for the Runtime Hooks introduced by this proposal.
 The remainder of this section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md#definitions)
 to avoid duplication.

-Note: Following change will be applied to the hooks with the ongoing work for [Chained and efficient upgrades](); the
+Note: Following change will be applied to the hooks with the ongoing work for [Chained and efficient upgrades](./20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md); the
 documentation in the book will be aligned as soon as the work completes:

 #### BeforeClusterUpgrade (modified)
@@ -217,8 +217,6 @@ message: "error message if status == Failure"
 retryAfterSeconds: 10
 ```

-For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
-
 #### AfterControlPlaneUpgrade (modified)

 This hook is called after the control plane has been upgraded to the version specified in `spec.topology.version`
@@ -274,15 +272,13 @@ message: "error message if status == Failure"
 retryAfterSeconds: 10
 ```

-For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
-
 #### BeforeWorkersUpgrade (new hook)

 This hook is called before a new version is propagated to workers. Runtime Extension implementers
 can use this hook to execute pre-upgrade add-on tasks and block upgrades of Workers.

 Note:
-- This hook will be called only if workers upgrade must be performed for an intermediate version of of a chained upgrade
+- This hook will be called only if workers upgrade must be performed for an intermediate version of a chained upgrade
 or when upgrading to the target `spec.topology.version`.

 ##### Example Request:
@@ -320,20 +316,18 @@ Note: The upgrade plan in the request contains only missing steps to reach the t

 ```yaml
 apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
-kind: BeforeControlPlaneUpgradeResponse
+kind: BeforeWorkersUpgradeResponse
 status: Success # or Failure
 message: "error message if status == Failure"
 retryAfterSeconds: 10
 ```

-For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
-
 #### AfterWorkersUpgrade (new hook)

 This hook is called after all the workers have been upgraded to the version specified in `spec.topology.version`
 or to an intermediate version in the upgrade plan, and:
 - if the upgrade plan is completed and the entire cluster is at `spec.topology.version`, immediately before calling the AfterClusterUpgrade hook
-- if the upgrade plan is not complete and the entrire cluster is now at one of the intermediate versions, immediately before calling BeforeControlPlaneUpgrade hook for the next intermediate step
+- if the upgrade plan is not complete and the entire cluster is now at one of the intermediate versions, immediately before calling BeforeControlPlaneUpgrade hook for the next intermediate step

 Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks; if the upgrade plan is not completed,
 this hook allows to block upgrades to the next version of the control plane until everything is ready.
@@ -342,7 +336,7 @@ this hook allows to block upgrades to the next version of the control plane unti

 ```yaml
 apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
-kind: AfterWorkersRequest
+kind: AfterWorkersUpgradeRequest
 settings: <Runtime Extension settings>
 cluster:
 apiVersion: cluster.x-k8s.io/v1beta1
@@ -371,7 +365,7 @@ Note: The upgrade plan in the request contains only missing steps to reach the t

 ```yaml
 apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
-kind: AfterControlPlaneUpgradeResponse
+kind: AfterWorkersUpgradeResponse
 status: Success # or Failure
 message: "error message if status == Failure"
 retryAfterSeconds: 10

docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md

Lines changed: 18 additions & 18 deletions
@@ -51,7 +51,7 @@ see-also:
 -> v1.34.0 (target version)

 - **Upgrade plan**: the sequence of intermediate versions ... target version that a Cluster must upgrade to when
-performing a chained upgrade;
+performing a chained upgrade.

 - **Efficient upgrade**: a chained upgrade where worker nodes skip some of the intermediate versions,
 when allowed by the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/) when the chained upgrade is also an efficient upgrade,
@@ -81,7 +81,7 @@ by more than one minor Kubernetes version by performing chained and efficient up
 When using clusters with managed topologies:
 - Allow Cluster API users to perform chained upgrades.
 - Automatically perform chained upgrades in an efficient way by skipping workers upgrades whenever possible.
-- Allow Cluster API users to influence the upgrade plan e.g. availability of machines images for the intermediate versions.
+- Allow Cluster API users to influence the upgrade plan e.g. based on availability of machines images for the intermediate versions.

 ### Future Work

@@ -96,15 +96,15 @@ When using clusters with managed topologies:
 ### User Stories

 - As a user, I want to upgrade my Cluster using a managed topology by more than one minor version by simply changing
-the value in `cluster.spec.topology.version`.
+the value in `Cluster.spec.topology.version`.

-- As a user, I want that Cluster API automatically minimize the number of worker's machines rollouts
+- As a user, I want that Cluster API automatically minimizes the number of worker machine rollouts
 when upgrading a Cluster using managed topology by more than one minor.

 - As a cluster class author, I want to be able to specify the Kubernetes versions that the system might use as
-intermediate or target versions for a chained upgrades for a Cluster using a specific cluster class.
+intermediate or target versions for a chained upgrade for a Cluster using a specific cluster class.

-- As a developer building on top of Cluster API, I want that lifecycle hooks allow orchestration of external process,
+- As a developer building on top of Cluster API, I want that lifecycle hooks allow orchestration of external processes,
 like e.g. addon management, during different steps of a chained upgrade.

 ### Implementation Details/Notes/Constraints
@@ -113,7 +113,7 @@ This proposal is composed of three sets of changes:

 - Improvements required to determine the upgrade plan for a chained upgrade.
 - Improvements required to perform chained and efficient upgrades.
-- Improvements to upgrade related Lifecycle hooks.
+- Improvements to upgrade-related Lifecycle hooks.

 #### Upgrade plan

@@ -124,7 +124,7 @@ The ClusterClass CR will be extended to make it possible to define the list of K
 used for chained upgrades of the corresponding clusters.

 ```yaml
-apiVersion: cluster.x-k8s.io/v1beta1
+apiVersion: cluster.x-k8s.io/v1beta2
 kind: ClusterClass
 metadata:
 name: quick-start-runtimesdk
@@ -144,17 +144,17 @@ between vA and vB.
 In the example above, the upgrade plan from v1.28.0 - current version - to v1.31.2 - target version -, will be:
 v1.29.0 -> v1.30.1 -> v1.31.2

-Note: by convention, the current version is omitted from the upgrade plan, the target version is included.
+Note: By convention, the current version is omitted from the upgrade plan, the target version is included.

 Note: Cluster API cannot determine the list of available Kubernetes versions automatically, because the versions that can be used
-in a Cluster API management cluster depend on external factors, e.g., by the availability of machine images for a Kubernetes version.
+in a Cluster API management cluster depend on external factors, e.g., on the availability of machine images for a Kubernetes version.

-As an alternative to explicitly setting the list of versions in a ClusterClasses, it will all also be possible to define
+As an alternative to explicitly setting the list of versions in ClusterClasses, it will all also be possible to define
 a runtime extension to be called when computing an upgrade plan; this extension could be used to return a
 dynamically computed list of Kubernetes versions that can be used.

 ```yaml
-apiVersion: cluster.x-k8s.io/v1beta1
+apiVersion: cluster.x-k8s.io/v1beta2
 kind: ClusterClass
 metadata:
 name: quick-start-runtimesdk
@@ -345,17 +345,17 @@ This proposal does not add additional security concern to Cluster API.

 - Upgrading a Cluster by multiple Kubernetes minor versions in a short timeframe might increase risks to face issues during the upgrade.

-This proposal aims to help users those risks by automating the chained upgrade workflow so users can catch up with
+This proposal aims to help users to manage these risks by automating the chained upgrade workflow so users can catch up with
 Kubernetes versions easily, quickly, and with an upgrade plan validated by the system.

 Also, worth to notice that each machine rollout in Cluster API ultimately is an operation that is exercising
-the same machinery that will be used during upgrades.
+the same machinery that will be used during upgrades.

 That means that by doing any rollout, e.g. due to an automatic machine remediation, you get a proxy signal about the
 fact that the system can successfully perform an upgrade, or you get the chance to detect and fix issues in the system
 before a full upgrade is performed.

-Conversely, risk increase for users not performing any form of rollouts for long periods.
+Conversely, risk increases for users not performing any form of rollouts for long periods.

 - Upgrading a Cluster by multiple Kubernetes minor versions might compromise workloads.

@@ -372,7 +372,7 @@ was considered.

 However, the option was discarded because it seems more consistent having the list of
 Kubernetes version to be used for upgrade plans in ClusterClasses, alongside all the other info defining
-how a managed topology should behave.
+how a managed topology should behave.

 ## Upgrade Strategy

@@ -393,12 +393,12 @@ required to get full coverage of the possible chained upgrade sequences.
 While implementing all those new tests is not impossible, it is considered not practical because the resulting E2E
 job would take a long time while current E2E jobs allow a fast iterative development process.

-Accordingly, int the first iteration only one chained upgrade test scenario going from N-3 to N+1 will be validated,
+Accordingly, in the first iteration only one chained upgrade test scenario going from N-3 to N+1 will be validated,
 but this is considered enough to ensure that:
 - The mechanics for chained upgrade works
 - [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy/) is respected, and workers upgrade are performed only when necessary
 - Lifecycle hooks are called
-- Resulting K8s cluster pass the conformance test
+- Resulting K8s cluster passes the conformance test

 This new test will run periodically, and also be available to be run on demand on PRs.
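
Reviewer sketch (not part of the commit): the upgrade-plan convention touched by the `-144,17` hunk of the chained-upgrades proposal (current version omitted, target version included) can be illustrated with a few lines of Python. This is a minimal sketch, not Cluster API code. The function names and the input list are hypothetical, and the rule of picking the highest listed patch release for each intermediate minor is an assumption chosen so that the result matches the example plan quoted in that hunk.

```python
import re


def parse(version: str) -> tuple:
    """Split a 'vMAJOR.MINOR.PATCH' string into a tuple of three ints."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unsupported version format: {version}")
    return tuple(int(part) for part in match.groups())


def upgrade_plan(available: list, current: str, target: str) -> list:
    """Return the versions to step through, excluding current, including target.

    Assumption (not confirmed by the diff): for every minor strictly between
    the current and the target minor, the highest listed patch is selected.
    """
    cur, tgt = parse(current), parse(target)
    if tgt <= cur:
        return []  # nothing to upgrade

    latest_per_minor = {}
    for version in available:
        parsed = parse(version)
        minor = parsed[:2]
        # Keep only minors strictly between the current and the target minor.
        if cur[:2] < minor < tgt[:2]:
            best = latest_per_minor.get(minor)
            if best is None or parsed > parse(best):
                latest_per_minor[minor] = version

    plan = [latest_per_minor[minor] for minor in sorted(latest_per_minor)]
    plan.append(target)  # by convention the target version closes the plan
    return plan


# Hypothetical list of versions, as a ClusterClass author might declare them.
VERSIONS = ["v1.28.0", "v1.29.0", "v1.30.0", "v1.30.1", "v1.31.2"]
print(" -> ".join(upgrade_plan(VERSIONS, "v1.28.0", "v1.31.2")))
```

With the hypothetical list above, the printed plan is the one quoted in the proposal text: `v1.29.0 -> v1.30.1 -> v1.31.2`.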
