
Commit 6180abe

Chained and efficient upgrades for Clusters with managed topologies
1 parent 0408c84 commit 6180abe

11 files changed: +624 / -1203 lines

docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md

Lines changed: 189 additions & 10 deletions
````diff
@@ -14,7 +14,8 @@ The lifecycle hooks allow hooking into the Cluster lifecycle. The following diag
 
 ![Lifecycle Hooks overview](../../../images/runtime-sdk-lifecycle-hooks.png)
 
-Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md)
+Please see the corresponding [proposal: Runtime hooks for Add-on Management (lifecycle hooks)](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-lifecycle-hooks.md) and
+the [proposal: Chained and efficient upgrades for Clusters with managed topologies](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md)
 for additional background information.
 
 ## Guidelines
````
````diff
@@ -139,8 +140,17 @@ cluster:
     ...
   status:
     ...
-fromKubernetesVersion: "v1.21.2"
-toKubernetesVersion: "v1.22.0"
+fromKubernetesVersion: "v1.30.0"
+toKubernetesVersion: "v1.33.0"
+upgradePlan:
+  controlPlane:
+    - v1.30.0
+    - v1.31.0
+    - v1.32.3
+    - v1.33.0
+  workers:
+    - v1.32.3
+    - v1.33.0
 ```
 
 #### Example Response:
````
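In the example request above, the control plane goes through every minor version while the workers are upgraded only to v1.32.3 and v1.33.0. The Python sketch below reproduces that example. The skew rule it uses (workers catch up only when the control plane would otherwise be more than two minor versions ahead of them), the `max_skew` parameter and the function names are assumptions chosen to match the example; the chained-upgrades proposal defines the actual rules.

```python
from typing import List, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a "vMAJOR.MINOR.PATCH" string into a comparable tuple."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def workers_plan(current: str, control_plane_plan: List[str], max_skew: int = 2) -> List[str]:
    """Return the versions workers are upgraded to while the control plane
    walks through control_plane_plan (remaining steps, target last).

    Hypothetical rule: workers catch up with the control plane only when the
    next control plane step would put it more than max_skew minor versions
    ahead of them; at the end, workers always move to the target.
    Assumes all versions share the same major version.
    """
    cp_version = workers_version = current
    steps: List[str] = []
    for next_cp in control_plane_plan:
        too_far = parse(next_cp)[1] - parse(workers_version)[1] > max_skew
        if too_far and cp_version != workers_version:
            # Workers must reach the current control plane version first.
            steps.append(cp_version)
            workers_version = cp_version
        cp_version = next_cp
    if workers_version != cp_version:
        # Final step: workers follow the control plane to the target version.
        steps.append(cp_version)
    return steps


print(workers_plan("v1.30.0", ["v1.31.0", "v1.32.3", "v1.33.0"]))
# → ['v1.32.3', 'v1.33.0']
```

Under this assumed rule, lowering `max_skew` makes workers follow the control plane more often, while a patch-only upgrade produces a single worker step at the target.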
````diff
@@ -159,12 +169,68 @@ For additional details, you can see the full schema in <button onclick="openSwag
 if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations
 to complete before starting the new upgrade.
 
+### BeforeControlPlaneUpgrade
+
+This hook is called before a new version is propagated to the control plane object. Runtime Extension implementers
+can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane.
+
+Note:
+- When an upgrade is starting, BeforeControlPlaneUpgrade will be called after BeforeClusterUpgrade is completed.
+- When an upgrade is in progress, BeforeControlPlaneUpgrade will be called for each intermediate version that will
+  be applied to the control plane (whereas BeforeClusterUpgrade is called only once, at the beginning of the upgrade).
+
+#### Example Request:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: BeforeControlPlaneUpgradeRequest
+settings: <Runtime Extension settings>
+cluster:
+  apiVersion: cluster.x-k8s.io/v1beta1
+  kind: Cluster
+  metadata:
+    name: test-cluster
+    namespace: test-ns
+  spec:
+    ...
+  status:
+    ...
+fromKubernetesVersion: "v1.30.0"
+toKubernetesVersion: "v1.33.0"
+upgradePlan:
+  controlPlane:
+    - v1.30.0
+    - v1.31.0
+    - v1.32.3
+    - v1.33.0
+  workers:
+    - v1.32.3
+    - v1.33.0
+```
+
+Note: The upgrade plan in the request contains only the missing steps to reach the target version.
+
+#### Example Response:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: BeforeControlPlaneUpgradeResponse
+status: Success # or Failure
+message: "error message if status == Failure"
+retryAfterSeconds: 10
+```
+
+For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
+
 ### AfterControlPlaneUpgrade
 
-This hook is called after the entire control plane has been upgraded to the version specified in `spec.topology.version`,
-and immediately before the new version is going to be propagated to the MachineDeployments of the Cluster.
-Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to workers
-until everything is ready.
+This hook is called after the entire control plane has been upgraded to the version specified in `spec.topology.version`
+or to an intermediate version in the upgrade plan, and:
+- if the workers upgrade can be skipped for this version, immediately before the next intermediate version is applied to the control plane;
+- if the workers upgrade must be performed for this version, immediately before the new version is propagated to the MachineDeployments of the Cluster.
+
+Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to the next
+version of the control plane or to workers until everything is ready.
 
 Note: While the MachineDeployments upgrade is blocked changes made to existing MachineDeployments and creating new MachineDeployments
 will be delayed while the object is waiting for upgrade. Example: modifying MachineDeployments (think scale up),
````
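The two cases described for AfterControlPlaneUpgrade amount to a small decision: either the workers move to the version the control plane just reached, or the control plane continues with its next step. The Python sketch below is illustrative only. The function name and the idea of passing the remaining control plane and worker steps as two plain lists are assumptions modelled on the `upgradePlan` field of the example requests, not part of the Runtime SDK.

```python
from typing import List, Optional, Tuple


def after_control_plane_upgrade_next(
    version: str,
    remaining_control_plane: List[str],
    remaining_workers: List[str],
) -> Tuple[str, Optional[str]]:
    """Return what follows a non-blocking AfterControlPlaneUpgrade call.

    version: the version the control plane has just reached.
    remaining_control_plane / remaining_workers: steps still to apply,
    target last (hypothetical stand-in for the request's upgradePlan).
    """
    if version in remaining_workers:
        # Workers cannot skip this version: propagate it to the MachineDeployments.
        return ("workers", version)
    if remaining_control_plane:
        # Workers can skip this version: apply the next control plane step.
        return ("control_plane", remaining_control_plane[0])
    # Nothing left to apply on either side.
    return ("done", None)


print(after_control_plane_upgrade_next(
    "v1.30.0", ["v1.31.0", "v1.32.3", "v1.33.0"], ["v1.32.3", "v1.33.0"]))
# → ('control_plane', 'v1.31.0')
```

With the versions used in the examples on this page, v1.30.0 is not a worker step, so the chain continues on the control plane; once the control plane reaches v1.32.3, the same function returns `("workers", "v1.32.3")`.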
````diff
@@ -188,9 +254,122 @@ cluster:
     ...
   status:
     ...
-kubernetesVersion: "v1.22.0"
+kubernetesVersion: "v1.30.0"
+upgradePlan:
+  controlPlane:
+    - v1.31.0
+    - v1.32.3
+    - v1.33.0
+  workers:
+    - v1.32.3
+    - v1.33.0
+```
+
+Note: The upgrade plan in the request contains only the missing steps to reach the target version, if any.
+
+#### Example Response:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: AfterControlPlaneUpgradeResponse
+status: Success # or Failure
+message: "error message if status == Failure"
+retryAfterSeconds: 10
+```
+
+For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
+
+### BeforeWorkersUpgrade
+
+This hook is called before a new version is propagated to workers. Runtime Extension implementers
+can use this hook to execute pre-upgrade add-on tasks and block upgrades of workers.
+
+Note:
+- This hook will be called only if a workers upgrade must be performed for an intermediate version of a chained upgrade
+  or when upgrading to the target `spec.topology.version`.
+
+#### Example Request:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: BeforeWorkersUpgradeRequest
+settings: <Runtime Extension settings>
+cluster:
+  apiVersion: cluster.x-k8s.io/v1beta1
+  kind: Cluster
+  metadata:
+    name: test-cluster
+    namespace: test-ns
+  spec:
+    ...
+  status:
+    ...
+fromKubernetesVersion: "v1.30.0"
+toKubernetesVersion: "v1.33.0"
+upgradePlan:
+  controlPlane:
+    - v1.30.0
+    - v1.31.0
+    - v1.32.3
+    - v1.33.0
+  workers:
+    - v1.32.3
+    - v1.33.0
 ```
 
+Note: The upgrade plan in the request contains only the missing steps to reach the target version.
+
+#### Example Response:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: BeforeWorkersUpgradeResponse
+status: Success # or Failure
+message: "error message if status == Failure"
+retryAfterSeconds: 10
+```
+
+For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
+
+### AfterWorkersUpgrade
+
+This hook is called after all the workers have been upgraded to the version specified in `spec.topology.version`
+or to an intermediate version in the upgrade plan, and:
+- if the upgrade plan is completed and the entire cluster is at `spec.topology.version`, immediately before calling the AfterClusterUpgrade hook;
+- if the upgrade plan is not complete and the entire cluster is now at one of the intermediate versions, immediately before calling the BeforeControlPlaneUpgrade hook for the next intermediate step.
+
+Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks; if the upgrade plan is not completed,
+this hook can also be used to block upgrades to the next version of the control plane until everything is ready.
+
+#### Example Request:
+
+```yaml
+apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
+kind: AfterWorkersUpgradeRequest
+settings: <Runtime Extension settings>
+cluster:
+  apiVersion: cluster.x-k8s.io/v1beta1
+  kind: Cluster
+  metadata:
+    name: test-cluster
+    namespace: test-ns
+  spec:
+    ...
+  status:
+    ...
+kubernetesVersion: "v1.30.0"
+upgradePlan:
+  controlPlane:
+    - v1.31.0
+    - v1.32.3
+    - v1.33.0
+  workers:
+    - v1.32.3
+    - v1.33.0
+```
+
+Note: The upgrade plan in the request contains only the missing steps to reach the target version, if any.
+
 #### Example Response:
 
 ```yaml
````
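Taken together, the hook descriptions define an order of calls for a chained upgrade. The Python sketch below lists that order for a given plan. It assumes both plans hold the remaining steps with the target last, and that AfterControlPlaneUpgrade precedes BeforeWorkersUpgrade for the same version; it is a reading of the descriptions in this file, not the topology controller's code, and the function name is hypothetical.

```python
from typing import List, Tuple


def hook_sequence(control_plane_plan: List[str], workers_plan: List[str]) -> List[Tuple[str, str]]:
    """List the (hook, version) calls made while walking a chained upgrade.

    Both plans hold the steps still to apply, with the target version last.
    """
    target = control_plane_plan[-1]
    calls = [("BeforeClusterUpgrade", target)]  # called once, at the beginning
    for version in control_plane_plan:
        calls.append(("BeforeControlPlaneUpgrade", version))
        calls.append(("AfterControlPlaneUpgrade", version))
        if version in workers_plan:
            # Only versions that workers cannot skip trigger the worker hooks.
            calls.append(("BeforeWorkersUpgrade", version))
            calls.append(("AfterWorkersUpgrade", version))
    calls.append(("AfterClusterUpgrade", target))
    return calls


for hook, version in hook_sequence(["v1.31.0", "v1.32.3", "v1.33.0"], ["v1.32.3", "v1.33.0"]):
    print(hook, version)
```

For the example plan this yields 12 calls: BeforeControlPlaneUpgrade runs three times, while the worker hooks run only for v1.32.3 and v1.33.0.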
````diff
@@ -201,6 +380,8 @@ message: "error message if status == Failure"
 retryAfterSeconds: 10
 ```
 
+Note: `retryAfterSeconds` is ignored when the workers version is equal to `spec.topology.version`.
+
 For additional details, you can see the full schema in <button onclick="openSwaggerUI()">Swagger UI</button>.
 
 ### AfterClusterUpgrade
````
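All the response examples carry `retryAfterSeconds`, and the note above makes it ineffective for AfterWorkersUpgrade once the workers reach `spec.topology.version`. The Python sketch below shows one way a caller could interpret the responses. The `HookResponse` class and the function name are hypothetical, and two behaviors are assumptions not stated in this file: a `Failure` status is surfaced as an error, and responses from several Runtime Extensions are combined by taking the lowest non-zero `retryAfterSeconds`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HookResponse:
    status: str  # "Success" or "Failure"
    message: str = ""
    retry_after_seconds: int = 0


def blocking_retry_after(hook: str, responses: List[HookResponse],
                         workers_version: str, topology_version: str) -> Optional[int]:
    """Return the seconds to wait before calling the hook again, or None to proceed."""
    for response in responses:
        if response.status == "Failure":
            # Assumption: a failure is reported as an error, not as a retry.
            raise RuntimeError(f"{hook} failed: {response.message}")
    if hook == "AfterWorkersUpgrade" and workers_version == topology_version:
        # Per the note above: cannot block once workers are at spec.topology.version.
        return None
    # Assumption: aggregate multiple extensions by the lowest non-zero value.
    retries = [r.retry_after_seconds for r in responses if r.retry_after_seconds > 0]
    return min(retries) if retries else None


print(blocking_retry_after("AfterWorkersUpgrade",
                           [HookResponse("Success", retry_after_seconds=10)],
                           "v1.33.0", "v1.33.0"))
# → None
```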
````diff
@@ -237,8 +418,6 @@ status: Success # or Failure
 message: "error message if status == Failure"
 ```
 
-For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml).
-
 ### BeforeClusterDelete
 
 This hook is called after the Cluster deletion has been triggered by the user and immediately before the topology
````

docs/book/src/tasks/experimental-features/runtime-sdk/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -26,7 +26,7 @@ Additional documentation:
 * Background information:
   * [Runtime SDK CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220221-runtime-SDK.md)
   * [Topology Mutation Hook CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md)
-  * [Runtime Hooks for Add-on Management CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md)
+  * [Runtime Hooks for Add-on Management CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-lifecycle-hooks.md)
 * For Runtime Extension developers:
   * [Implementing Runtime Extensions](./implement-extensions.md)
   * [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md)
````

docs/proposals/20210526-cluster-class-and-managed-topologies.md

Lines changed: 20 additions & 0 deletions
````diff
@@ -209,6 +209,8 @@ at high level the new CRD contains:
 - A list of patches, allowing to change above templates for each specific Cluster.
 - A list of variable definitions, defining a set of additional values the users can provide on each specific cluster;
   those values can be used in patches.
+- A list of Kubernetes versions to be used when performing chained upgrades for clusters using this ClusterClass, see
+  [proposal: Chained and efficient upgrades for Clusters with managed topologies](20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md).
 
 The following paragraph provides some additional context on some of the above values; more info can
 be found in [writing a ClusterClass](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html).
````
````diff
@@ -378,8 +380,19 @@ as well as in
 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
 kind: VSphereClusterTemplate
 name: vsphere-prod-cluster-template
+upgrade:
+versions:
+- v1.28.0
+- v1.29.0
+- v1.30.0
+- v1.30.1
+- v1.31.2
+- ...
 ```
 
+See [proposal: Chained and efficient upgrades for Clusters with managed topologies](20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md) for more options
+for configuring Kubernetes version upgrades of clusters using managed topologies.
+
 2. User creates a cluster using the class name and defining the topology.
 ```yaml
 apiVersion: cluster.x-k8s.io/v1beta1
````
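One plausible reading of how the `upgrade.versions` list above could be turned into control plane steps is sketched below in Python: for each minor version between the current and the target version, pick the highest patch listed in the ClusterClass. This selection rule and the function names are assumptions (the rule is consistent with v1.32.3 being the intermediate step in the lifecycle hook examples of this commit); the linked proposal defines the actual behavior.

```python
from typing import Dict, List, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a "vMAJOR.MINOR.PATCH" string into a comparable tuple."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def control_plane_steps(available: List[str], current: str, target: str) -> List[str]:
    """Return the control plane versions to apply, target last.

    Assumes current and target share the same major version and that the
    highest listed patch of each intermediate minor version is used.
    """
    cur, tgt = parse(current), parse(target)
    latest: Dict[Tuple[int, int], str] = {}
    for version in available:
        parsed = parse(version)
        key = parsed[:2]  # (major, minor)
        if cur[:2] < key < tgt[:2] and (key not in latest or parse(latest[key]) < parsed):
            latest[key] = version
    # A chained upgrade cannot skip a minor version, so every one must be listed.
    missing = [m for m in range(cur[1] + 1, tgt[1]) if (cur[0], m) not in latest]
    if missing:
        raise ValueError(f"no version listed for minor(s) {missing}")
    return [latest[key] for key in sorted(latest)] + [target]


versions = ["v1.28.0", "v1.29.0", "v1.30.0", "v1.30.1", "v1.31.2"]
print(control_plane_steps(versions, "v1.28.0", "v1.31.2"))
# → ['v1.29.0', 'v1.30.1', 'v1.31.2']
```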
````diff
@@ -433,6 +446,9 @@ This section talks about updating a Cluster which was created using a `ClusterCl
 
 ![Update cluster with ClusterClass](./images/cluster-class/update.png)
 
+See [proposal: Chained and efficient upgrades for Clusters with managed topologies](20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md)
+for more considerations about Kubernetes version upgrades of clusters using managed topologies.
+
 #### Behavior with patches
 
 This section highlights how the basic behavior discussed above changes when patches are used. This is an important use case because without
````
````diff
@@ -485,6 +501,9 @@ like e.g. a different HTTP proxy configuration, a different image to be used for
 valueFrom:
 variable: machineType
 ```
+
+See [proposal: topology mutation hook](20220330-topology-mutation-hook.md) for a powerful alternative to
+inline patches.
 
 ##### Create a new Cluster with patches
 
````
````diff
@@ -613,6 +632,7 @@ The initial plan is to rollout Cluster Class and support for managed topologies
 - 10/04/2021: Added support for patches and variables
 - 01/10/2022: Added support for MachineHealthChecks
 - 12/20/2022: Cleaned up outdated implementation details by linking the book's pages instead. This will make it easier to keep the proposal up to date.
+- 05/13/2025: Added support for Upgrade Plans; see [proposal: Chained and efficient upgrades for Clusters with managed topologies](20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md)
 
 <!-- Links -->
 [community meeting]: https://docs.google.com/document/d/1Ys-DOR5UsgbMEeciuG0HOgDQc8kZsaWIWJeKJ1-UfbY
````

docs/proposals/20220414-runtime-hooks.md renamed to docs/proposals/20220414-lifecycle-hooks.md

Lines changed: 6 additions & 2 deletions
````diff
@@ -115,15 +115,18 @@ As a developer of an add-ons orchestration solution:
 * **Before a Cluster is Created** I want to automatically check if enough disk space is available for allocation to the cluster for persistent storage of collected metrics values.
 * **After the Control Plane** **is Initialized** I want to automatically install a metrics database and associated add-ons in the workload cluster.
 * **Before the Cluster is Upgraded** I want to install a new version of the metrics database with a new version of the custom metrics apiservice to interact directly with the Kubernetes apiserver.
-* **After the ControlPlane is Upgraded** I want to automatically check that the new version of the custom metrics apiservice is working and correctly fulfilled by my metrics database.
+* **Before the ControlPlane is Upgraded** I want to install a new version of the metrics database with a new version of the custom metrics apiservice to interact directly with the Kubernetes apiserver.
+* **After the ControlPlane is Upgraded** I want to install new versions of metrics collectors to each upgraded node in the cluster.
+* **Before workers are Upgraded** I want to install a new version of the metrics database with a new version of the custom metrics apiservice to interact directly with the Kubernetes apiserver.
+* **After workers are Upgraded** I want to install new versions of metrics collectors to each upgraded node in the cluster.
 * **After the Cluster is Upgraded** I want to install new versions of metrics collectors to each upgraded node in the cluster.
 * **Before the Cluster is Deleted** I want to automatically back up persistent volumes used by the metrics database.
 
 ### Runtime hook definitions
 
 Below is a description for the Runtime Hooks introduced by this proposal.
 
-![runtime-hooks](images/runtime-hooks/runtime-hooks.png)
+![runtime-hooks](images/lifecycle-hooks/lifecycle-hooks.png)
 
 The remainder of this section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md#definitions)
 to avoid duplication.
````
````diff
@@ -210,6 +213,7 @@ See [upgrade strategy](#upgrade-strategy).
 * [x] 2022-04-04: Opened corresponding [issue](https://github.com/kubernetes-sigs/cluster-api/issues/6374)
 * [x] 2022-04-06: Presented proposal at a [community meeting]
 * [x] 2022-04-14: Opened proposal PR
+* [x] 2025-05-13: Added runtime hooks for chained upgrades; see [proposal: Chained and efficient upgrades for Clusters with managed topologies](20250513-chained-and-efficient-upgrades-for-clusters-with-managed-topologies.md)
 
 <!-- Links -->
 [community meeting]: https://docs.google.com/document/d/1ushaVqAKYnZ2VN_aa3GyKlS4kEd6bSug13xaXOakAQI/edit#heading=h.pxsq37pzkbdq
````
