
Commit 76520bd

Proposal for in place propagation of changes affecting Kubernetes objects only
1 parent f735b92 commit 76520bd

5 files changed: +331 −0 lines changed

docs/book/src/reference/glossary.md

Lines changed: 8 additions & 0 deletions
@@ -206,6 +206,14 @@ See e.g. [CAPA][#CAPA], [CAPC][#CAPC], [CAPD][#CAPD], [CAPG][#CAPG], [CAPH][#CAP

A [patch](#patch) defined inline in a [ClusterClass](#clusterclass). An alternative to an [external patch](#external-patch).

### In-place mutable fields

Fields whose changes would only impact Kubernetes objects and/or controller behaviour, but would not mutate in any way
the provider infrastructure or the software running on it. In-place mutable fields are propagated in place by CAPI
controllers to avoid the more elaborate mechanics of a rollout. They include metadata, MinReadySeconds,
NodeDrainTimeout, NodeVolumeDetachTimeout and NodeDeletionTimeout, but the list is not exhaustive and may be expanded
in the future.

### Instance

see [Server](#server)
Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
---
title: In place propagation of changes affecting Kubernetes objects only
authors:
  - "@fabriziopandini"
  - "@sbueringer"
reviewers:
  - "@oscar"
  - "@vincepri"
creation-date: 2022-02-10
last-updated: 2022-02-26
status: implementable
replaces:
superseded-by:
---

# In place propagation of changes affecting Kubernetes objects only

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

**In-place mutable fields**: fields whose changes would only impact Kubernetes objects and/or controller behaviour, but
would not mutate in any way the provider infrastructure or the software running on it. In-place mutable fields are
propagated in place by CAPI controllers to avoid the more elaborate mechanics of a rollout. They include metadata,
MinReadySeconds, NodeDrainTimeout, NodeVolumeDetachTimeout and NodeDeletionTimeout, but the list is not exhaustive and
may be expanded in the future.

## Summary

This document discusses how labels, annotations and other fields impacting only Kubernetes objects or controller behaviour (e.g. NodeDrainTimeout)
propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.

## Motivation

Managing labels on Kubernetes nodes has been a long-standing [issue](https://github.com/kubernetes-sigs/cluster-api/issues/493) in Cluster API.

The following challenges have been identified through various iterations:

- Define how labels propagate from Machine to Node.
- Define how labels and annotations propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.
- Define how to prevent label and annotation propagation from triggering unnecessary rollouts.

The first point is being addressed by [Label Sync Between Machine and underlying Kubernetes Nodes](./20220927-label-sync-between-machine-and-nodes.md),
while this document tackles the remaining two points.

During a preliminary exploration we identified that the two challenges above also apply to other fields impacting only Kubernetes objects or
controller behaviour (see e.g. [Support to propagate properties in-place from MachineDeployments to Machines](https://github.com/kubernetes-sigs/cluster-api/issues/5880)).

As a consequence, we have decided to expand this work to consider how to propagate labels, annotations and fields impacting only Kubernetes objects or
controller behaviour, and to address the related issue [Labels and annotations for MachineDeployments and KubeadmControlPlane created by topology controller](https://github.com/kubernetes-sigs/cluster-api/issues/7006).
### Goals

- Define how labels and annotations propagate from ClusterClass to KubeadmControlPlane/MachineDeployments and ultimately to Machines.
- Define how fields impacting only Kubernetes objects or controller behaviour propagate from ClusterClass to KubeadmControlPlane/MachineDeployments,
  and ultimately to Machines.
- Define how to prevent the propagation of labels, annotations and other fields impacting only Kubernetes objects or controller behaviour
  from triggering unnecessary rollouts.

### Non-Goals

- Discuss the immutability core design principle in Cluster API (on the contrary, this proposal makes immutability even better by improving
  the criteria for when we trigger Machine rollouts).
- Support in-place mutation for components or settings that exist on Machines (this proposal focuses only on labels, annotations and other
  fields impacting only Kubernetes objects or controller behaviour).

### Future-Goals

- Expand propagation rules to include MachinePools after the [MachinePools Machine proposal](./20220209-machinepool-machines.md) is implemented.
## Proposal

### User Stories

#### Story 1

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my nodes via Cluster topology metadata
(for Clusters with ClusterClass).

As a cluster admin/user, I would like a declarative and secure means by which to assign roles to my nodes via KubeadmControlPlane and
MachineDeployments (for Clusters without ClusterClass).

#### Story 2

As a cluster admin/user, I would like to change labels or annotations on Machines without triggering Machine rollouts.

#### Story 3

As a cluster admin/user, I would like to change nodeDrainTimeout on Machines without triggering Machine rollouts.

#### Story 4

As a cluster admin/user, I would like to set autoscaler labels for MachineDeployments by changing Cluster topology metadata
(for Clusters with ClusterClass).

### Implementation Details/Notes/Constraints

### Metadata propagation

The following diagram represents how metadata propagation works today (also documented in the [book](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation.html)).

![Figure 1](./images/in-place-propagation/current-state.png)

With this proposal we are suggesting to improve metadata propagation as described in the following diagram:

![Figure 2](./images/in-place-propagation/proposed-changes.png)

The following paragraphs provide more details about the proposed changes.

#### 1. Label Sync Between Machine and underlying Kubernetes Nodes

As discussed in [Label Sync Between Machine and underlying Kubernetes Nodes](./20220927-label-sync-between-machine-and-nodes.md), we are propagating only
labels with a well-known prefix or a well-known domain from the Machine to the corresponding Kubernetes Node.
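
For illustration, a minimal sketch of the kind of filter the Machine controller could use when syncing labels to Nodes; the specific prefixes/domains (`node-role.kubernetes.io/`, `node-restriction.kubernetes.io`, `node.cluster.x-k8s.io`) are assumptions here, and the authoritative list is defined by the label sync proposal and its implementation:

```go
// Sketch only: decide whether a Machine label should be synced to the Node.
package labelsync

import "strings"

// Well-known prefixes/domains assumed for illustration; see the label sync
// proposal for the authoritative list.
const (
	nodeRolePrefix        = "node-role.kubernetes.io/"
	nodeRestrictionDomain = "node-restriction.kubernetes.io"
	capiNodeDomain        = "node.cluster.x-k8s.io"
)

// hasManagedPrefixOrDomain returns true if a Machine label should be
// propagated to the corresponding Kubernetes Node.
func hasManagedPrefixOrDomain(key string) bool {
	if strings.HasPrefix(key, nodeRolePrefix) {
		return true
	}
	// A label key is "<domain>/<name>"; match the domain or one of its subdomains.
	if idx := strings.Index(key, "/"); idx > 0 {
		domain := key[:idx]
		return domain == nodeRestrictionDomain ||
			strings.HasSuffix(domain, "."+nodeRestrictionDomain) ||
			domain == capiNodeDomain ||
			strings.HasSuffix(domain, "."+capiNodeDomain)
	}
	return false
}
```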

#### 2. Labels/Annotations always reconciled

All the labels/annotations previously set only on creation are now going to be reconciled continuously;
in order to prevent unnecessary rollouts, metadata propagation should happen in-place;
see [in-place propagation](#in-place-propagation) below for more details.

Note: As of today the topology controller already propagates ClusterClass and Cluster topology metadata changes in-place when possible
in order to avoid unnecessary template rotation with the consequent Machine rollout; we do not foresee changes to this logic.

#### 3. and 4. Set top-level labels/annotations for ControlPlane and MachineDeployment created from a ClusterClass

Labels and annotations from ClusterClass and Cluster.topology are going to be propagated to the top-level labels and annotations of
ControlPlane and MachineDeployment objects.

This addresses [Labels and annotations for MachineDeployments and KubeadmControlPlane created by topology controller](https://github.com/kubernetes-sigs/cluster-api/issues/7006).

Note: The proposed solution avoids adding additional metadata fields to ClusterClass and Cluster.topology. This has the
disadvantage that it is not possible to specify top-level labels/annotations that differ from the ones propagated to Machines,
but given the discussion on the above issue this isn't a requirement.
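
For illustration only, a minimal sketch of how the topology controller could compute the top-level metadata of a generated MachineDeployment; the helper names are hypothetical and the type/field paths (`MachineDeploymentClass.Template.Metadata`, `MachineDeploymentTopology.Metadata`) are assumed from the v1beta1 API at the time of writing:

```go
// Sketch only: illustrative helpers, not the actual topology controller code.
package topology

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// mergeMap returns the union of the given maps; in case of conflicts the
// later maps take precedence (i.e. Cluster.topology overrides ClusterClass).
func mergeMap(maps ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, m := range maps {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

// setTopLevelMetadata sets the top-level labels/annotations of the desired
// MachineDeployment from the ClusterClass and Cluster.topology metadata.
func setTopLevelMetadata(md *clusterv1.MachineDeployment, mdClass clusterv1.MachineDeploymentClass, mdTopology clusterv1.MachineDeploymentTopology) {
	md.Labels = mergeMap(mdClass.Template.Metadata.Labels, mdTopology.Metadata.Labels)
	md.Annotations = mergeMap(mdClass.Template.Metadata.Annotations, mdTopology.Metadata.Annotations)
}
```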

### Propagation of fields impacting only Kubernetes objects or controller behaviour

In addition to labels and annotations, there are also other fields that flow down from ClusterClass to KubeadmControlPlane/MachineDeployments and
ultimately to Machines.

Some of them can be treated like labels and annotations, because they impact only Kubernetes objects or controller behaviour, but
not the actual Machine itself - including its infrastructure and the software running on it (in-place mutable fields).
Examples are `MinReadySeconds`, `NodeDrainTimeout`, `NodeVolumeDetachTimeout`, `NodeDeletionTimeout`.

Propagation of changes to those fields will be implemented using the same [in-place propagation](#in-place-propagation) mechanism used
for metadata.
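
As a rough sketch (assuming a controller-runtime client; this is not the actual controller code), propagating such a field in place boils down to patching the existing Machine instead of replacing it:

```go
// Sketch only: propagate an in-place mutable field (NodeDrainTimeout) from a
// MachineSet to an existing Machine with a patch, without recreating the Machine.
package inplace

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func propagateNodeDrainTimeout(ctx context.Context, c client.Client, ms *clusterv1.MachineSet, machine *clusterv1.Machine) error {
	// Compute a merge patch against the current state of the Machine.
	patchBase := client.MergeFrom(machine.DeepCopy())

	// Only in-place mutable fields are updated; nothing here touches the
	// infrastructure or the software running on the Machine.
	machine.Spec.NodeDrainTimeout = ms.Spec.Template.Spec.NodeDrainTimeout

	return c.Patch(ctx, machine, patchBase)
}
```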

### In-place propagation

With in-place propagation we are referring to a mechanism that updates existing Kubernetes objects, like MachineSets or Machines, instead of
creating a new object with the updated fields and then deleting the current one.

The main benefit of this approach is that it prevents unnecessary rollouts of the corresponding infrastructure, with the consequent creation/
deletion of a Kubernetes node and the drain/re-scheduling of workloads hosted on the Machine being deleted.

**Important!** In-place propagation of changes as defined above applies only to metadata changes or to fields impacting only Kubernetes objects
or controller behaviour. This approach cannot be used to apply changes to the infrastructure hosting a Machine, to the OS or to any software
installed on it, Kubernetes components included (kubelet, static pods, CRI, etc.).

Implementing in-place propagation has two distinct challenges:

- The current rules defining when MachineDeployments or KubeadmControlPlane trigger a rollout should be modified in order to ignore metadata and
  other fields that are going to be propagated in-place.

- When implementing the reconcile loop that performs in-place propagation, it is required to avoid impacting other components that apply
  labels or annotations to the same object. For example, when reconciling labels on a Machine, Cluster API should take care of reconciling
  only the labels it manages, without changing any label applied by users or by other controllers on the same Machine.

#### MachineDeployment rollouts

The MachineDeployment controller determines when a rollout is required using a "semantic equality" comparison between the current MachineDeployment
spec and the corresponding MachineSet spec.

While implementing this proposal we should change the definition of "semantic equality" in order to exclude metadata and fields that
should be updated in-place.
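
A minimal sketch of what such a comparison could look like, assuming the in-place mutable fields listed earlier; the function is illustrative and not the actual MachineDeployment controller code:

```go
// Sketch only: compare MachineDeployment and MachineSet machine templates while
// ignoring in-place mutable fields, so that changes to them do not trigger a rollout.
package mdutil

import (
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func templatesSemanticallyEqual(md *clusterv1.MachineDeployment, ms *clusterv1.MachineSet) bool {
	mdTemplate := md.Spec.Template.DeepCopy()
	msTemplate := ms.Spec.Template.DeepCopy()

	// Drop fields that are propagated in-place before comparing.
	for _, t := range []*clusterv1.MachineTemplateSpec{mdTemplate, msTemplate} {
		t.ObjectMeta.Labels = nil
		t.ObjectMeta.Annotations = nil
		t.Spec.NodeDrainTimeout = nil
		t.Spec.NodeVolumeDetachTimeout = nil
		t.Spec.NodeDeletionTimeout = nil
	}

	return apiequality.Semantic.DeepEqual(mdTemplate, msTemplate)
}
```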

On top of that, we should also account for the use case where, after deploying the new "semantic equality" rule, there are already one or more
MachineSets matching the MachineDeployment. Today in this case Cluster API deterministically picks the oldest of them.

When exploring the solution for this proposal we discovered that the above approach can cause turbulence in the Cluster, because it does not
take into account which MachineSet existing Machines belong to. As a consequence, a Cluster API upgrade could lead to a rollout, with Machines moving from
one "semantically equal" MachineSet to another, which is an unnecessary operation.

In order to prevent this, we are modifying the MachineDeployment controller to pick the "semantically equal" MachineSet with the most
Machines, thus avoiding or minimizing turbulence in the Cluster.
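
A possible sketch of this selection logic, assuming the caller has already computed which MachineSets are "semantically equal" and how many Machines each one owns (helper names are hypothetical):

```go
// Sketch only: among the MachineSets that are "semantically equal" to the
// MachineDeployment, prefer the one owning the most Machines; fall back to the
// oldest one to keep the choice deterministic.
package mdutil

import (
	"sort"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// machinesPerMS is assumed to be pre-computed by the caller, e.g. by grouping
// the MachineDeployment's Machines by their owning MachineSet name.
func pickMatchingMachineSet(matching []*clusterv1.MachineSet, machinesPerMS map[string]int) *clusterv1.MachineSet {
	if len(matching) == 0 {
		return nil
	}
	sort.SliceStable(matching, func(i, j int) bool {
		mi, mj := machinesPerMS[matching[i].Name], machinesPerMS[matching[j].Name]
		if mi != mj {
			return mi > mj // more Machines first
		}
		// Tie-breaker: older MachineSet first, to keep the choice deterministic.
		return matching[i].CreationTimestamp.Before(&matching[j].CreationTimestamp)
	})
	return matching[0]
}
```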

##### What about the hash label

The MachineDeployment controller relies on a label with a hash value to identify Machines belonging to a MachineSet; also, the hash value
is used as a suffix for the MachineSet name.

Currently the hash is computed using an algorithm that considers the same set of fields used to determine "semantic equality" between the current
MachineDeployment spec and the corresponding MachineSet spec.

When exploring the solution for this proposal, we decided the above algorithm can be simplified by using a simple random string,
plus a check that ensures that the random string is not already taken by an existing MachineSet (for this MachineDeployment).

The main benefit of this change is that we decouple "semantic equality" from computing a UID to be used for identifying Machines
belonging to a MachineSet, thus making the code easier to understand and simplifying future changes to rollout rules.
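
A minimal sketch of the simplified approach, assuming `k8s.io/apimachinery`'s `rand` package and an illustrative suffix length and retry count:

```go
// Sketch only: generate a random suffix for a new MachineSet name/unique label,
// retrying if the value is already used by an existing MachineSet of the same
// MachineDeployment.
package mdutil

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/rand"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func newMachineSetSuffix(md *clusterv1.MachineDeployment, existing []*clusterv1.MachineSet) (string, error) {
	taken := map[string]bool{}
	for _, ms := range existing {
		taken[ms.Name] = true
	}
	for i := 0; i < 10; i++ {
		suffix := rand.String(5)
		if !taken[fmt.Sprintf("%s-%s", md.Name, suffix)] {
			return suffix, nil
		}
	}
	return "", fmt.Errorf("failed to generate a unique suffix for MachineSets of MachineDeployment %s", md.Name)
}
```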
201+
202+
#### KCP rollouts
203+
204+
The KCP controller determines when a rollout is required using a "semantic equality" comparison between current KCP
205+
object and the corresponding Machine object.
206+
207+
The "semantic equality" implementation is pretty complex, but for the sake of this proposal only a few detail are relevant:
208+
209+
- Rollout is triggered if a Machine doesn't have all the labels and the annotations in spec.machineTemplate.Metadata.
210+
- Rollout is triggered if the KubeadmConfig linked to a Machine doesn't have all the labels and the annotations in spec.machineTemplate.Metadata.
211+
212+
While implementing this proposal, above rule should be dropped, and replaced by in-place update of label & annotations.
213+
Please also note that the current rule does not detect when a label/annotation is removed from spec.machineTemplate.Metadata
214+
and thus users are required to remove labels/annotation manually; this is considered a bug and the new implementation
215+
should account for this use case.
216+
217+
Also, according to the current "semantic equality" rules, changes to nodeDrainTimeout, nodeVolumeDetachTimeout, nodeDeletionTimeout are
218+
applied only to new machines (they don't trigger rollout). While implementing this proposal, we should make sure that
219+
those changes are propagated to existing machines, without triggering rollout.

#### Avoiding conflicts with other components

While doing [in-place propagation](#in-place-propagation), and thus continuously reconciling info from one Kubernetes
object to another, we are also reconciling values in maps, e.g. Labels or Annotations.

This creates some challenges. Assume that we want to reconcile the following labels from a MachineDeployment to a Machine:

```yaml
labels:
  a: a
  b: b
```

After the first reconciliation, the Machine gets the above labels.
Now assume that we remove label `b` from the MachineDeployment; the expected set of labels is

```yaml
labels:
  a: a
```

But the Machine still has the label `b`, and we cannot remove it, because at this stage we do not know
if this label has been applied by Cluster API, by the user or by another controller.

In order to properly manage this use case, that is, co-authored maps, the solution available in the API server is
to use [Server-Side Apply patches](https://kubernetes.io/docs/reference/using-api/server-side-apply/).
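
A minimal sketch of how a controller could use SSA via controller-runtime for this, applying only the fields it wants to own; the field manager name is illustrative:

```go
// Sketch only: apply only the labels (and other in-place mutable fields) that
// Cluster API wants to own, using Server-Side Apply.
package inplace

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func applyManagedLabels(ctx context.Context, c client.Client, machine *clusterv1.Machine, desiredLabels map[string]string) error {
	// Build a partial object containing only the fields this controller manages.
	partial := &clusterv1.Machine{
		TypeMeta: metav1.TypeMeta{
			APIVersion: clusterv1.GroupVersion.String(),
			Kind:       "Machine",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      machine.Name,
			Namespace: machine.Namespace,
			Labels:    desiredLabels,
		},
	}

	// "capi-machineset" is an illustrative field manager name.
	return c.Patch(ctx, partial, client.Apply,
		client.FieldOwner("capi-machineset"), client.ForceOwnership)
}
```

With SSA, the API server records which field manager owns each label, so removing a label from the applied configuration also removes it from the object, while entries owned by users or by other controllers are left untouched.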

Based on previous experience introducing SSA in the topology controller, this change requires a lot of testing
and validation. Some cases that should be specifically verified include:

- introducing SSA patches on an already existing object (and ensuring that SSA takes over ownership of managed labels/annotations properly)
- using SSA patches on objects after move or Velero backup/restore (and ensuring that SSA takes over ownership of managed labels/annotations properly)

However, despite those use cases still having to be verified during implementation, it is assumed that using API server
built-in capabilities is a stronger, long-term solution than any other alternative.

## Alternatives

### Do not use SSA for [in-place propagation](#in-place-propagation) and be authoritative on labels and annotations

If Cluster API uses regular patches instead of SSA patches, a well-tested path in Cluster API, Cluster API can
be implemented in order to be authoritative on labels and annotations; that means that all the labels and annotations should
be propagated from higher-level objects (e.g. all of a Machine's labels should be set on the MachineSet, and so
on up the propagation chain).

This is not considered acceptable, because users and other controllers must be able to apply their own
labels to any Kubernetes object, including the ones managed by Cluster API.

### Do not use SSA for [in-place propagation](#in-place-propagation) and do not delete labels/annotations

If Cluster API uses regular patches instead of SSA patches, but without being authoritative, Cluster API can
be implemented in order to add new labels from higher-level objects (e.g. a new label added to a MachineSet is added to
the corresponding Machine) and to enforce label values from higher-level objects.

But, as explained in [avoiding conflicts with other components](#avoiding-conflicts-with-other-components), with
this approach there is no way to determine if a label/annotation has been applied by Cluster API, by the user or by another controller,
and thus automatic label/annotation deletion cannot be implemented.

This approach is not considered ideal, because it transfers the ownership of label and annotation deletion
to users or other controllers, and this is not considered a nice user experience.

### Do not use SSA for [in-place propagation](#in-place-propagation) and use status fields to track labels previously applied by CAPI

If Cluster API uses regular patches instead of SSA patches, without being authoritative, it is possible to implement
a DIY solution for tracking label ownership based on status fields or annotations.

This approach is not considered ideal, because e.g. status fields do not survive move or backup and restore, and, taking
a step back, this is sort of re-implementing SSA or a subset of it.

### Change more propagation rules

While working on the set of changes proposed above, a set of optional changes to the existing propagation rules has been
identified; however, considering that the most complex part of this proposal is implementing [in-place propagation](#in-place-propagation),
it was decided to implement only the few, most critical changes to propagation rules.

Nevertheless, we are documenting the optional changes dropped from the scope of this iteration for future reference.

![Figure 3](./images/in-place-propagation/optional-changes.png)

Optional changes:

- 4b: Simplify MachineDeployment to MachineSet label propagation
  Leveraging the change introduced in 4, it is possible to simplify MachineDeployment to MachineSet label propagation,
  which currently mimics Deployment to ReplicaSet label propagation. The downside of this change is that it wouldn't be
  possible anymore to have different labels/annotations on MachineDeployment & MachineSet.

- 5a and 5b: Propagate ClusterClass and Cluster.topology to templates
  This change makes ClusterClass and Cluster.topology labels/annotations propagate to templates as well.
  Please note that this change requires further discussion, because:
  - The contract with providers should be extended to add optional metadata fields where necessary.
  - It should be defined how to detect if a template for a specific provider has the optional metadata fields,
    and this is tricky because Cluster API doesn't have detailed knowledge of providers' types.
  - InfrastructureMachineTemplates are immutable in a lot of providers, so we have to discuss how/if we should
    be able to mutate InfrastructureMachineTemplate.spec.template.metadata.

## Implementation History

- [ ] 10/03/2022: First Draft of this document
3 binary image files added (193 KB, 234 KB, 216 KB): content not shown in the diff.
