## Summary

This KEP introduces an opinionated deletion process for Kubernetes namespace deletion
to ensure secure deletion of resources within a namespace.
The current deletion process is semi-random, which may lead to security gaps or
unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies.
By implementing an opinionated deletion mechanism, Pods will be deleted before other resources,
respecting logical and security dependencies.
This design enhances the security and reliability of Kubernetes by mitigating risks arising from the non-deterministic deletion order.
Additionally, the lack of a defined deletion order can lead to operational inconsistencies
where any sort of safety guard resources (not just NetworkPolicy) are deleted before their guarded resources (e.g., Pods),
resulting in unnecessary disruptions or errors.

By introducing an opinionated deletion process, this proposal aims to:

- Enhance Security: Ensure resources like NetworkPolicies remain in effect until all dependent resources have been safely terminated.
- Increase Predictability: Provide a consistent and logical cleanup process for namespace deletion, reducing unintended side effects.

This opinionated deletion approach aligns with Kubernetes' principles of reliability, security, and extensibility,
providing a solid foundation for managing resource cleanup in complex environments.
## Proposal

When the feature gate `OrderedNamespaceDeletion` is enabled,
the resources associated with the namespace should be deleted in the following order:

- Delete all pods in the namespace (in an undefined order).
- Wait for all the pods to be stopped or deleted.
- Delete all the other resources in the namespace (in an undefined order).
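For illustration, the following is a minimal client-go sketch of this two-phase flow, not the actual namespace controller implementation; the clientset, namespace name, timeout, and the specific resource types deleted in the final phase are assumptions made for the example.

```go
// Illustrative sketch only: the real ordering is intended to be enforced
// server-side by the namespace deletion machinery for every namespaced
// resource type, not by client-side code like this.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deleteNamespaceContentsOrdered deletes pods first, waits for them to go
// away, and only then deletes the remaining resources (shown here only for
// NetworkPolicies and Deployments to keep the sketch short).
func deleteNamespaceContentsOrdered(ctx context.Context, cs kubernetes.Interface, ns string) error {
	// Phase 1: delete all pods in the namespace (order among pods is undefined).
	if err := cs.CoreV1().Pods(ns).DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
		return err
	}

	// Phase 2: wait until every pod is fully stopped or deleted.
	err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
			if err != nil {
				return false, err
			}
			return len(pods.Items) == 0, nil
		})
	if err != nil {
		return fmt.Errorf("pods in %q did not terminate: %w", ns, err)
	}

	// Phase 3: delete everything else (again, in no particular order).
	if err := cs.NetworkingV1().NetworkPolicies(ns).DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
		return err
	}
	return cs.AppsV1().Deployments(ns).DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{})
}
```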
### User Stories (Optional)
A user has pods which listen on the network and network policies which help protect them.
During namespace deletion, there could be cases where the NetworkPolicy is deleted while the pods are still running,
which causes the security concern of having Pods running unprotected.

After this feature is introduced, the NetworkPolicy will always be deleted after the Pods,
avoiding the above security concern.
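To make the story concrete, a default-deny ingress policy of the kind assumed here could look as follows in the Go API types; the object names and label selector are hypothetical.

```go
package main

import (
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// guardPolicy denies all ingress traffic to pods labeled app=web for as long
// as it exists. If it were removed before those pods during namespace
// deletion, the pods would briefly run unprotected.
var guardPolicy = &networkingv1.NetworkPolicy{
	ObjectMeta: metav1.ObjectMeta{Name: "deny-all-ingress", Namespace: "demo"},
	Spec: networkingv1.NetworkPolicySpec{
		PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "web"}},
		PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
		// No Ingress rules are listed, so all ingress to the selected pods is denied.
	},
}
```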
#### Story 2 - having finalizer conflicts with deletion order
E.g. if a pod has a finalizer which is waiting for network policies (which is opaque to Kubernetes),
it will cause dependency loops and block the deletion process.

Refer to the section `Handling Cyclic Dependencies`.
#### Story 3 - having policy set up with parameter resources

When ValidatingAdmissionPolicy is used in the cluster with parameterization, it is possible to use Pods as the parameter resources. In this case, the parameter resources will be deleted before the VAP,
leaving the VAP not in use. To make it even worse, if the ValidatingAdmissionPolicyBinding is configured with `.spec.paramRef.parameterNotFoundAction: Deny`,
it could block certain resource operations and also hang the termination process. A similar concern applies to webhooks with parameter resources.

This is an existing issue with the current namespace deletion as well. As long as we don't plan to build a dependency graph, it will rely more on
best practices and the user's configuration.
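As a sketch of the risky configuration described above, such a binding would look roughly like this in the Go API types; the policy name, parameter object, and namespace are hypothetical.

```go
package main

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// denyIfMissing makes the binding fail closed: if the referenced parameter
// object has already been deleted (e.g. during namespace teardown), matching
// requests are denied, which can wedge the remaining cleanup.
var denyIfMissing = admissionregistrationv1.DenyAction

var riskyBinding = &admissionregistrationv1.ValidatingAdmissionPolicyBinding{
	ObjectMeta: metav1.ObjectMeta{Name: "example-binding"},
	Spec: admissionregistrationv1.ValidatingAdmissionPolicyBindingSpec{
		PolicyName: "example-policy",
		ParamRef: &admissionregistrationv1.ParamRef{
			Name:                    "example-params", // e.g. a Pod used as the parameter resource
			Namespace:               "demo",
			ParameterNotFoundAction: &denyIfMissing,
		},
		ValidationActions: []admissionregistrationv1.ValidationAction{admissionregistrationv1.Deny},
	},
}
```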
### Notes/Constraints/Caveats (Optional)
#### Having ownerReference conflicts with deletion order
Namespace deletion specifically uses `metav1.DeletePropagationBackground`, and all
dependencies would be handled by the garbage collection.

In Kubernetes, `ownerReferences` define a parent-child relationship where child resources are automatically deleted when the parent is removed.
This is mostly handled by garbage collection. During namespace deletion, the `ownerReferences` are not part of the consideration, and the garbage collector controller will make sure
no child resources still exist after the parent resource is deleted.
The introduction of a deletion order could potentially cause dependency loops, especially when finalizers are
specified against the deletion order.

When a lack of progress is detected (maybe caused by the dependency cycle described above), it could hang the deletion process,
the same as the current behavior.

Mitigation: Delete the blocking finalizer to proceed.
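A minimal sketch of that mitigation, assuming a client-go clientset and placeholder namespace/pod names, is to clear the finalizers with a merge patch; note that this skips whatever cleanup the finalizer was guarding.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// clearPodFinalizers removes all finalizers from the given pod so that its
// deletion (and therefore the namespace deletion) can proceed.
func clearPodFinalizers(ctx context.Context, cs kubernetes.Interface, ns, pod string) error {
	patch := []byte(`{"metadata":{"finalizers":null}}`)
	_, err := cs.CoreV1().Pods(ns).Patch(ctx, pod, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```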
## Design Details
### DeletionOrderPriority Mechanism
For the namespace deletion process, we would like the resources associated with the namespace to be deleted as follows:

- Delete all pods in the namespace (in an undefined order).
- Wait for all the pods to be stopped or deleted.
- Delete all the other resources in the namespace (in an undefined order).

The above order will be strictly enforced as long as the feature gate is turned on.
### Handling Cyclic Dependencies
Cyclic dependencies can occur if resources within the namespace have finalizers set which conflict with the DeletionOrderPriority.
For example, consider the following scenario:

- Pod A has a finalizer that depends on the deletion of Resource B.

- Pod A is supposed to be deleted before Resource B.

In this case, the finalizers would conflict with the `NamespaceDeletionOrder` and could lead to cyclic dependencies, causing the namespace deletion process to hang.

To mitigate the issue, the user would have to manually resolve the dependency lock by either removing the finalizer or force deleting the blocking resources, which is the same as the current mechanism.
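For completeness, the force-delete option amounts to a delete call with a zero grace period; the following is a minimal client-go sketch with placeholder names.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeletePod asks the API server to delete the pod immediately, roughly
// the programmatic equivalent of `kubectl delete pod --grace-period=0 --force`.
// Finalizers still have to be removed separately if they are what is blocking.
func forceDeletePod(ctx context.Context, cs kubernetes.Interface, ns, pod string) error {
	zero := int64(0)
	return cs.CoreV1().Pods(ns).Delete(ctx, pod, metav1.DeleteOptions{GracePeriodSeconds: &zero})
}
```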