
Commit 9726891: Address comments
1 parent 88eff69

1 file changed: 35 additions, 146 deletions

keps/sig-api-machinery/5080-ordered-namespace-deletion/README.md
@@ -10,15 +10,14 @@
 - [User Stories (Optional)](#user-stories-optional)
 - [Story 1 - Pod VS NetworkPolicy](#story-1---pod-vs-networkpolicy)
 - [Story 2 - having finalizer conflicts with deletion order](#story-2---having-finalizer-conflicts-with-deletion-order)
+- [Story 3 - having policy set up with parameter resources](#story-3---having-policy-set-up-with-parameter-resources)
 - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Having ownerReference conflicts with deletion order](#having-ownerreference-conflicts-with-deletion-order)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Dependency cycle](#dependency-cycle)
-- [Instance from same resources want different deletion order](#instance-from-same-resources-want-different-deletion-order)
 - [Design Details](#design-details)
 - [DeletionOrderPriority Mechanism](#deletionorderpriority-mechanism)
 - [Handling Cyclic Dependencies](#handling-cyclic-dependencies)
-- [Configure DeletionOrderPriority For CRD](#configure-deletionorderpriority-for-crd)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -72,13 +71,12 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 ## Summary
 
-This kep introduces an ordered deletion process in the Kubernetes namespace deletion
-to ensure predictable and secure deletion of resources within a namespace.
+This KEP introduces an opinionated deletion process in the Kubernetes namespace deletion
+to ensure secure deletion of resources within a namespace.
 The current deletion process is semi-random, which may lead to security gaps or
 unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies.
-By implementing a prioritized deletion mechanism, resources will be deleted in a
-predefined order that respects logical and security dependencies.
-For example, Pods will be deleted before NetworkPolicies, ensuring that no Pod is left unprotected during the cleanup process.
+By implementing an opinionated deletion mechanism, Pods will be deleted before other resources,
+respecting logical and security dependencies.
 This design enhances the security and reliability of Kubernetes by mitigating risks arising from the non-deterministic deletion order.
 
 
@@ -94,11 +92,10 @@ Additionally, the lack of a defined deletion order can lead to operational incon
 where any sort of safety guard resources (not just NetworkPolicy) are deleted before their guarded resources (e.g., Pods),
 resulting in unnecessary disruptions or errors.
 
-By introducing a prioritized deletion process, this proposal aims to:
+By introducing an opinionated deletion process, this proposal aims to:
 
 - Enhance Security: Ensure resources like NetworkPolicies remain in effect until all dependent resources have been safely terminated.
 - Increase Predictability: Provide a consistent and logical cleanup process for namespace deletion, reducing unintended side effects.
-- Improve User Experience: Allow cluster administrators and developers to rely on a robust deletion mechanism without requiring manual intervention or custom scripts.
 
 This opinionated deletion approach aligns with Kubernetes' principles of reliability, security, and extensibility,
 providing a solid foundation for managing resource cleanup in complex environments.
@@ -136,49 +133,12 @@ providing a solid foundation for managing resource cleanup in complex environmen
 
 ## Proposal
 
-Introduce a way to specify priority based on resource type while deleting namespace. When the feature gate `OrderedNamespaceDeletion` is enabled,
-the resources associated with this namespace should be deleted in order.
-
-To specify the deletion order, the options would be:
-
-Option 1: have int value assigned for resources to indicate the deletion priority such like:
-
-```
-{Resource: "/pods", DeletionPriority: "5"}
-{Resource: "networking.k8s.io/networkpolicies", DeletionPriority: "-999"}
-{Resource: "apps.k8s.io/deployments", DeletionPriority: "10"}
-...
-```
-0 would be the default DeletionPriority if not specified. And the resources which have deletion ordering concern would make sure to set DeletionPriority ahead of the resource it guards.
-
-Option 2: have the deletion priority bands defined instead of using numbers.
-
-To begin with, the deletion order bands would be introduced:
-- Workload Controllers
-- Workloads
-- Default
-- Policies
-
-Those deletion order bands will be deleted in sequence. E.g.
-
-```
-{Resource: "/pods", DeletionPriority: "workload"}
-{Resource: "networking.k8s.io/networkpolicies", DeletionPriority: "policies"}
-{Resource: "apps.k8s.io/dloyments", DeletionPriority: "workload controllers"}
-...
-```
-
-After the deletion priority set, the namespace deletion process will honor the priority and delete the resources in sequence.
-The resources with higher priority should always be deleted before the resources with lower priority.
-
-<References> Looks like we have examples for using int value for priority configurations existing in Kubernetes already.
-
-1.[Aggregator API priority] https://github.com/kubernetes/kubernetes/blob/659c437b267c4535d2855beee8abe5c121d58569/cmd/kube-apiserver/app/aggregator.go#L27-L59
-
-2. [API priority and fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/#flowschema)
-
-3. [Priority clasee](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass)
+When the feature gate `OrderedNamespaceDeletion` is enabled,
+the resources associated with this namespace should be deleted in order:
 
+- Delete all pods in the namespace (in an undefined order).
+- Wait for all the pods to be stopped or deleted.
+- Delete all the other resources in the namespace (in an undefined order).
 
 ### User Stories (Optional)
 
@@ -188,17 +148,25 @@ A user has pods which listen on the network and network policies which help prot
 While namespace deletion, there could be cases that NetworkPolicy has deleted while the pods are running
 which cause the security concern of having Pods running unprotected.
 
-After this feature was introduced, we would have NetworkPolicy in lower DeletionOrderPriority than the Pods to
+After this feature is introduced, we would have NetworkPolicy always deleted after the Pods to
 avoid the above security concern.
 
 #### Story 2 - having finalizer conflicts with deletion order
 
-E.g. the NetworkPolicy deletion order would be set lower than the pod and we could expect the Pod be deleted
-before NetworkPolicy. However, if even one pod has a finalizer which is waiting for network policies (which is opaque to Kubernetes),
+E.g. if a pod has a finalizer which is waiting for network policies (which is opaque to Kubernetes),
 it will cause dependency loops and block the deletion process.
 
 Refer to the section `Handling Cyclic Dependencies`.
 
+#### Story 3 - having policy set up with parameter resources
+
+When ValidatingAdmissionPolicy is used in the cluster with parameterization, it is possible to use a Pod as the parameter resource. In this case, the parameter resource will be deleted before the VAP,
+leaving the VAP not in effect. To make it even worse, if the ValidatingAdmissionPolicyBinding is configured with `.spec.paramRef.parameterNotFoundAction: Deny`,
+it could block certain resource operations and also hang the termination process. A similar concern applies to Webhooks with parameter resources.
+
+It is an existing issue with the current namespace deletion as well. As long as we don't plan to have a dependency graph built, it will rely more on
+best practices and the user's configuration.
+
 ### Notes/Constraints/Caveats (Optional)
 
 #### Having ownerReference conflicts with deletion order
@@ -209,8 +177,7 @@ Namespace deletion specifically uses `metav1.DeletePropagationBackground` and al
 dependencies would be handled by the garbage collection.
 
 In Kubernetes, `ownerReferences` define a parent-child relationship where child resources are automatically deleted when the parent is removed.
-This is mostly handled by garbage collection. While namespace deletion, the `ownerReferences` is not part of the consideration and
-`NamespaceDeletionOrder` group will be honored while deleting resources as what it is currently. The garbage collector controller will make sure
+This is mostly handled by garbage collection. During namespace deletion, the `ownerReferences` are not part of the consideration, and the garbage collector controller will make sure
 no child resources still existing after the parent resource deleted.
 
 
@@ -221,113 +188,35 @@ no child resources still existing after the parent resource deleted.
 The introduction of deletion order could potentially cause dependency loops especially when finalizers are
 specified against deletion priority.
 
-When a lack of progress detected(maybe caused by the dependency cycle described above), the options would be:
-- Fall back to the previous behavior.
-- Pros: The deletion would not be blocked; no breaking changes;
-- Cons: Security concern remains unaddressed or could be bypassed
-
-- Return error after retry.
-- Pros: Make sure the security concern being addressed by always honor the deletion order
-- Cons: Block namespace deletion if dependency cycle exists.
-
-Mitigation: A proper fallback mechanism would be introduced to make sure the namespace deletion process would not be
-hanging forever because of potential dependency cycle.
-
-Refer to the section `Handling Cyclic Dependencies` for more details.
-
-#### Instance from same resources want different deletion order
-
-The current proposal is to have namespace deletion order per resource. It is possible that the instances
-with same resources want different namespace deletion order.
+When a lack of progress is detected (maybe caused by the dependency cycle described above), it could hang the deletion process,
+the same as the current behavior.
 
-The existing mechanism is to use random order while deletion, the proposal does not make things worse.
-We could possibly introduce a way to let individual instance be able to specify the deletion order later if certain requests is commonly asked.
+Mitigation: Delete the blocking finalizer to proceed.
 
 ## Design Details
 
 ### DeletionOrderPriority Mechanism
 
-For the namespace deletion process, we would like to have the resources associated with this namespace be deleted in order of:
-- Workload controllers
-- Workload
-- Default
-- Policies
+For the namespace deletion process, we would like to have the resources associated with this namespace be deleted as follows:
 
-Each resource type will be assigned a `NamespaceDeletionOrder` value(<TOBESOLVED - @cici37>int value or string value). To define the DeletionOrder, the options are:
+- Delete all pods in the namespace (in an undefined order).
+- Wait for all the pods to be stopped or deleted.
+- Delete all the other resources in the namespace (in an undefined order).
 
-1. Add a field into APIResource (not mutable) so it could be observed easily(pkg/registry):
-
-```
-type APIResource struct {
-……
-`NamespaceDeletionOrder int64 `json:"namespaceDeletionOrder" protobuf:"varint,11,opt,name=namespaceDeletionOrder"``
-}
-```
-
-In this case, the `NamespaceDeletionOrder` will be associated with the resources naturally and the resource which does not have an opinion will default to a value.
-
-- Pros:
-- Having it configurable
-- Consistent between CRD and native type when add support into CRD
-- Cons:
-- Hard to oversee the overall prioritization across resources
-
-2. Maintain a hard-coded map of DeletionOrder per resources for the resources which have a deletion order preference. Any resources which have no preference opinion would be in the Default category.
-```
-var resourceDeletionOrderPriorities = map[ResourceType]DeletionOrderPriority{
-"/pods": DeletionOrdeWorkload,
-"apps.k8s.io/deployments": DeletionOrdeWorkloadController.
-"networking.k8s.io/networkpolicies”: DeletionOrdePolicy,
-……
-})
-
-```
-
-- Pros:
-- The feature would be fully under control
-- Have one single place to manage the DeletionOrderPriority across all resources group
-- Easy for future maintenance
-- Cons:
-- Not configurable
-- Not encapsulated (behavior of a type/group defined far away from that type's "main" code)
-- Only apply to native types, need separate support for CRD
-
-
-
-During namespace deletion, the namespace collector will traverse resources in the namespace and delete them in ascending priority order based on the `NamespaceDeletionOrder` defined.
+The above order will be strictly enforced as long as the feature gate is turned on.
 
 ### Handling Cyclic Dependencies
 
 Cyclic dependencies can occur if resources within the namespace have finalizers set which conflicts with the DeletionOrderPriority.
 For example, consider the following scenario:
 
-- Resource A has a finalizer that depends on the deletion of Resource B.
-
-- Resource A is in the earlier DeletionOrderPriority than Resource B.
-
-In this case, the finalizers set would conflict with the `NamespaceDeletionOrder` set for resources and could lead to cyclic dependencies and cause namespace deletion process hanging.
-
-To address this, the system will:
-
-- Attempt to honor the `NamespaceDeletionOrder` for resource deletion.
-
-- Monitor the deletion process for each `NamespaceDeletionOrder`. If the process hangs beyond a predefined timeout (e.g., 5 minutes),
-it will detect the stall and trigger the deletion attempt for the next `NamespaceDeletionOrder` group.
-
-- After moving on to the next NamespaceDeletionOrder group, the system will attempt to delete all resources under this group. At this stage, deletion is considered successful only when all resources from the current and previous groups have been fully removed.
-
-- If the deletion of all resources from previous groups is not completed within the timeout period, the system will proceed to the next NamespaceDeletionOrder group, deleting those resources while waiting for any remaining resources from previous groups to be cleaned up.
-
-- After looping through all NamespaceDeletionOrder groups, if there is still process blocking resources from being deleted, the system will behave same as the current mechanism.
-
-By introducing a controlled timeout mechanism, the system ensures that cyclic dependencies do not block namespace deletion indefinitely while still striving for an ordered deletion whenever possible.
-
+- Pod A has a finalizer that depends on the deletion of Resource B.
 
-### Configure DeletionOrderPriority For CRD
+- Pod A is supposed to be deleted before Resource B.
 
-It could be a phrase 2 feature depending on how the deletion order specified.
+In this case, the finalizer set would conflict with the NamespaceDeletionOrder and could lead to cyclic dependencies, causing the namespace deletion process to hang.
 
-We would like to have `NamespaceDeletionOrder` configurable CRDs as well.
+To mitigate the issue, users would have to manually resolve the dependency lock by either removing the finalizer or force deleting the blocking resources, which would be the same as the current mechanism.
 
 
 ### Test Plan
