
Commit b55e81a

docs: flesh out how to handle resources on deletion. (#5069)
This change aims to enhance the documentation by providing more detailed instructions, under advanced topics, on how to handle resource deletion, going through multiple scenarios. It also adds a couple of lines to the best practices section to refer to it. Signed-off-by: Frederic Giloux <[email protected]>
1 parent d3e215d

File tree: 2 files changed (+30 / -7 lines)

website/content/en/docs/best-practices/best-practices.md

Lines changed: 5 additions & 1 deletion
@@ -38,6 +38,8 @@ Considerations for Operator developers:
 - Operators are instrumented to provide useful, actionable metrics to external systems (e.g. monitoring/alerting platforms). Minimally, metrics should represent the software's health and key performance indicators, as well as support the creation of [service level indicators](https://en.wikipedia.org/wiki/Service_level_indicator) such as throughput, latency, availability, errors, capacity, etc.

+- Operators may create objects as part of their operational duty. Object accumulation can consume unnecessary resources, slow down the API, and clutter the user interface. It is therefore important for operators to keep good hygiene and to clean up resources when they are no longer needed. Here are instructions on [how to handle cleanup on deletion][advanced-topics].
+
 ### Summary

 - One Operator per managed application
@@ -50,6 +52,7 @@ Considerations for Operator developers:
 - Use semver / observe Kubernetes guidelines on versioning APIs
 - Use OpenAPI spec with structural schema on CRDs
 - Operators expose metrics to external systems
+- Operators clean up resources on deletion

 ## Running On-Cluster

@@ -97,4 +100,5 @@ On the cluster, an Operator...
 - Should always be able to deploy and come up without user input
 - Offers (pre)configuration via a `“Configuration CR”` instantiated by InitContainers

-[Dependency Resolution]: https://olm.operatorframework.io/docs/concepts/olm-architecture/dependency-resolution/
+[Dependency Resolution]: https://olm.operatorframework.io/docs/concepts/olm-architecture/dependency-resolution/
+[advanced-topics]: /docs/building-operators/golang/advanced-topics#handle-cleanup-on-deletion

website/content/en/docs/building-operators/golang/advanced-topics.md

Lines changed: 25 additions & 6 deletions
@@ -120,14 +120,24 @@ To learn about how metrics work in the Operator SDK read the [metrics section][m
 ### Handle Cleanup on Deletion

-To implement complex deletion logic, you can add a finalizer to your Custom Resource. This will prevent your Custom Resource from being
-deleted until you remove the finalizer (i.e., after your cleanup logic has successfully run). For more information, see the
-[official Kubernetes documentation on finalizers](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#finalizers).
+Operators may create objects as part of their operational duty. Object accumulation can consume unnecessary resources, slow down the API, and clutter the user interface. It is therefore important for operators to keep good hygiene and to clean up resources when they are no longer needed. Here are a few common scenarios.

-**Example:**
+#### Internal Resources
+
+A typical example of correct resource cleanup is the [Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) implementation. When a Job is created, one or multiple Pods are created as child resources. When a Job is deleted, the associated Pods are deleted as well. This is a very common pattern, easily achieved by setting an owner reference from the parent (Job) to the child (Pod) object. Here is a code snippet for doing so, where `r` is the reconciler and `ctrl` is the controller-runtime library:
+
+```go
+ctrl.SetControllerReference(job, pod, r.Scheme)
+```
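
`ctrl.SetControllerReference` returns an error (for example, when `pod` already has a different controller owner, or when the owner's type is not registered in the `Scheme`); a real reconciler should check that error rather than discard it.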

+Note that the default behavior for cascading deletion is background propagation, meaning deletion requests for child objects occur after the request to delete the parent object. [This Kubernetes doc](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/) provides alternative deletion types.
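
For illustration, here is a minimal sketch of opting into foreground propagation instead, using the controller-runtime client's `PropagationPolicy` delete option; it assumes `r` embeds a controller-runtime `client.Client`, with `ctx` and `job` coming from the reconcile loop:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Delete the Job and let the API server hold its removal until the
// dependent Pods are gone, instead of the default background mode.
if err := r.Delete(ctx, job,
	client.PropagationPolicy(metav1.DeletePropagationForeground)); err != nil {
	return ctrl.Result{}, err
}
```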

+#### External Resources
+
+Sometimes external resources, or resources that are not owned by a custom resource (those across namespaces, for example), need to be cleaned up when the parent resource is deleted. In that case, [finalizers](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#finalizers) can be leveraged. A deletion request for an object with a finalizer becomes an update during which a deletion timestamp is set; the object is not deleted while the finalizer is present. The reconciliation loop of the custom resource's controller then needs to check whether the deletion timestamp is set, perform the external cleanup operation(s), and remove the finalizer to allow garbage collection of the object. Multiple finalizers may be present on an object, each with a key that should indicate what external resources require deletion by the controller.
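
In outline, that reconcile-loop logic can take roughly the following shape; this is a minimal sketch assuming controller-runtime's `controllerutil` helpers, a hypothetical finalizer key, and the `finalizeMemcached` cleanup helper from the fuller example below:

```go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer key; use a domain-qualified name you own.
const memcachedFinalizer = "cache.example.com/finalizer"

if m.ObjectMeta.DeletionTimestamp.IsZero() {
	// The object is not being deleted: make sure our finalizer is present.
	if !controllerutil.ContainsFinalizer(m, memcachedFinalizer) {
		controllerutil.AddFinalizer(m, memcachedFinalizer)
		if err := r.Update(ctx, m); err != nil {
			return ctrl.Result{}, err
		}
	}
} else if controllerutil.ContainsFinalizer(m, memcachedFinalizer) {
	// The object is being deleted: run the external cleanup, then remove
	// the finalizer so the API server can garbage collect the object.
	if err := r.finalizeMemcached(reqLogger, m); err != nil {
		return ctrl.Result{}, err
	}
	controllerutil.RemoveFinalizer(m, memcachedFinalizer)
	if err := r.Update(ctx, m); err != nil {
		return ctrl.Result{}, err
	}
}
```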

+The following is a snippet from a theoretical controller file `controllers/memcached_controller.go` that implements a finalizer handler:

-The following is a snippet from a theoretical controller file `controllers/memcached_controller.go`
-that implements a finalizer handler:

 ```go
 import (
@@ -206,6 +216,15 @@ func (r *MemcachedReconciler) finalizeMemcached(reqLogger logr.Logger, m *cachev
 }
 ```

+#### Complex Cleanup Logic
+
+Similar to the previous scenario, finalizers can be used to implement complex cleanup logic. Take [CronJobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/) as an example: the CronJob controller maintains limited-size lists of the Jobs it has created, which it checks to decide what to delete. These list sizes are configured by the CronJob fields [`.spec.successfulJobsHistoryLimit` and `.spec.failedJobsHistoryLimit`](https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#jobs-history-limits), which specify how many completed and failed Jobs should be kept. Check out the [Kubebuilder CronJob tutorial](https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html#3-clean-up-old-jobs-according-to-the-history-limit) for full implementation details.
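
As a rough illustration of the history-limit approach (not the tutorial's exact code), a controller might prune the oldest completed Jobs once the limit is exceeded; `cronJob`, `successfulJobs` (sorted oldest first, with the history limit defaulted to non-nil), `r`, `ctx`, and `log` are all assumed here:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Keep at most .spec.successfulJobsHistoryLimit completed Jobs,
// deleting the oldest ones first.
limit := int(*cronJob.Spec.SuccessfulJobsHistoryLimit)
for i := 0; i+limit < len(successfulJobs); i++ {
	job := &successfulJobs[i]
	if err := r.Delete(ctx, job,
		client.PropagationPolicy(metav1.DeletePropagationBackground)); err != nil {
		log.Error(err, "unable to delete old successful Job", "job", job.Name)
	}
}
```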

+#### Sensitive Resources
+
+Sensitive resources need to be protected against unintended deletion. An intuitive example is the [PersistentVolume (PV) / PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) relationship. A PV is first created, after which users can request access to its storage by creating a PVC, which gets bound to the PV. If a user tries to delete a PV that is still bound by a PVC, the PV is not removed immediately; instead, its removal is postponed until it is no longer bound to any PVC. Finalizers can again be leveraged to achieve similar behaviour for your own PV-like custom resources: by setting a finalizer on an object, your controller can make sure there are no remaining objects bound to it before removing the finalizer and letting the object be deleted.
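
A minimal sketch of that bound-check, assuming hypothetical `Volume`/`Claim` custom types in an `examplev1` API group, a `spec.volumeName` field index, a `protectionFinalizer` key, and controller-runtime's `controllerutil` helpers:

```go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	examplev1 "example.com/volume-operator/api/v1" // hypothetical API package
)

// Only remove the protection finalizer once no Claim still binds the Volume.
if !vol.ObjectMeta.DeletionTimestamp.IsZero() &&
	controllerutil.ContainsFinalizer(vol, protectionFinalizer) {
	claims := &examplev1.ClaimList{}
	if err := r.List(ctx, claims,
		client.MatchingFields{"spec.volumeName": vol.Name}); err != nil {
		return ctrl.Result{}, err
	}
	if len(claims.Items) == 0 {
		controllerutil.RemoveFinalizer(vol, protectionFinalizer)
		if err := r.Update(ctx, vol); err != nil {
			return ctrl.Result{}, err
		}
	}
	// While bound Claims remain, keep the finalizer; deletion stays pending.
}
```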

+Additionally, through the [reclaim policy](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming), the user who created the PVC can specify what happens to the underlying storage allocated in a PV when the PVC is deleted. Several options are available, and each defines a behaviour that is again achieved through the use of finalizers. The key concept to take away is that your operator can give users the power to decide how their resources are cleaned up via finalizers, which may be dangerous yet useful depending on your workloads.

 ### Leader election

 During the lifecycle of an operator it's possible that there may be more than one instance running at any given time, e.g. when rolling out an upgrade for the operator.
