| 1 | +---
| 2 | +layout: blog
| 3 | +title: 'Kubernetes 1.27: StatefulSet PVC Auto-Deletion (beta)'
| 4 | +date: 2023-05-04
| 5 | +slug: kubernetes-1-27-statefulset-pvc-auto-deletion-beta
| 6 | +---
| 7 | + |
| 8 | +**Author:** Matthew Cary (Google) |
| 9 | + |
| 10 | +Kubernetes v1.27 graduated to beta a new policy mechanism for |
| 11 | +[`StatefulSets`](/docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of |
| 12 | +their [`PersistentVolumeClaims`](/docs/concepts/storage/persistent-volumes/) (PVCs). The new PVC |
| 13 | +retention policy lets users specify whether the PVCs generated from the `StatefulSet` spec template should
| 14 | +be automatically deleted or retained when the `StatefulSet` is deleted or replicas in the `StatefulSet`
| 15 | +are scaled down. |
| 16 | + |
| 17 | +## What problem does this solve? |
| 18 | + |
| 19 | +A `StatefulSet` spec can include `Pod` and PVC templates. When a replica is first created, the |
| 20 | +Kubernetes control plane creates a PVC for that replica if one does not already exist. The behavior |
| 21 | +before the PVC retention policy was that the control plane never cleaned up the PVCs created for |
| 22 | +`StatefulSets`; this was left up to the cluster administrator, or to some add-on automation that
| 23 | +you’d have to find, check for suitability, and deploy. The common pattern for managing PVCs, either
| 24 | +manually or through tools such as Helm, is that the PVCs are tracked by the tool that manages them, |
| 25 | +with explicit lifecycle. Workflows that use `StatefulSets` must determine on their own what PVCs are |
| 26 | +created by a `StatefulSet` and what their lifecycle should be. |
| 27 | + |
| 28 | +Before this new feature, when a StatefulSet-managed replica disappeared, either because the
| 29 | +`StatefulSet` was reducing its replica count, or because its `StatefulSet` was deleted, the PVC and its
| 30 | +backing volume remained and had to be manually deleted. While this behavior is appropriate when the
| 31 | +data is critical, in many cases the persistent data in these PVCs is either temporary, or can be |
| 32 | +reconstructed from another source. In those cases, PVCs and their backing volumes remaining after |
| 33 | +their `StatefulSet` or replicas have been deleted are not necessary, incur cost, and require manual |
| 34 | +cleanup. |
| 35 | + |
| 36 | +## The new `StatefulSet` PVC retention policy |
| 37 | + |
| 38 | +The new `StatefulSet` PVC retention policy is used to control if and when PVCs created from a |
| 39 | +`StatefulSet`’s `volumeClaimTemplates` are deleted. There are two contexts in which this may occur.
| 40 | + |
| 41 | +The first context is when the `StatefulSet` resource is deleted (which implies that all replicas are |
| 42 | +also deleted). This is controlled by the `whenDeleted` policy. The second context, controlled by |
| 43 | +`whenScaled`, is when the `StatefulSet` is scaled down, which removes some but not all of the replicas
| 44 | +in a `StatefulSet`. In both cases the policy can either be `Retain`, where the corresponding PVCs are |
| 45 | +not touched, or `Delete`, which means that PVCs are deleted. The deletion is done with a normal |
| 46 | +[object deletion](/docs/concepts/architecture/garbage-collection/), so that, for example, all |
| 47 | +retention policies for the underlying PV are respected. |
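
As a minimal sketch of where the policy lives in a manifest, the example below sets both fields under `spec.persistentVolumeClaimRetentionPolicy`. The names `web` and `data`, the `nginx` image, the mount path, and the `1Gi` request are placeholders chosen for illustration, not values from this post.

```yaml
# Illustrative StatefulSet; all names, the image, and the storage size are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete the PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # leave the PVCs in place when replicas are scaled down
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

If the `persistentVolumeClaimRetentionPolicy` block is omitted, both fields default to `Retain`, which preserves the earlier behavior.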
| 48 | + |
| 49 | +This policy forms a matrix with four cases. I’ll walk through and give an example for each one. |
| 50 | + |
| 51 | + * **`whenDeleted` and `whenScaled` are both `Retain`.** |
| 52 | + |
| 53 | + This matches the existing behavior for `StatefulSets`, where no PVCs are deleted. This is also |
| 54 | + the default retention policy. It’s appropriate to use when data on `StatefulSet` volumes may be |
| 55 | + irreplaceable and should only be deleted manually. |
| 56 | + |
| 57 | + * **`whenDeleted` is `Delete` and `whenScaled` is `Retain`.** |
| 58 | + |
| 59 | + In this case, PVCs are deleted only when the entire `StatefulSet` is deleted. If the |
| 60 | + `StatefulSet` is scaled down, PVCs are not touched, meaning they are available to be reattached,
| 61 | + with any data from the previous replica, if a scale-up occurs. This might be used for a temporary
| 62 | + `StatefulSet`, such as in a CI instance or ETL pipeline, where the data on the `StatefulSet` is
| 63 | + needed only during the lifetime of the `StatefulSet`, but while the task is running the
| 64 | + data is not easily reconstructible. Any retained state is needed for any replicas that scale
| 65 | + down and then up.
| 66 | + |
| 67 | + * **`whenDeleted` and `whenScaled` are both `Delete`.** |
| 68 | + |
| 69 | + PVCs are deleted immediately when their replica is no longer needed. Note this does not include |
| 70 | + when a `Pod` is deleted and a new version rescheduled, for example when a node is drained and |
| 71 | + `Pods` need to migrate elsewhere. The PVC is deleted only when the replica is no longer needed |
| 72 | + as signified by a scale-down or `StatefulSet` deletion. This use case is for when data does not |
| 73 | + need to live beyond the life of its replica. Perhaps the data is easily reconstructible and the
| 74 | + cost savings of deleting unused PVCs are more important than quick scale-up, or perhaps when
| 75 | + a new replica is created, any data from a previous replica is not usable and must be
| 76 | + reconstructed anyway. |
| 77 | + |
| 78 | + * **`whenDeleted` is `Retain` and `whenScaled` is `Delete`.** |
| 79 | + |
| 80 | + This is similar to the previous case, in that there is little benefit to keeping PVCs for fast
| 81 | + reuse during scale-up. An example of a situation where you might use this is an Elasticsearch |
| 82 | + cluster. Typically you would scale that workload up and down to match demand, whilst ensuring a |
| 83 | + minimum number of replicas (for example: 3). When scaling down, data is migrated away from |
| 84 | + removed replicas and there is no benefit to retaining those PVCs. However, it can be useful to |
| 85 | + bring the entire Elasticsearch cluster down temporarily for maintenance. If you need to take the |
| 86 | + Elasticsearch system offline, you can do this by temporarily deleting the `StatefulSet`, and |
| 87 | + then bringing the Elasticsearch cluster back by recreating the `StatefulSet`. The PVCs holding |
| 88 | + the Elasticsearch data will still exist and the new replicas will automatically use them. |
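
For instance, the Elasticsearch-style case in the last bullet might be expressed with a fragment like the following. It shows only the relevant part of the `StatefulSet` spec, and the replica count of 3 simply mirrors the minimum mentioned in that bullet.

```yaml
# Fragment of a StatefulSet spec, not a complete manifest.
spec:
  replicas: 3
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # PVCs survive deleting and recreating the StatefulSet
    whenScaled: Delete    # PVCs of replicas removed by a scale-down are deleted
```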
| 89 | + |
| 90 | +Visit the |
| 91 | +[documentation](/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to |
| 92 | +see all the details. |
| 93 | + |
| 94 | +## What’s next? |
| 95 | + |
| 96 | +Try it out! The `StatefulSetAutoDeletePVC` feature gate is beta and enabled by default on |
| 97 | +clusters running Kubernetes 1.27. Create a `StatefulSet` using the new policy, test it out and tell
| 98 | +us what you think! |
| 99 | + |
| 100 | +I'm very curious to see if this owner reference mechanism works well in practice. For example, I |
| 101 | +realized there is no mechanism in Kubernetes for knowing who set a reference, so it’s possible that |
| 102 | +the `StatefulSet` controller may fight with custom controllers that set their own |
| 103 | +references. Fortunately, maintaining the existing retention behavior does not involve any new owner |
| 104 | +references, so default behavior will be compatible. |
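
To make the owner reference mechanism concrete, here is a hedged sketch of what a PVC's metadata could look like once the controller has marked it for deletion along with its `StatefulSet`. It assumes a `StatefulSet` named `web` with a claim template named `data` and `whenDeleted: Delete`. Both names and the `uid` are placeholders, and the controller may set fields beyond those shown.

```yaml
# Hypothetical PVC metadata; the names and the uid are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-web-0        # <claim template name>-<StatefulSet name>-<ordinal>
  ownerReferences:
  - apiVersion: apps/v1
    kind: StatefulSet
    name: web
    uid: 00000000-0000-0000-0000-000000000000
```

Inspecting `metadata.ownerReferences` on a PVC is one way to see whether another controller has also added a reference to it.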
| 105 | + |
| 106 | +Please tag any issues you report with the label `sig/apps` and assign them to Matthew Cary |
| 107 | +([@mattcary](https://github.com/mattcary) on GitHub).
| 108 | + |
| 109 | +Enjoy! |