Commit bbdb8c5

Merge pull request #30597 from mattcary/kep-1847
KEP 1847 StatefulSet autodelete documentation
2 parents b1a7356 + 40e06a6 commit bbdb8c5

File tree

1 file changed

+78
-0
lines changed

content/en/docs/concepts/workloads/controllers/statefulset.md

Lines changed: 78 additions & 0 deletions
@@ -301,6 +301,84 @@ already attempted to run with the bad configuration.
301301
StatefulSet will then begin to recreate the Pods using the reverted template.
302302

303303

304+
## PersistentVolumeClaim retention
305+
306+
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
307+
308+
The optional `.spec.persistentVolumeClaimRetentionPolicy` field controls if
309+
and how PVCs are deleted during the lifecycle of a StatefulSet. You must enable the
310+
`StatefulSetAutoDeletePVC` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
311+
to use this field. Once enabled, there are two policies you can configure for each
312+
StatefulSet:
313+
314+
`whenDeleted`
315+
: configures the volume retention behavior that applies when the StatefulSet is deleted
316+
317+
`whenScaled`
318+
: configures the volume retention behavior that applies when the replica count of
319+
the StatefulSet is reduced; for example, when scaling down the set.
320+
321+
For each policy that you can configure, you can set the value to either `Delete` or `Retain`.
322+
323+
`Delete`
324+
: The PVCs created from the StatefulSet `volumeClaimTemplate` are deleted for each Pod
325+
affected by the policy. With the `whenDeleted` policy all PVCs from the
326+
`volumeClaimTemplate` are deleted after their Pods have been deleted. With the
327+
`whenScaled` policy, only PVCs corresponding to Pod replicas being scaled down are
328+
deleted, after their Pods have been deleted.
329+
330+
`Retain` (default)
331+
: PVCs from the `volumeClaimTemplate` are not affected when their Pod is
332+
deleted. This is the behavior before this new feature.
333+
334+
Bear in mind that these policies **only** apply when Pods are being removed due to the
335+
StatefulSet being deleted or scaled down. For example, if a Pod associated with a StatefulSet
336+
fails due to node failure, and the control plane creates a replacement Pod, the StatefulSet
337+
retains the existing PVC. The existing volume is unaffected, and the cluster will attach it to
338+
the node where the new Pod is about to launch.
339+
340+
The default for policies is `Retain`, matching the StatefulSet behavior before this new feature.
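The rules above can be sketched as a small decision function. This is an illustrative model only, not the controller's code: the `event` labels (`"deleted"`, `"scaled"`, `"pod-replaced"`) and the function names are hypothetical, and it assumes the usual `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>` PVC naming with ordinals starting at 0.

```python
from typing import Dict, List

VALID_POLICIES = {"Retain", "Delete"}


def pvc_name(template: str, statefulset: str, ordinal: int) -> str:
    # Assumed naming: <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>.
    return f"{template}-{statefulset}-{ordinal}"


def pvcs_removed(event: str, policy: Dict[str, str], statefulset: str,
                 templates: List[str], old_replicas: int,
                 new_replicas: int = 0) -> List[str]:
    """Return the PVC names the retention policy would delete for an event.

    `event` is a hypothetical label: "deleted", "scaled", or anything else
    (for example "pod-replaced"), which never deletes PVCs.
    """
    # Both policies default to Retain when unset.
    when_deleted = policy.get("whenDeleted", "Retain")
    when_scaled = policy.get("whenScaled", "Retain")
    for value in (when_deleted, when_scaled):
        if value not in VALID_POLICIES:
            raise ValueError(f"unsupported policy value: {value!r}")

    if event == "deleted" and when_deleted == "Delete":
        ordinals = range(old_replicas)  # every Pod's PVCs
    elif event == "scaled" and when_scaled == "Delete":
        ordinals = range(new_replicas, old_replicas)  # only removed replicas
    else:
        ordinals = range(0)  # Retain, or a Pod replaced for another reason
    return [pvc_name(t, statefulset, i) for i in ordinals for t in templates]
```

With `whenDeleted: Retain` and `whenScaled: Delete`, scaling a StatefulSet named `web` from 3 replicas to 1 selects only the PVCs of ordinals 1 and 2, while deleting the StatefulSet or replacing a failed Pod selects none.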
341+
342+
Here is an example policy.
343+
344+
```yaml
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Delete
...
```
354+
355+
The StatefulSet {{<glossary_tooltip text="controller" term_id="controller">}} adds [owner
356+
references](/docs/concepts/overview/working-with-objects/owners-dependents/#owner-references-in-object-specifications)
357+
to its PVCs, which are then deleted by the {{<glossary_tooltip text="garbage collector"
358+
term_id="garbage-collection">}} after the Pod is terminated. This enables the Pod to
359+
cleanly unmount all volumes before the PVCs are deleted (and before the backing PV and
360+
volume are deleted, depending on the retain policy). When you set the `whenDeleted`
361+
policy to `Delete`, an owner reference to the StatefulSet instance is placed on all PVCs
362+
associated with that StatefulSet.
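As a rough illustration of what "placing an owner reference" means, the sketch below adds an entry to a PVC's `metadata.ownerReferences` using plain dictionaries. The helper names are hypothetical and the UID is a placeholder. Only the four required owner-reference fields are shown; whether the controller also sets optional fields is not stated here.

```python
import copy
from typing import Dict


def owner_reference(api_version: str, kind: str, name: str, uid: str) -> Dict[str, str]:
    # The required fields of an owner reference.
    return {"apiVersion": api_version, "kind": kind, "name": name, "uid": uid}


def with_owner(pvc: dict, ref: Dict[str, str]) -> dict:
    """Return a copy of `pvc` whose metadata.ownerReferences includes `ref`.

    The input object is left untouched, and a reference with the same UID
    is not added twice.
    """
    updated = copy.deepcopy(pvc)
    refs = updated.setdefault("metadata", {}).setdefault("ownerReferences", [])
    if all(existing.get("uid") != ref["uid"] for existing in refs):
        refs.append(ref)
    return updated


# Example: a whenDeleted=Delete policy makes the StatefulSet own the PVC.
# "uid-of-web" is a placeholder, not a real UID.
pvc = {"metadata": {"name": "www-web-0"}}
owned = with_owner(pvc, owner_reference("apps/v1", "StatefulSet", "web", "uid-of-web"))
```

Once the owner is deleted, the garbage collector can remove any object whose owner references all point at deleted owners, which is why the PVC disappears only after its owner is gone.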
363+
364+
The `whenScaled` policy must delete PVCs only when a Pod is scaled down, and not when a
365+
Pod is deleted for another reason. When reconciling, the StatefulSet controller compares
366+
its desired replica count to the actual Pods present on the cluster. Any StatefulSet Pod
367+
whose ordinal is at or above the replica count is condemned and marked for deletion. If the
368+
`whenScaled` policy is `Delete`, the condemned Pods are first set as owners of the
369+
associated StatefulSet template PVCs, before the Pod is deleted. This causes the PVCs
370+
to be garbage collected after only the condemned Pods have terminated.
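The selection of condemned Pods can be sketched as follows. This is a simplified model, not the controller's implementation: it assumes Pods are named `<StatefulSet name>-<ordinal>` with ordinals starting at 0, so a Pod is condemned when its ordinal is at or above the desired replica count. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def pod_ordinal(pod_name: str, statefulset: str) -> Optional[int]:
    # Pods are assumed to be named <StatefulSet name>-<ordinal>.
    match = re.fullmatch(re.escape(statefulset) + r"-(\d+)", pod_name)
    return int(match.group(1)) if match else None


def condemned_pods(pod_names: List[str], statefulset: str, replicas: int) -> List[str]:
    """Return Pods of `statefulset` whose ordinal is >= `replicas`,
    highest ordinal first. Names that do not match the pattern are ignored."""
    matched = []
    for name in pod_names:
        ordinal = pod_ordinal(name, statefulset)
        if ordinal is not None and ordinal >= replicas:
            matched.append((ordinal, name))
    return [name for _, name in sorted(matched, reverse=True)]


def pvc_owner_plan(pod_names: List[str], statefulset: str, replicas: int,
                   templates: List[str]) -> Dict[str, List[str]]:
    """Map each condemned Pod to the PVCs it should own before it is deleted
    (the whenScaled=Delete case). PVC names assume the
    <template>-<StatefulSet>-<ordinal> convention."""
    plan = {}
    for pod in condemned_pods(pod_names, statefulset, replicas):
        ordinal = pod_ordinal(pod, statefulset)
        plan[pod] = [f"{t}-{statefulset}-{ordinal}" for t in templates]
    return plan
```

Because each PVC is owned by its own condemned Pod rather than by the StatefulSet, PVCs of Pods that remain within the replica count are never candidates for garbage collection.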
371+
372+
This means that if the controller crashes and restarts, no Pod will be deleted before its
373+
owner reference has been updated as appropriate for the policy. If a condemned Pod is
374+
force-deleted while the controller is down, the owner reference may or may not have been
375+
set up, depending on when the controller crashed. It may take several reconcile loops to
376+
update the owner references, so some condemned Pods may have set up owner references and
377+
others may not. For this reason we recommend waiting for the controller to come back up,
378+
which will verify owner references before terminating Pods. If that is not possible, the
379+
operator should verify the owner references on PVCs to ensure the expected objects are
380+
deleted when Pods are force-deleted.
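A manual check of this kind might look like the sketch below. It assumes you have already obtained the PVC objects as dictionaries (for example, by parsing JSON output from your cluster tooling) and that you know which owner each PVC is expected to have. The function names and the `expected` mapping are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Owner = Tuple[str, str]  # (kind, name)


def owners_of(pvc: dict) -> List[Owner]:
    # metadata.ownerReferences may be absent or empty when nothing owns the PVC.
    refs = pvc.get("metadata", {}).get("ownerReferences") or []
    return [(ref["kind"], ref["name"]) for ref in refs]


def pvcs_missing_owner(pvcs: Iterable[dict], expected: Dict[str, Owner]) -> List[str]:
    """Return names of PVCs that lack the owner reference listed in `expected`.

    `expected` maps a PVC name to the (kind, name) of the owner it should have;
    PVCs not listed in `expected` are skipped.
    """
    missing = []
    for pvc in pvcs:
        name = pvc["metadata"]["name"]
        want = expected.get(name)
        if want is not None and want not in owners_of(pvc):
            missing.append(name)
    return missing
```

Any PVC this reports would not be garbage collected when its Pod is force-deleted, so it would need to be handled by hand if deletion is the intended outcome.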
381+
304382
## {{% heading "whatsnext" %}}
305383

306384
* Learn about [Pods](/docs/concepts/workloads/pods).
