|
25 | 25 | - [Mutating <code>PersistentVolumeClaimRetentionPolicy</code>](#mutating-)
|
26 | 26 | - [Cluster role change for statefulset controller](#cluster-role-change-for-statefulset-controller)
|
27 | 27 | - [Test Plan](#test-plan)
|
| 28 | + - [Unit tests](#unit-tests) |
| 29 | + - [Integration tests](#integration-tests) |
| 30 | + - [E2E tests](#e2e-tests) |
| 31 | + - [Upgrade/downgrade & feature enabled/disable tests](#upgradedowngrade--feature-enableddisable-tests) |
28 | 32 | - [Graduation Criteria](#graduation-criteria)
|
29 | 33 | - [Alpha release](#alpha-release)
|
30 | 34 | - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
|
|
46 | 50 | Items marked with (R) are required *prior to targeting to a milestone / release*.
|
47 | 51 |
|
48 | 52 | - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
|
49 |
| -- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
50 |
| -- [ ] (R) Design details are appropriately documented |
51 |
| -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
| 53 | +- [X] (R) KEP approvers have approved the KEP status as `implementable` |
| 54 | +- [X] (R) Design details are appropriately documented |
| 55 | +- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
52 | 56 | - [ ] (R) Graduation criteria is in place
|
53 | 57 | - [ ] (R) Production readiness review completed
|
54 | 58 | - [ ] (R) Production readiness review approved
|
55 | 59 | - [ ] "Implementation History" section is up-to-date for milestone
|
56 |
| -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 60 | +- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
57 | 61 | - [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
|
58 | 62 |
|
59 | 63 |
|
@@ -111,7 +115,7 @@ The following changes are required:
|
111 | 115 |
|
112 | 116 | These fields may be set to the following values.
|
113 | 117 | * `Retain` - the default policy, which is also used when no policy is
|
114 |
| - specified. This specifies the existing behaviour: when a StatefulSet is |
| 118 | + specified. This specifies the existing behavior: when a StatefulSet is |
115 | 119 | deleted or scaled down, no action is taken with respect to the PVCs
|
116 | 120 | created by the StatefulSet.
|
117 | 121 | * `Delete` - specifies that the appropriate PVCs as described above will be
|
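
For illustration, the sketch below shows how these fields would be set on a
StatefulSet manifest. It assumes the `spec.persistentVolumeClaimRetentionPolicy`
field layout described in this KEP and requires the `StatefulSetAutoDeletePVC`
feature gate; treat it as a hedged example rather than normative API reference.

```yaml
# Sketch of a StatefulSet using the retention policy fields described above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs when the StatefulSet scales down
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.k8s.io/nginx-slim:0.8
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```
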
@@ -177,7 +181,7 @@ VolumeClaimTemplate it will be deleted according to the deletion policy.
|
177 | 181 | Currently the PVCs created by a StatefulSet are not deleted automatically. Using
|
178 | 182 | `whenScaled` or `whenDeleted` set to `Delete` would delete the PVCs
|
179 | 183 | automatically. Since this involves persistent data being deleted, users should
|
180 |
| -take appropriate care using this feature. Having the `Retain` behaviour as |
| 184 | +take appropriate care using this feature. Having the `Retain` behavior as |
181 | 185 | default will ensure that the PVCs remain intact unless the user makes a
|
182 | 186 | conscious choice to have persistent data deleted.
|
183 | 187 |
|
@@ -305,38 +309,43 @@ In order to update the PVC ownerReference, the `buildControllerRoles` will be up
|
305 | 309 |
|
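
As a concrete illustration of the cluster role change mentioned above, the
statefulset controller needs permission to write owner references onto PVCs.
The rule below is a hedged sketch only; the exact verbs (`update` and/or
`patch`) are defined by `buildControllerRoles` in the implementation.

```yaml
# Hypothetical excerpt of the system:controller:statefulset-controller role
# after this change; verbs shown are an assumption, not the authoritative list.
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
```
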
306 | 310 | ### Test Plan
|
307 | 311 |
|
308 |
| -1. Unit tests |
309 |
| - |
310 |
| -1. e2e/integration tests |
311 |
| - - With `Delete` policies for `whenScaled` and `whenDeleted` |
312 |
| - 1. Create 2 pod statefulset, scale to 1 pod, confirm PVC deleted |
313 |
| - 1. Create 2 pod statefulset, add data to PVs, scale to 1 pod, scale back to 2, confirm PV empty. |
314 |
| - 1. Create 2 pod statefulset, delete stateful set, confirm PVCs deleted. |
315 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, confirm pod comes back and PV still has data (PVC not deleted). |
316 |
| - 1. As above, but manually delete all pods in stateful set. |
317 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PVC is deleted. |
318 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV is empty. |
319 |
| - 1. Create 2 pod statefulset, add data to PVs, perform rolling confirm PVC don't get deleted and PV contents remain intact and useful in the updated pods. |
320 |
| - - With `Delete` policy for `whenDeleted` only |
321 |
| - 1. Create 2 pod statefulset, scale to 1 pod, confirm PVC still exists, |
322 |
| - 1. Create 2 pod statefulset, add data to PVs, scale to 1 pod, scale back to 2, confirm PV has data (PVC not deleted). |
323 |
| - 1. Create 2 pod statefulset, delete stateful set, confirm PVCs deleted |
324 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, confirm pod comes back and PV has data (PVC not deleted). |
325 |
| - 1. As above, but manually delete all pods in stateful set. |
326 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PVC exists. |
327 |
| - 1. Create 2 pod statefulset, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV has data. |
328 |
| - - Retain: |
329 |
| - 1. same tests as above, but PVCs not deleted in any case and confirm data intact on the PV. |
330 |
| - - Pod restart tests: |
331 |
| - 1. Create statefulset, perform rolling update, confirm PVC data still exists. |
332 |
| - - `--casecade=false` tests. |
333 |
| -1. Upgrade/Downgrade tests |
| 312 | +[X] I/we understand the owners of the involved components may require updates to |
| 313 | +existing tests to make this code solid enough prior to committing the changes necessary |
| 314 | +to implement this enhancement. |
| 315 | + |
| 316 | +##### Unit tests |
| 317 | + |
| 318 | +- `k8s.io/kubernetes/pkg/controller/statefulset`: `2022-06-15`: `85.5%` |
| 319 | +- `k8s.io/kubernetes/pkg/registry/apps/statefulset`: `2022-06-15`: `68.4%` |
| 320 | +- `k8s.io/kubernetes/pkg/registry/apps/statefulset/storage`: `2022-06-15`: `64%` |
| 321 | + |
| 322 | + |
| 323 | +##### Integration tests |
| 324 | + |
| 325 | +- `test/integration/statefulset`: `2022-06-15`: These do not appear to be |
| 326 | + running in a job visible to the triage dashboard; see, for example, a search |
| 327 | + for the previously existing [TestStatefulSetStatusWithPodFail](https://storage.googleapis.com/k8s-triage/index.html?test=TestStatefulSetStatusWithPodFail). |
| 328 | + |
| 329 | +Added `TestAutodeleteOwnerRefs` to `k8s.io/kubernetes/test/integration/statefulset`. |
| 330 | + |
| 331 | +##### E2E tests |
| 332 | + |
| 333 | +- `ci-kubernetes-e2e-gci-gce-statefulset`: `2022-06-15`: `3/646 Failures` |
| 334 | + - Note that as this is behind the `StatefulSetAutoDeletePVC` feature gate, |
| 335 | + tests for this KEP are not being run. |
| 336 | + |
| 337 | +Added `Feature:StatefulSetAutoDeletePVC` tests to `k8s.io/kubernetes/test/e2e/apps/`. |
| 338 | + |
| 339 | +##### Upgrade/downgrade & feature enabled/disable tests |
| 340 | + |
| 341 | +These should be added as e2e tests, but we have not figured out whether there |
| 342 | +is a mechanism to run upgrade/downgrade tests. |
| 343 | + |
334 | 344 | 1. Create a StatefulSet in the previous version and upgrade to the version
|
335 | 345 | supporting this feature. The PVCs should remain intact.
|
336 | 346 | 2. Downgrade to an earlier version and check that the PVCs with the Retain
|
337 | 347 | policy remain intact, and that the others, whose policies were set before the
|
338 | 348 | upgrade, get deleted depending on whether the owner references were already set.
|
339 |
| -1. Feature disablement/enable test for alpha feature flag `statefulset-autodelete-pvcs`. |
340 | 349 |
|
341 | 350 |
|
342 | 351 | ### Graduation Criteria
|
@@ -373,10 +382,10 @@ are not involved so there is no version skew between nodes and the control plane
|
373 | 382 | resource (eg dropDisabledFields).
|
374 | 383 |
|
375 | 384 | * **Does enabling the feature change any default behavior?**
|
376 |
| - No. What happens during StatefulSet deletion differs from current behaviour |
| 385 | + No. What happens during StatefulSet deletion differs from current behavior |
377 | 386 | only when the user explicitly specifies the
|
378 | 387 | `PersistentVolumeClaimDeletePolicy`. Hence there is no user-visible
|
379 |
| - behaviour change by default. |
| 388 | + behavior change by default. |
380 | 389 |
|
381 | 390 | * **Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
|
382 | 391 | Yes. Disabling the feature gate will cause the new field to be ignored. If the feature
|
@@ -407,72 +416,71 @@ are not involved so there is no version skew between nodes and the control plane
|
407 | 416 |
|
408 | 417 | ### Rollout, Upgrade and Rollback Planning
|
409 | 418 |
|
410 |
| -_TBD upon graduation to beta._ |
411 |
| - |
412 | 419 | * **How can a rollout fail? Can it impact already running workloads?**
|
413 |
| - Try to be as paranoid as possible - e.g., what if some components will restart |
414 |
| - mid-rollout? |
| 420 | + If a control plane update disables the feature while a stateful set is |
| 421 | + in the process of being deleted or scaled down, it is undefined which |
| 422 | + PVCs will be deleted. Before the update, PVCs will have been marked for |
| 423 | + deletion; until the updated controller reconciles, some PVCs may be |
| 424 | + garbage collected before it can remove their owner references. We do not |
| 425 | + consider this a true failure, as it should be clear to an operator that |
| 426 | + there is an inherent race condition when a cluster update happens during |
| 427 | + a stateful set scale-down or deletion. |
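
To make the garbage-collection interaction concrete, a PVC that has been marked
for scale-down deletion carries an owner reference to its Pod roughly like the
sketch below; the names, UID, and flags are illustrative assumptions, not output
from a real cluster.

```yaml
# Illustrative PVC metadata after the controller marks it for scale-down deletion.
# Once the Pod is deleted, the garbage collector removes the PVC via this reference.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-web-1
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: web-1
    uid: 0a1b2c3d-1111-2222-3333-444455556666   # hypothetical UID
    controller: true
```
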
415 | 428 |
|
416 | 429 | * **What specific metrics should inform a rollback?**
|
| 430 | + The operator can monitor the `statefulset_pvcs_owned_by_*` metrics to see |
| 431 | + whether there are pending deletions. If consistent behavior is required, the |
| 432 | + operator can wait for those metrics to stabilize. For example, |
| 433 | + `statefulset_pvcs_owned_by_pod` going to zero indicates that all scale-down |
| 434 | + deletions are complete. |
417 | 435 |
|
418 | 436 | * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
|
419 |
| - Describe manual testing that was done and the outcomes. |
420 |
| - Longer term, we may want to require automated upgrade/rollback tests, but we |
421 |
| - are missing a bunch of machinery and tooling and can't do that now. |
| 437 | + Yes. The race condition described above was not exposed, but we confirmed that the PVCs were updated correctly. |
422 | 438 |
|
423 | 439 | * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
|
424 | 440 | fields of API types, flags, etc.?**
|
425 |
| - Even if applying deprecation policies, they may still surprise some users. |
| 441 | + Enabling the feature also enables the `PersistentVolumeClaimRetentionPolicy` |
| 442 | + API field. |
426 | 443 |
|
427 | 444 | ### Monitoring Requirements
|
428 | 445 |
|
429 |
| -_TBD upon graduation to beta._ |
430 |
| - |
431 | 446 | * **How can an operator determine if the feature is in use by workloads?**
|
432 |
| - Ideally, this should be a metric. Operations against the Kubernetes API (e.g., |
433 |
| - checking if there are objects with field X set) may be a last resort. Avoid |
434 |
| - logs or events for this purpose. |
| 447 | + `statefulset_when_deleted_policy` or `statefulset_when_scaled_policy` will |
| 448 | + have nonzero counts for the `delete` policy fields. |
435 | 449 |
|
436 | 450 | * **What are the SLIs (Service Level Indicators) an operator can use to determine
|
437 | 451 | the health of the service?**
|
438 |
| - - [ ] Metrics |
439 |
| - - Metric name: |
440 |
| - - [Optional] Aggregation method: |
441 |
| - - Components exposing the metric: |
442 |
| - - [ ] Other (treat as last resort) |
443 |
| - - Details: |
| 452 | + - Metric name: `statefulset_reconcile_delay` |
| 453 | + - [Optional] Aggregation method: `quantile` |
| 454 | + - Components exposing the metric: `pkg/controller/statefulset` |
| 455 | + - Metric name: `statefulset_unhealthy_pods` |
| 456 | + - [Optional] Aggregation method: `sum` |
| 457 | + - Components exposing the metric: `pkg/controller/statefulset` |
444 | 458 |
|
445 | 459 | * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
|
446 |
| - At a high level, this usually will be in the form of "high percentile of SLI |
447 |
| - per day <= X". It's impossible to provide comprehensive guidance, but at the very |
448 |
| - high level (needs more precise definitions) those may be things like: |
449 |
| - - per-day percentage of API calls finishing with 5XX errors <= 1% |
450 |
| - - 99% percentile over day of absolute value from (job creation time minus expected |
451 |
| - job creation time) for cron job <= 10% |
452 |
| - - 99,9% of /health requests per day finish with 200 code |
| 460 | + |
| 461 | + The reconcile delay (the time between StatefulSet reconciliation loops) should |
| 462 | + be low; for example, the 99th percentile should be at most a few minutes. |
| 463 | + |
| 464 | + This can be combined with the unhealthy pod count, although since unhealthy |
| 465 | + pods usually indicate an application error rather than a problem with the |
| 466 | + StatefulSet controller, acting on it is more of a per-cluster decision for |
| 467 | + the operator. |
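
As a sketch of how an operator might turn the reconcile-delay SLI into an
alert, assuming the metric is exported with a `quantile` label and measured in
seconds (the metric shape and threshold here are assumptions for illustration):

```yaml
# Hypothetical Prometheus alerting rule built on the SLI named in this KEP.
groups:
- name: statefulset-controller
  rules:
  - alert: StatefulSetReconcileDelayHigh
    expr: statefulset_reconcile_delay{quantile="0.99"} > 300
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "99th percentile StatefulSet reconcile delay is above 5 minutes"
```
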
453 | 468 |
|
454 | 469 | * **Are there any missing metrics that would be useful to have to improve observability
|
455 | 470 | of this feature?**
|
456 |
| - Describe the metrics themselves and the reasons why they weren't added (e.g., cost, |
457 |
| - implementation difficulties, etc.). |
458 | 471 |
|
459 |
| -### Dependencies |
| 472 | + The StatefulSet controller has not had any metrics in the past, despite being |
| 473 | + a core Kubernetes feature for some time. Hence which metrics are useful in |
| 474 | + practice remains an open question, in spite of the feature's stability. |
460 | 475 |
|
461 |
| -_TBD upon graduation to beta._ |
| 476 | +### Dependencies |
462 | 477 |
|
463 | 478 | * **Does this feature depend on any specific services running in the cluster?**
|
464 |
| - Think about both cluster-level services (e.g. metrics-server) as well |
465 |
| - as node-level agents (e.g. specific version of CRI). Focus on external or |
466 |
| - optional services that are needed. For example, if this feature depends on |
467 |
| - a cloud provider API, or upon an external software-defined storage or network |
468 |
| - control plane. |
469 | 479 |
|
470 |
| - For each of these, fill in the following—thinking about running existing user workloads |
471 |
| - and creating new ones, as well as about cluster-level services (e.g. DNS): |
472 |
| - - [Dependency name] |
473 |
| - - Usage description: |
474 |
| - - Impact of its outage on the feature: |
475 |
| - - Impact of its degraded performance or high-error rates on the feature: |
| 480 | + No, beyond the scheduler, the garbage collector and volume management |
| 481 | + (provisioning, attaching, etc.), on which almost everything in Kubernetes |
| 482 | + depends. This feature does not add any new dependencies beyond those the |
| 483 | + stateful set controller already had. |
476 | 484 |
|
477 | 485 |
|
478 | 486 | ### Scalability
|
@@ -517,28 +525,32 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
|
517 | 525 |
|
518 | 526 | ### Troubleshooting
|
519 | 527 |
|
520 |
| -_TBD on beta graduation._ |
521 |
| - |
522 |
| -The Troubleshooting section currently serves the `Playbook` role. We may consider |
523 |
| -splitting it into a dedicated `Playbook` document (potentially with some monitoring |
524 |
| -details). For now, we leave it here. |
525 |
| - |
526 | 528 | * **How does this feature react if the API server and/or etcd is unavailable?**
|
527 | 529 |
|
| 530 | +PVC deletion will be paused. If the control plane becomes unavailable in the |
| 531 | +middle of a StatefulSet being deleted or scaled down, there may be deleted |
| 532 | +Pods whose PVCs have not yet been deleted. Deletion will continue normally |
| 533 | +after the control plane returns. |
| 534 | + |
528 | 535 | * **What are other known failure modes?**
|
529 |
| - For each of them, fill in the following information by copying the below template: |
530 |
| - - [Failure mode brief description] |
531 |
| - - Detection: How can it be detected via metrics? Stated another way: |
532 |
| - how can an operator troubleshoot without logging into a master or worker node? |
533 |
| - - Mitigations: What can be done to stop the bleeding, especially for already |
534 |
| - running user workloads? |
535 |
| - - Diagnostics: What are the useful log messages and their required logging |
536 |
| - levels that could help debug the issue? |
537 |
| - Not required until feature graduated to beta. |
538 |
| - - Testing: Are there any tests for failure mode? If not, describe why. |
| 536 | + - PVCs from a StatefulSet not being deleted as expected. |
| 537 | +   - Detection: This can be detected by lower-than-expected counts of the |
| 538 | +     `statefulset_pvcs_owned_by_*` metrics and by an operator listing and examining PVCs. |
| 539 | +   - Mitigations: We expect this to happen only if there are other, |
| 540 | +     operator-installed controllers that are also managing owner references on |
| 541 | +     PVCs. Any such PVCs can be deleted manually. The conflicting controllers |
| 542 | +     will have to be discovered manually. |
| 543 | +   - Diagnostics: Logs from the kube-controller-manager and the StatefulSet controller. |
| 544 | +   - Testing: Tests are in place to confirm that owner references are added by |
| 545 | +     the `StatefulSet` controller, but Kubernetes does not test against external |
| 546 | +     custom controllers. |
539 | 547 |
|
540 | 548 | * **What steps should be taken if SLOs are not being met to determine the problem?**
|
541 | 549 |
|
| 550 | +StatefulSet SLOs are new with this feature and are in the process of being |
| 551 | +evaluated. If they are not being met, the kube-controller-manager (where the |
| 552 | +StatefulSet controller lives) should be examined and/or restarted. |
| 553 | + |
542 | 554 | [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
|
543 | 555 | [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
|
544 | 556 |
|
|