Commit 3a1a7d6

Add detail to KEP-3335 beta graduation update
1 parent cd7a196 commit 3a1a7d6

3 files changed

+116
-42
lines changed
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
11
kep-number: 3335
22
alpha:
33
approver: "@wojtek-t"
4+
beta:
5+
approver: "@wojtek-t"

keps/sig-apps/3335-statefulset-slice/README.md

Lines changed: 114 additions & 41 deletions
@@ -59,7 +59,7 @@ should be approved by the remaining approvers and/or the owning SIG (or
5959
SIG Architecture for cross-cutting KEPs).
6060
-->
6161

62-
# KEP-3335: StatefulSet Slice
62+
# KEP-3335: StatefulSet Start Ordinal
6363

6464
<!--
6565
This is the title of your KEP. Keep it short, simple, and descriptive. A good
@@ -94,9 +94,11 @@ tags, and then generate with `hack/update-toc.sh`.
9494
- [Test Plan](#test-plan)
9595
- [Prerequisite testing updates](#prerequisite-testing-updates)
9696
- [Unit tests](#unit-tests)
97-
- [e2e/Integration tests](#e2eintegration-tests)
97+
- [E2E tests](#e2e-tests)
98+
- [Integration tests](#integration-tests)
9899
- [Graduation Criteria](#graduation-criteria)
99100
- [Alpha](#alpha)
101+
- [Beta](#beta)
100102
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
101103
- [Version Skew Strategy](#version-skew-strategy)
102104
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -133,7 +135,7 @@ checklist items _must_ be updated for the enhancement to be released.
133135
Items marked with (R) are required *prior to targeting to a milestone / release*.
134136

135137
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
136-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
138+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
137139
- [X] (R) Design details are appropriately documented
138140
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
139141
- [ ] e2e Tests for all Beta API Operations (endpoints)
@@ -143,8 +145,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
143145
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
144146
- [ ] (R) Production readiness review completed
145147
- [ ] (R) Production readiness review approved
146-
- [ ] "Implementation History" section is up-to-date for milestone
147-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
148+
- [X] "Implementation History" section is up-to-date for milestone
149+
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
148150
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
149151

150152
<!--
@@ -201,7 +203,7 @@ This feature is motivated by the use case of orchestrating the migration of a St
201203
namespace or a Kubernetes cluster without disruption. Existing approaches to
202204
this problem include:
203205

204-
1. Back and restore: This approach takes a backup of an application (StatefulSet, underlying storage), and re-creates it in a different location. This introduces application downtime, the duration of time between old StatefulSet termination and new StatefulSet recreation.
206+
1. Backup and restore: This approach takes a backup of an application (StatefulSet, underlying storage), and re-creates it in a different location. This introduces application downtime, the duration of time between old StatefulSet termination and new StatefulSet recreation.
205207
2. Pod level migration: Using `--cascade=orphan` when deleting a StatefulSet preserves the pods. This allows an application operator to evict and reschedule pods individually. However, as pods are ephemeral, this requires the application operator to emulate the behavior of the StatefulSet, to reschedule pods as they restart, or are evicted and rescheduled.
206208

207209
Migrating a StatefulSet in slices allows for gradual migration of the application, as only a subset of replicas are migrated at any time. Consider the scenario of transferring pod ordinal ownership from a source StatefulSet with `N` pods to a destination StatefulSet with `0` pods. Further, to maintain application availability, no more than `d` pods should be unavailable at any time during the transfer. An orchestrator can manipulate `.spec.replicas` and `.spec.ordinals.start` to perform this migration:
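One way an orchestrator could step through such a migration is sketched below. This is an illustration under assumptions, not the KEP's prescribed procedure: `Step` and `plan_migration` are hypothetical names, the source StatefulSet is assumed to start at ordinal `0`, and each step is treated as a single unit even though a real orchestrator would scale the source down, wait for termination, and only then update the destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Desired spec values after one migration step (hypothetical helper type)."""
    source_replicas: int   # .spec.replicas on the source StatefulSet
    dest_start: int        # .spec.ordinals.start on the destination StatefulSet
    dest_replicas: int     # .spec.replicas on the destination StatefulSet


def plan_migration(n: int, d: int) -> list[Step]:
    """Plan moving ordinals [0, n-1] from source to destination, at most d per step.

    Assumption: the highest ordinals move first, so after each step the source
    owns [0, source_replicas - 1] and the destination owns
    [dest_start, dest_start + dest_replicas - 1].
    """
    if n < 0 or d < 1:
        raise ValueError("n must be >= 0 and d must be >= 1")
    steps = []
    moved = 0
    while moved < n:
        moved = min(n, moved + d)
        steps.append(Step(source_replicas=n - moved,
                          dest_start=n - moved,
                          dest_replicas=moved))
    return steps


if __name__ == "__main__":
    for step in plan_migration(5, 2):
        print(step)
```

In this sketch the two StatefulSets never claim the same ordinal, and together they always cover `[0, N-1]` once a step completes, so no more than `d` pods are in transit at a time.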
@@ -435,10 +437,11 @@ This can inform certain test coverage improvements that we want to do before
435437
extending the production code to implement this enhancement.
436438
-->
437439

438-
* `pkg/controller/statefulset/stateful_set_control_test.go - Tests that a StatefulSet slice can be created from specified starting ordinal`
439-
* `pkg/apis/apps/v1/defaults_test.go - Tests defaults for new fields added to StatefulSet`
440+
* `pkg/apis/apps/validation/validation_test.go` - Tests that the .spec.ordinals.start value is properly validated.
441+
* `pkg/controller/statefulset/stateful_set_control_test.go` - Tests that a StatefulSet slice can be created from specified starting ordinal.
442+
* `pkg/registry/apps/statefulset/strategy_test.go` - Tests the create/update strategy of a StatefulSet with start ordinals. Also validates enablement/disablement of the feature.
440443

441-
##### e2e/Integration tests
444+
##### E2E tests
442445

443446
<!--
444447
This question should be filled when targeting a release.
@@ -448,12 +451,18 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
448451
https://storage.googleapis.com/k8s-triage/index.html
449452
-->
450453

454+
`Feature:StatefulSetStartOrdinal` in `k8s.io/kubernetes/test/e2e/apps/`.
455+
456+
* Adding `ordinals.start`: Validate that setting `ordinals.start` to `k` causes StatefulSet ordinals to be scaled (pods `[0, k-1]` are terminated, pods `[N, N+k-1]` are created)
457+
* Increasing `ordinals.start`: Validate that increasing `ordinals.start` from `m` to `n` causes StatefulSet ordinals to be scaled (pods `[m, n-1]` are terminated, pods `[m+N, n+N-1]` are created)
458+
* Removing `ordinals.start`: Validate that unsetting `ordinals.start` (previously `k`) causes StatefulSet ordinals to be scaled (pods `[N, N+k-1]` are terminated, pods `[0, k-1]` are created)
459+
* Decreasing `ordinals.start`: Validate that decreasing `ordinals.start` from `m` to `n` causes StatefulSet ordinals to be scaled (pods `[n+N, m+N-1]` are terminated, pods `[n, m-1]` are created)
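The ordinal arithmetic behind these scaling cases can be checked with a small sketch. It assumes that a StatefulSet with `N` replicas owns the ordinals `[start, start+N-1]` and that `.spec.replicas` stays fixed while `ordinals.start` changes; `ordinals` and `ordinal_diff` are hypothetical helper names, not part of the Kubernetes code base.

```python
def ordinals(start: int, replicas: int) -> set[int]:
    """Ordinals owned by a StatefulSet: [start, start + replicas - 1]."""
    if start < 0 or replicas < 0:
        raise ValueError("start and replicas must be non-negative")
    return set(range(start, start + replicas))


def ordinal_diff(old_start: int, new_start: int, replicas: int) -> tuple[list[int], list[int]]:
    """Return (terminated, created) ordinals when only ordinals.start changes."""
    old = ordinals(old_start, replicas)
    new = ordinals(new_start, replicas)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Adding ordinals.start = 2 to a 5-replica StatefulSet.
    print(ordinal_diff(0, 2, 5))
    # Decreasing ordinals.start from 3 to 1 with 4 replicas.
    print(ordinal_diff(3, 1, 4))
```

Set differences are used rather than closed-form ranges so that the same helper also covers changes larger than `N`, where the old and new ordinal ranges do not overlap at all.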
460+
461+
##### Integration tests
462+
463+
`StatefulSetStartOrdinal` in `k8s.io/kubernetes/test/integration/statefulset`.
464+
451465
* Pod Restart Tests: Validate that StatefulSet RollingUpdate behavior is preserved, with a replica ordinal offset starting at `ordinals.start`
452-
* Scaling Tests
453-
* Adding `ordinals.start`: Validate that setting `ordinals.start` to `k` causes StatefulSet ordinals to be scaled (pods `[0, k-1]` are terminated, pods `[N, N+k-1]` are created)
454-
* Increasing `ordinals.start`: Validate that increasing `ordinals.start` from `m` to `n` causes StatefulSet ordinals to be scaled (pods `[m, n-1]` are terminated, pods `[m+N, n+N-1]` are created)
455-
* Removing `ordinals.start`: Validate that setting `ordinals.start` causes StatefulSet ordinals to be scaled (pods `[N-1, N+k-1]` are terminated, pods `[0, k-1]` are created)
456-
* Decreasing `ordinals.start`: Validate that decreasing `ordinals.start` from `m` to `n` causes StatefulSet ordinals to be scaled (pods `[m+N, n+N-1]` are terminated, pods `[m, n-1]` are created)
457466

458467
### Graduation Criteria
459468

@@ -487,7 +496,7 @@ Below are some examples to consider, in addition to the aforementioned [maturity
487496
#### Alpha
488497
489498
- Feature implemented behind a feature flag
490-
- Initial e2e tests completed and enabled
499+
- Unit tests added; initial e2e tests completed and enabled
491500
492501
#### Beta
493502
@@ -522,8 +531,12 @@ in back-to-back releases.
522531
#### Alpha
523532

524533
* Feature functionality implemented but hidden behind a feature gate
525-
* Add unit, functional, upgrade and downgrade tests to automated k8s test.
534+
* Add unit, e2e and functional tests to automated k8s test.
526535

536+
#### Beta
537+
538+
* Validate with user workloads
539+
* Enable feature gate for e2e pipelines
527540

528541
### Upgrade / Downgrade Strategy
529542

@@ -658,14 +671,14 @@ You can take a look at one potential example of such test in:
658671
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
659672
-->
660673

661-
Additional e2e tests will be added when targeting the Beta stage. These will
662-
validate the behavior of the cluster when enabled and disabled and ensure that
663-
existing behavior (eg: not specifying the new `ordinals.start` API) is
674+
Existing e2e tests will validate that when the feature is enabled but not in
675+
use, the existing behavior (e.g., not specifying the new `ordinals.start` API) is
664676
preserved.
665677

666-
### Rollout, Upgrade and Rollback Planning
678+
Additionally, unit tests for validating enablement/disablement will be added in
679+
Beta.
667680

668-
TBD upon graduation to beta.
681+
### Rollout, Upgrade and Rollback Planning
669682

670683
<!--
671684
This section must be completed when targeting beta to a release.
@@ -683,13 +696,29 @@ rollout. Similarly, consider large clusters and how enablement/disablement
683696
will rollout across nodes.
684697
-->
685698

699+
If a control plane rollout disables this feature, the StatefulSet controller
700+
will update ordinal numbers it controls. This will result in pods being deleted,
701+
while other pods are created. The StatefulSet controller scales up pods before
702+
it deletes pods, so as a result, the StatefulSet should not manage fewer than the
703+
number of replicas that are defined in the spec. Disabling the feature may have an effect on the
704+
stateful workload that is being run. If the stateful application expects a
705+
specific ordinal number to be available, it may result in an application failing
706+
to reach quorum, or rebalancing data based on the number of available replicas.
707+
686708
###### What specific metrics should inform a rollback?
687709

688710
<!--
689711
What signals should users be paying attention to when the feature is young
690712
that might indicate a serious problem?
691713
-->
692714

715+
The `kube_statefulset_status_replicas` metric can be monitored against the
716+
`kube_statefulset_replicas` metric to compare the expected number of replicas with
717+
the actual number of pods matched by this StatefulSet's selector. If there is
718+
a divergence between these metrics during steady state operations, this can
719+
indicate that the number of replicas being created by the StatefulSet does not
720+
match the expected number of replicas.
721+
693722
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
694723

695724
<!--
@@ -704,9 +733,10 @@ are missing a bunch of machinery and tooling and can't do that now.
704733
Even if applying deprecation policies, they may still surprise some users.
705734
-->
706735

707-
### Monitoring Requirements
736+
No removals or deprecations are tied to this rollout. The rollout is enabled by
737+
the feature flag `StatefulSetStartOrdinal`.
708738

709-
TBD upon graduation to beta.
739+
### Monitoring Requirements
710740

711741
<!--
712742
This section must be completed when targeting beta to a release.
@@ -723,6 +753,11 @@ checking if there are objects with field X set) may be a last resort. Avoid
723753
logs or events for this purpose.
724754
-->
725755

756+
An operator can check the `.spec.ordinals.start` field on the StatefulSet to
757+
determine if this StatefulSet has a non-default start ordinal defined. The
758+
operator can also check if the `statefulset_ordinals_start` metric is set. A
759+
non-zero value indicates it is in use.
760+
726761
###### How can someone using this feature know that it is working for their instance?
727762

728763
<!--
@@ -734,13 +769,9 @@ and operation of this feature.
734769
Recall that end users cannot usually observe component logs or access metrics.
735770
-->
736771

737-
- [ ] Events
738-
- Event Reason:
739-
- [ ] API .status
740-
- Condition name:
741-
- Other field:
742772
- [ ] Other (treat as last resort)
743-
- Details:
773+
- Details: The user can inspect the pods that are created by the StatefulSet
774+
which match the StatefulSet's selector.
744775

745776
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
746777

@@ -759,18 +790,32 @@ These goals will help you determine what you need to measure (SLIs) in the next
759790
question.
760791
-->
761792

793+
The `statefulset_reconcile_delay` metric (time between StatefulSet reconciliation
794+
loops) should not significantly increase when using this feature.
795+
796+
For checking correctness, the `kube_statefulset_status_replicas` metric can be
797+
compared against the `kube_statefulset_replicas` metric to verify that the expected
798+
number of replicas matches the actual number of pods matched by this StatefulSet's
799+
selector. Under steady state, these two metrics should be equal. Note that these
800+
two metrics can diverge if application replicas don't start up for other reasons
801+
(eg: StatefulSet is using `PodManagementPolicy: OrderedReady`, and pod-`k`
802+
doesn't become ready, preventing pod-`k+1` from being created).
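Because brief divergence is expected during scaling and rollouts, a correctness check might alert only on sustained divergence. The sketch below illustrates that idea on already-collected samples; `sustained_divergence` and its `window` parameter are hypothetical, and how the samples of `kube_statefulset_replicas` and `kube_statefulset_status_replicas` are scraped is left out.

```python
def sustained_divergence(desired: list[int], actual: list[int], window: int = 3) -> bool:
    """Return True if the last `window` paired samples all differ.

    `desired` holds samples of kube_statefulset_replicas and `actual` holds
    samples of kube_statefulset_status_replicas, taken at the same times.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(desired) != len(actual):
        raise ValueError("sample series must have the same length")
    if len(desired) < window:
        return False  # not enough data to call the divergence sustained
    recent = zip(desired[-window:], actual[-window:])
    return all(d != a for d, a in recent)


if __name__ == "__main__":
    # A single transient mismatch is not reported.
    print(sustained_divergence([3, 3, 3, 3], [3, 2, 3, 3]))
    # A mismatch lasting the whole window is reported.
    print(sustained_divergence([3, 3, 3, 3], [3, 2, 2, 2]))
```

A sustained mismatch still needs interpretation: as noted above, it can come from a pod that never becomes ready under `OrderedReady` rather than from this feature.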
803+
762804
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
763805

764806
<!--
765807
Pick one more of these and delete the rest.
766808
-->
767809

768-
- [ ] Metrics
769-
- Metric name:
770-
- [Optional] Aggregation method:
771-
- Components exposing the metric:
772-
- [ ] Other (treat as last resort)
773-
- Details:
810+
- Metric name: `statefulset_reconcile_delay`
811+
- [Optional] Aggregation method: `quantile`
812+
- Components exposing the metric: `pkg/controller/statefulset`
813+
- Metric name: `kube_statefulset_replicas`
814+
- [Optional] Aggregation method: `gauge`
815+
- Components exposing the metric: `pkg/controller/statefulset`
816+
- Metric name: `kube_statefulset_status_replicas`
817+
- [Optional] Aggregation method: `gauge`
818+
- Components exposing the metric: `pkg/controller/statefulset`
774819

775820
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
776821

@@ -779,9 +824,9 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
779824
implementation difficulties, etc.).
780825
-->
781826

782-
### Dependencies
827+
No.
783828

784-
TBD upon graduation to beta.
829+
### Dependencies
785830

786831
<!--
787832
This section must be completed when targeting beta to a release.
@@ -804,9 +849,11 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
804849
- Impact of its degraded performance or high-error rates on the feature:
805850
-->
806851

807-
### Scalability
852+
This feature depends on the API server to determine the health of a pod, in order
853+
to control pods with particular ordinal numbers. There are no other external
854+
dependencies.
808855

809-
TBD upon graduation to beta.
856+
### Scalability
810857

811858
<!--
812859
For alpha, this section is encouraged: reviewers should consider these questions
@@ -833,6 +880,8 @@ Focusing mostly on:
833880
heartbeats, leader election, etc.)
834881
-->
835882

883+
No.
884+
836885
###### Will enabling / using this feature result in introducing new API types?
837886

838887
<!--
@@ -842,6 +891,8 @@ Describe them, providing:
842891
- Supported number of objects per namespace (for namespace-scoped objects)
843892
-->
844893

894+
No.
895+
845896
###### Will enabling / using this feature result in any new calls to the cloud provider?
846897

847898
<!--
@@ -850,6 +901,8 @@ Describe them, providing:
850901
- Estimated increase:
851902
-->
852903

904+
No.
905+
853906
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
854907

855908
<!--
@@ -859,6 +912,9 @@ Describe them, providing:
859912
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
860913
-->
861914

915+
Yes, StatefulSet adds an additional `.spec.ordinals` field. If set, this adds a
916+
nested integer, `.spec.ordinals.start`.
917+
862918
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
863919

864920
<!--
@@ -870,6 +926,8 @@ Think about adding additional work or introducing new steps in between
870926
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
871927
-->
872928

929+
No. The runtime of the pod control loop remains the same with this feature.
930+
873931
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
874932

875933
<!--
@@ -882,9 +940,9 @@ This through this both in small and large cases, again with respect to the
882940
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
883941
-->
884942

885-
### Troubleshooting
943+
No. Resource usage remains the same with this feature.
886944

887-
TBD upon graduation to beta.
945+
### Troubleshooting
888946

889947
<!--
890948
This section must be completed when targeting beta to a release.
@@ -899,6 +957,12 @@ details). For now, we leave it here.
899957

900958
###### How does this feature react if the API server and/or etcd is unavailable?
901959

960+
In the event of API server/etcd unavailability, the StatefulSet control loop will
961+
be unable to list pod resources. This will prevent the control loop from being
962+
able to reconcile pod resources in the cluster. When API server
963+
and etcd become available again, the control loop will adjust to reconcile
964+
resources, according to the `.spec.ordinals.start` and `.spec.replicas` fields.
965+
902966
###### What are other known failure modes?
903967

904968
<!--
@@ -914,8 +978,13 @@ For each of them, fill in the following information by copying the below templat
914978
- Testing: Are there any tests for failure mode? If not, describe why.
915979
-->
916980

981+
No other failure modes are known.
982+
917983
###### What steps should be taken if SLOs are not being met to determine the problem?
918984

985+
If the StatefulSet SLOs are not met, the kube-controller-manager should be
986+
restarted or examined/debugged.
987+
919988
## Implementation History
920989

921990
<!--
@@ -929,6 +998,10 @@ Major milestones might include:
929998
- when the KEP was retired or superseded
930999
-->
9311000

1001+
- 1.26, KEP created.
1002+
- 1.26, alpha implementation.
1003+
- 1.27, beta implementation.
1004+
9321005
## Drawbacks
9331006

9341007
<!--

keps/sig-apps/3335-statefulset-slice/kep.yaml

Lines changed: 0 additions & 1 deletion
@@ -38,4 +38,3 @@ disable-supported: true
3838

3939
# The following PRR answers are required at beta release
4040
metrics:
41-
- kube_statefulset_ordinal_start
