Commit ecf3053

promote KEP-4017 to stable
Signed-off-by: Alay Patel <[email protected]>
1 parent cccb695 commit ecf3053

3 files changed: +220 -32 lines changed
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 4017
 beta:
+  approver: "@wojtek-t"
+stable:
   approver: "@wojtek-t"

keps/sig-apps/4017-pod-index-label/README.md

Lines changed: 212 additions & 28 deletions
@@ -132,20 +132,20 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [X] (R) Design details are appropriately documented
 - [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
-- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [x] e2e Tests for all Beta API Operations (endpoints)
+- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [X] (R) Graduation criteria is in place
-- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -238,8 +238,8 @@ At a high level, the proposal is to modify the StatefulSet and Job controllers t
 as a pod label at pod creation time (for jobs, this would only apply to jobs in
 Indexed completion mode). The details of this are outlined in the Design Details section below.
 
-StatefulSet pod label: `statefulset.kubernetes.io/pod-index`
-Indexed Job pod label: `batch.kubernetes.io/job-completion-index` (same as existing annotation)
+- StatefulSet pod label: `apps.kubernetes.io/pod-index`
+- IndexedJob pod label: `batch.kubernetes.io/job-completion-index` (same as existing annotation)
 
 ### User Stories (Optional)
 
@@ -256,7 +256,7 @@ As a user, I would like to lookup a job's pod logs by its index.
 
 #### Story 2
 As a user, I would like to target traffic to a specific pod index (e.g., index 0) in a StatefulSet
-or Indexed Job. Instead of creating a service which matche an entire Job, I'd like to create a
+or Indexed Job. Instead of creating a service which matches an entire Job, I'd like to create a
 service which matches only the "head" pod, which will be more performant, especially for a large
 number of pods.
 
@@ -311,7 +311,7 @@ change are understandable. This may include API specs (though not always
 required) or even code snippets. If there's any ambiguity about HOW your
 proposal will be implemented, this is the place to discuss them.
 -->
-The StatefulSet controller will only need a minor update to the [newStatefulSetPod](https://github.com/kubernetes/kubernetes/blob/fb5e9ef3b2a6f2136d54868187431c345e59f55f/pkg/controller/statefulset/stateful_set_utils.go#L458) function, to set the pod ordinal as the label `statefulset.kubernetes.io/pod-index`. This call is downstream from the [newVersionedStatefulSetPod](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/controller/statefulset/stateful_set_control.go#LL416C7-L416C7) call, which generates
+The StatefulSet controller will only need a minor update to the [newStatefulSetPod](https://github.com/kubernetes/kubernetes/blob/fb5e9ef3b2a6f2136d54868187431c345e59f55f/pkg/controller/statefulset/stateful_set_utils.go#L458) function, to set the pod ordinal as the label `apps.kubernetes.io/pod-index`. This call is downstream from the [newVersionedStatefulSetPod](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/controller/statefulset/stateful_set_control.go#LL416C7-L416C7) call, which generates
 the StatefulSet pods before creating them as necessary in [CreateStatefulPod](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/controller/statefulset/stateful_set_control.go#L433).
 
 Similarly, the Job controller would need to add the completion index as a label [here](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/controller/job/job_controller.go#L1480)
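
The label assignment described in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller code: `indexLabels` and its `gateEnabled` parameter are hypothetical names, and only the two label keys are taken from the KEP text.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label keys as named in the KEP text.
const (
	statefulSetPodIndexLabel = "apps.kubernetes.io/pod-index"
	jobCompletionIndexLabel  = "batch.kubernetes.io/job-completion-index"
)

// indexLabels is a hypothetical helper: it returns the labels a controller
// would stamp on a pod at creation time. The index is stored as a decimal
// string because label values are strings. When the gate is off, no index
// label is added.
func indexLabels(labelKey string, index int, gateEnabled bool) map[string]string {
	labels := map[string]string{}
	if gateEnabled {
		labels[labelKey] = strconv.Itoa(index)
	}
	return labels
}

func main() {
	// Three StatefulSet pods with the gate enabled.
	for i := 0; i < 3; i++ {
		fmt.Println(indexLabels(statefulSetPodIndexLabel, i, true))
	}
	// An Indexed Job pod with the gate disabled gets no index label.
	fmt.Println(indexLabels(jobCompletionIndexLabel, 0, false))
}
```

The sketch keeps the gate check next to the label write to reflect that the label is only ever set when the pod object is built.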
@@ -330,7 +330,7 @@ when drafting this test plan.
 [testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
 -->
 
-[X] I/we understand the owners of the involved components may require updates to
+[x] I/we understand the owners of the involved components may require updates to
 existing tests to make this code solid enough prior to committing the changes necessary
 to implement this enhancement.
 
@@ -361,8 +361,8 @@ https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
 This can inform certain test coverage improvements that we want to do before
 extending the production code to implement this enhancement.
 -->
-- `k8s.io/kubernetes/pkg/controller/job`: `05/18/2023` - `90.4%`
-- `k8s.io/kubernetes/pkg/controller/statefulset`: `05/18/2023` - `85.7%`
+- `k8s.io/kubernetes/pkg/controller/job`: `10/09/2024` - `92%`
+- `k8s.io/kubernetes/pkg/controller/statefulset`: `10/09/2024` - `85.6%`
 
 ##### Integration tests
 
@@ -381,8 +381,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
-Unit tests will ensure the new label is correctly added to pods, and
-integration tests will verify that the label is only added to pods from newly created StatefulSets and Indexed Jobs, not existing workloads.
+- Existing integration tests will be updated as a criterion for GA
 
 ##### e2e tests
 
@@ -396,8 +395,7 @@ https://storage.googleapis.com/k8s-triage/index.html
 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->
 
-E2E tests will not provide any additional coverage that isn't already covered by unit + integration tests,
-since we are simply adding a label, so no e2e tests will be necessary for this change.
+The e2e test checks the value of the label: https://github.com/kubernetes/kubernetes/blob/d9c46d8ecb1ede9be30545c9803e17682fcc4b50/test/e2e/apps/job.go#L435-L467
 
 ### Graduation Criteria
 
@@ -411,7 +409,9 @@ existing label which other things may depend on, for example).
 - Docs are clear about what happens if two pods get the same value (it is set by workload controllers, nothing in the API system will prevent collisions from happening).
 
 #### GA
-Fix any potentially reported bugs.
+- The `PodIndexLabel` feature gate will be locked and the code will ignore it
+- Add integration/e2e tests for the `PodIndexLabel` feature in the StatefulSet controller
+- Update the existing integration test for IndexedJob to validate the label value
 
 <!--
 **Note:** Generally we also wait at least two releases between beta and
@@ -467,7 +467,7 @@ enhancement:
 N/A. This feature doesn't require coordination between control plane components,
 the changes to each controller are self-contained.
 
-If there were version skew between the control plane components and the node components, where the control plane components were at version N where this feature exists, and the node componets were at version N-1 where this feature does not exist, there would be no adverse affects, the new label would simply be added to StatefulSet/Indexed Job pods.
+If there were version skew between the control plane components and the node components, where the control plane components were at version N where this feature exists, and the node components were at version N-1 where this feature does not exist, there would be no adverse effects; the new label would simply be added to StatefulSet/Indexed Job pods.
 
 ## Production Readiness Review Questionnaire
 
@@ -529,7 +529,7 @@ well as the [existing list] of feature gates.
 Any change of default behavior may be surprising to users or break existing
 automations, so be extremely careful here.
 -->
-Yes, a new label is added to pods created for StatefulSet (statefulset.kubernetes.io/pod-index) and Indexed Jobs (batch.kubernetes.io/job-completion-index)
+Yes, a new label is added to pods created for StatefulSet (apps.kubernetes.io/pod-index) and Indexed Jobs (batch.kubernetes.io/job-completion-index)
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
@@ -615,7 +615,189 @@ Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
-It will be tested manually prior to beta launch.
+
+For StatefulSet
+
+1. A kind Kubernetes 1.31 cluster was created
+```
+# k version
+Client Version: v1.30.3
+Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
+Server Version: v1.31.0
+```
+2. A sample StatefulSet was created. Since the PodIndexLabel feature gate defaults to true, the pods had the following labels:
+```
+# k get pods -oyaml | grep ' name: example-statefulset-\|index'
+apps.kubernetes.io/pod-index: "0"
+name: example-statefulset-0
+apps.kubernetes.io/pod-index: "1"
+name: example-statefulset-1
+apps.kubernetes.io/pod-index: "2"
+name: example-statefulset-2
+```
+3. The controller-manager YAML was modified to disable the feature gate, to test downgrade:
+```
+# k logs -f -n kube-system kube-controller-manager-kind-1.31-dra-control-plane | grep feature
+I1008 21:12:12.361613 1 flags.go:64] FLAG: --feature-gates=":DynamicResourceAllocation=true,:PodIndexLabel=false"
+I1008 21:12:30.602829 1 controllermanager.go:749] "Controller is disabled by a feature gate" controller="storageversion-garbage-collector-controller" requiredFeatureGates=["APIServerIdentity","StorageVersionAPI"]
+I1008 21:12:30.653581 1 controllermanager.go:749] "Controller is disabled by a feature gate" controller="service-cidr-controller" requiredFeatureGates=["MultiCIDRServiceAllocator"]
+```
+The controller did not rewrite the pod labels, as expected:
+```
+# k get pods -oyaml | grep ' name: example-statefulset-\|index'
+apps.kubernetes.io/pod-index: "0"
+name: example-statefulset-0
+apps.kubernetes.io/pod-index: "1"
+name: example-statefulset-1
+apps.kubernetes.io/pod-index: "2"
+name: example-statefulset-2
+```
+4. The StatefulSet was deleted and re-created; the pods were created without the index label:
+```
+# k get pods -oyaml | grep ' name: example-statefulset-\|index'
+name: example-statefulset-0
+name: example-statefulset-1
+name: example-statefulset-2
+```
+5. The controller-manager YAML was modified to enable the feature gate, to test upgrade:
+```
+# k logs -f -n kube-system kube-controller-manager-kind-1.31-dra-control-plane | grep feature
+I1008 21:14:46.348747 1 flags.go:64] FLAG: --feature-gates=":DynamicResourceAllocation=true"
+```
+The controller-manager did not update the labels:
+```
+# k get pods -oyaml | grep ' name: example-statefulset-\|index'
+name: example-statefulset-0
+name: example-statefulset-1
+name: example-statefulset-2
+```
+6. The StatefulSet was deleted and re-created; the pods were created with the index label:
+```
+# k get pods -oyaml | grep ' name: example-statefulset-\|index'
+apps.kubernetes.io/pod-index: "0"
+name: example-statefulset-0
+apps.kubernetes.io/pod-index: "1"
+name: example-statefulset-1
+apps.kubernetes.io/pod-index: "2"
+name: example-statefulset-2
+```
+
+For IndexedJob
+
+1. A kind Kubernetes 1.31 cluster was created
+```
+# k version
+Client Version: v1.30.3
+Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
+Server Version: v1.31.0
+```
+2. A sample IndexedJob was created. Since the PodIndexLabel feature gate defaults to true, the pods had the following labels:
+```
+# k get pods -oyaml | grep ' name: sample-indexed-job-[0-9]\|job-completion-index'
+batch.kubernetes.io/job-completion-index: "0"
+batch.kubernetes.io/job-completion-index: "0"
+name: sample-indexed-job-0-8sgb7
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "1"
+batch.kubernetes.io/job-completion-index: "1"
+name: sample-indexed-job-1-f9mz4
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "2"
+batch.kubernetes.io/job-completion-index: "2"
+name: sample-indexed-job-2-5gxwz
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+```
+3. The controller-manager YAML was modified to disable the feature gate, to test downgrade:
+```
+# k logs -f -n kube-system kube-controller-manager-kind-1.31-dra-control-plane | grep feature
+I1010 02:33:21.331424 1 flags.go:64] FLAG: --feature-gates=":DynamicResourceAllocation=true,:PodIndexLabel=false"
+```
+The controller did not rewrite the pod labels, as expected:
+```
+# k get pods -oyaml | grep ' name: sample-indexed-job-[0-9]\|job-completion-index'
+batch.kubernetes.io/job-completion-index: "0"
+batch.kubernetes.io/job-completion-index: "0"
+name: sample-indexed-job-0-8sgb7
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "1"
+batch.kubernetes.io/job-completion-index: "1"
+name: sample-indexed-job-1-f9mz4
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "2"
+batch.kubernetes.io/job-completion-index: "2"
+name: sample-indexed-job-2-5gxwz
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+```
+4. The IndexedJob was deleted and re-created; the pods were created without the index label (some of the output is truncated
+for brevity):
+```
+# k get pods -oyaml | grep -A 4 ' name: sample-indexed-job-[0-9]\|labels'
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-0-8ttb5
+--
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-1-tvjqc
+--
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-2-r75jw
+```
+5. The controller-manager YAML was modified to enable the feature gate, to test upgrade:
+```
+# k logs -f -n kube-system kube-controller-manager-kind-1.31-dra-control-plane | grep feature
+I1010 02:39:22.329026 1 flags.go:64] FLAG: --feature-gates=":DynamicResourceAllocation=true,:PodIndexLabel=true"
+```
+The controller-manager did not update the labels:
+```
+# k get pods -oyaml | grep -A 4 ' name: sample-indexed-job-[0-9]\|labels'
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-0-8ttb5
+--
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-1-tvjqc
+--
+labels:
+batch.kubernetes.io/controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+batch.kubernetes.io/job-name: sample-indexed-job
+controller-uid: bf96f9c0-b7ec-4c7e-9a4c-9cca20b26d35
+job-name: sample-indexed-job
+name: sample-indexed-job-2-r75jw
+```
+6. The IndexedJob was deleted and re-created; the pods were created with the index label:
+```
+# k get pods -oyaml | grep ' name: sample-indexed-job-[0-9]\|job-completion-index'
+batch.kubernetes.io/job-completion-index: "0"
+batch.kubernetes.io/job-completion-index: "0"
+name: sample-indexed-job-0-d7d7m
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "1"
+batch.kubernetes.io/job-completion-index: "1"
+name: sample-indexed-job-1-gg9sv
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+batch.kubernetes.io/job-completion-index: "2"
+batch.kubernetes.io/job-completion-index: "2"
+name: sample-indexed-job-2-nfxlr
+fieldPath: metadata.labels['batch.kubernetes.io/job-completion-index']
+```
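
The manual test above exercises one rule: the index label is stamped only when a pod is created, so toggling `PodIndexLabel` does not rewrite pods that already exist. A toy model of that rule is sketched below. It does not talk to a real cluster, and the `cluster` type and the `recreateStatefulSet`, `setGate`, and `labelled` methods are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label key as named in the KEP text.
const podIndexLabel = "apps.kubernetes.io/pod-index"

// cluster is a hypothetical in-memory model: a feature gate plus pod labels.
type cluster struct {
	gateEnabled bool
	pods        map[string]map[string]string // pod name -> labels
}

// recreateStatefulSet drops all pods and creates `replicas` new ones.
// The index label is written only here, at creation time.
func (c *cluster) recreateStatefulSet(name string, replicas int) {
	c.pods = map[string]map[string]string{}
	for i := 0; i < replicas; i++ {
		labels := map[string]string{}
		if c.gateEnabled {
			labels[podIndexLabel] = strconv.Itoa(i)
		}
		c.pods[fmt.Sprintf("%s-%d", name, i)] = labels
	}
}

// setGate flips the feature gate and deliberately leaves existing pods alone.
func (c *cluster) setGate(enabled bool) { c.gateEnabled = enabled }

// labelled counts the pods that carry the index label.
func (c *cluster) labelled() int {
	n := 0
	for _, labels := range c.pods {
		if _, ok := labels[podIndexLabel]; ok {
			n++
		}
	}
	return n
}

func main() {
	c := &cluster{gateEnabled: true}
	c.recreateStatefulSet("example-statefulset", 3)
	fmt.Println("created, gate on:", c.labelled())
	c.setGate(false)
	fmt.Println("gate off, pods kept:", c.labelled())
	c.recreateStatefulSet("example-statefulset", 3)
	fmt.Println("re-created, gate off:", c.labelled())
	c.setGate(true)
	fmt.Println("gate on, pods kept:", c.labelled())
	c.recreateStatefulSet("example-statefulset", 3)
	fmt.Println("re-created, gate on:", c.labelled())
}
```

The five calls in `main` follow steps 2 through 6 of the StatefulSet walkthrough; the same sequence applies to the Indexed Job case with the `batch.kubernetes.io/job-completion-index` key.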
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
@@ -641,7 +823,7 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
 checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
-- Check if StatefulSet pods have the label `statefulset.kubernetes.io/pod-index`.
+- Check if StatefulSet pods have the label `apps.kubernetes.io/pod-index`.
 - Check if Indexed Job pods have the label `batch.kubernetes.io/job-completion-index`.
 
 ###### How can someone using this feature know that it is working for their instance?
@@ -660,7 +842,7 @@ Recall that end users cannot usually observe component logs or access metrics.
 - [X] API .metadata
 - Condition name:
 - Other field:
-- `.metadata.labels['statefulset.kubernetes.io/pod-index']` for StatefulSets
+- `.metadata.labels['apps.kubernetes.io/pod-index']` for StatefulSets
 - `.metadata.labels['batch.kubernetes.io/job-completion-index']` for Indexed Jobs
 - [ ] Other (treat as last resort)
 - Details:
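
As a rough sketch of the `.metadata.labels` check listed above, the function below reads a pod's index from its labels. It assumes the labels have already been fetched into a plain map; `podIndex` is a hypothetical helper, not part of any Kubernetes client library.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label keys as named in the KEP text.
const (
	statefulSetPodIndexLabel = "apps.kubernetes.io/pod-index"
	jobCompletionIndexLabel  = "batch.kubernetes.io/job-completion-index"
)

// podIndex returns the index recorded in either index label.
// It reports false when neither label is present or when the
// value is not a non-negative decimal integer.
func podIndex(labels map[string]string) (int, bool) {
	for _, key := range []string{statefulSetPodIndexLabel, jobCompletionIndexLabel} {
		v, ok := labels[key]
		if !ok {
			continue
		}
		n, err := strconv.Atoi(v)
		if err != nil || n < 0 {
			return 0, false
		}
		return n, true
	}
	return 0, false
}

func main() {
	pods := []map[string]string{
		{statefulSetPodIndexLabel: "2"},
		{jobCompletionIndexLabel: "0"},
		{"job-name": "sample-indexed-job"}, // no index label
	}
	for _, labels := range pods {
		idx, ok := podIndex(labels)
		fmt.Println(idx, ok)
	}
}
```

A pod created while the feature was disabled falls into the last case, which matches the manual test output where only the legacy `job-name` style labels are present.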
@@ -882,6 +1064,8 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 - 2023-05-17: KEP published
+- 2023-07-14: Feature merged with feature gate in beta
+- 2024-10-09: Feature graduated to GA
 
 ## Drawbacks
 