KEP-5067: promote Pod Generation to beta (#5377)

natasha41575 · web-flow · commit 482a5f7ccccb · 2025-06-17T19:56:51.000-07:00
* KEP-5067: promote Pod Generation to beta

* address kep review feedback

* update details about mirror pods

* add conformance to GA criteria

* correct detail about mirror pods

* update prr reviewer

* clarify steps if SLOs are not being met

* address prr feedback

* add metrics to kep.yaml
diff --git a/keps/prod-readiness/sig-node/5067.yaml b/keps/prod-readiness/sig-node/5067.yaml
@@ -1,3 +1,5 @@
 kep-number: 5067
 alpha:
   approver: "johnbelamaric"
+beta:
+  approver: "soltysh"
diff --git a/keps/sig-node/5067-pod-generation/README.md b/keps/sig-node/5067-pod-generation/README.md
@@ -142,20 +142,20 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-* [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-* [ ] (R) KEP approvers have approved the KEP status as `implementable`
-* [ ] (R) Design details are appropriately documented
-* [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-  + [ ] e2e Tests for all Beta API Operations (endpoints)
+* [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+* [x] (R) KEP approvers have approved the KEP status as `implementable`
+* [x] (R) Design details are appropriately documented
+* [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  + [x] e2e Tests for all Beta API Operations (endpoints)
   + [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
   + [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-* [ ] (R) Graduation criteria is in place
+* [x] (R) Graduation criteria is in place
   + [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
 * [ ] (R) Production readiness review completed
 * [ ] (R) Production readiness review approved
-* [ ] "Implementation History" section is up-to-date for milestone
-* [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-* [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+* [x] "Implementation History" section is up-to-date for milestone
+* [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+* [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -457,12 +457,15 @@ regressions back to decreasing values.
 
 #### Mirror pods
 
-The kubelet currently computes the internal pod UID of a mirror pod using a hash of the podspec, 
-meaning that any update to the podspec results in the kubelet seeing it as a pod deletion followed 
-by creation of a new pod. To fully support generation for mirror pods more changes to the kubelet's logic 
-will be expected. For now, we will not treat mirror pods in any special way. This means that due
-to the way that mirror pods are implemented, the generation (and observedGeneration) of a mirror pod 
-will always be 1.
+For this KEP, we will not treat mirror pods in any special way. Due to the way they are currently implemented in the
+kubelet and apiserver, this means:
+
+1. If a mirror pod's spec is modified manually by a client via the apiserver, its `metadata.generation` will be bumped accordingly.
+1. If a static pod's manifest is updated, the kubelet treats this as a pod deletion followed by a pod creation,
+which will reset the `metadata.generation` of the corresponding mirror pod to 1.
+1. The kubelet does not currently propagate the mirror pod's `metadata.generation` to the place where
+the pod status is updated today, so the `observedGeneration` fields of mirror pods will remain
+unpopulated.
 
 #### Future enhancements
 
@@ -517,6 +520,13 @@ extending the production code to implement this enhancement.
 Unit tests will be implemented to cover code changes that implement the feature, 
 in the API server code and the kubelet code. 
 
+Core packages touched:
+* `pkg/registry/core/pod/strategy.go`: `2025-06-16` - `71.1`
+* `pkg/registry/core/pod/util.go`: `2025-06-16` - `74`
+* `pkg/apis/core/validation/validation.go`: `2025-06-16` - `84.6`
+* `pkg/kubelet`: `2025-06-16` - `71`
+* `pkg/kubelet/status`: `2025-06-16` - `86.8`
+
 ##### Integration tests
 
 <!--
@@ -552,8 +562,19 @@ E2E tests will be implemented to cover the following cases:
 * Verify that newly created pods have a `metadata.generation` set to 1.
 * Verify that PodSpec updates (such as tolerations or container images), resize requests, adding ephemeral containers, and binding requests cause the `metadata.generation` to be incremented by 1 for each update.
 * Verify that deletion of a pod causes the `metadata.generation` to be incremented by 1.
-* Issue ~1000 pod updates (1 every 100ms) and verify that `metadata.generation` and `status.observedGeneration` converge to the final expected value.
+* Issue ~500 pod updates (1 every 100ms) and verify that `metadata.generation` and `status.observedGeneration` converge to the final expected value.
 * Verify that various conditions each have `observedGeneration` populated. 
+* Verify that static pods have `metadata.generation` and `observedGeneration` fields set to 1, and that
+they never change.
+
+Added tests:
+`pod generation should start at 1 and increment per update`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=Pod%20Generation
+`custom-set generation on new pods and graceful delete`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=Pod%20Generation
+`issue 500 podspec updates and verify generation and observedGeneration eventually converge`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=Pod%20Generation
+`pod rejected by kubelet should have updated generation and observedGeneration`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=Pod%20Generation
+`pod observedGeneration field set in pod conditions`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=Pod%20Generation
+`pod-resize-scheduler-tests`: SIG Node, https://storage.googleapis.com/k8s-triage/index.html?test=pod-resize-scheduler-tests
+
 
 ### Graduation Criteria
 
@@ -625,16 +646,17 @@ in back-to-back releases.
 * `metadata.generation` functionality implemented
 * `status.observedGeneration` functionality implemented behind feature flag
 * `status.conditions[i].observedGeneration` field added to the API
+* `status.conditions[i].observedGeneration` functionality implemented behind feature flag
 
 #### Beta
 
 * `metadata.generation`,  `status.observedGeneration`,  `status.conditions[i].observedGeneration` functionality have been implemented and running as alpha for at least one release
-* `status.conditions[i].observedGeneration` functionality implemented behind feature flag
 
 #### GA
 
 * No major bugs reported for three months. 
-* User feedback (ideally from at least two distinct users) is green.
+* No negative user feedback.
+* Promote the [primary e2e tests](https://github.com/kubernetes/kubernetes/blob/08ee8bde594a42bc1a222c9fd25726352a1e6049/test/e2e/node/pods.go#L422-L719) to Conformance.
 
 ### Upgrade / Downgrade Strategy
 
@@ -815,6 +837,11 @@ https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05
 Unit tests will be added to cover the code that implements the feature, and will
 cover the cases of the feature gate being both enabled and disabled.
 
+The following unit test covers what happens if I disable a feature gate after having
+objects written with the new field (in this case, the field should persist).
+
+* https://github.com/kubernetes/kubernetes/blob/74210dd399c14582754e933de83a9e44b1d69c69/pkg/api/pod/util_test.go#L1228
+
 ### Rollout, Upgrade and Rollback Planning
 
 <!--
@@ -833,13 +860,30 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
 
+A rollout or rollback won't have significant impact on any components, even
+if they restart mid-rollout. Already running workloads likewise won't be
+significantly impacted.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+If users see the `metadata.generation` and `status.observedGeneration` fields
+are not being updated or are significantly misaligned, that indicates that
+the feature is not working as expected.
+
+Some metrics to look at that could indicate a problem include:
+- `kubelet_pod_start_total_duration_seconds`
+- `kubelet_pod_status_sync_duration_seconds`
+- `kubelet_pod_worker_duration_seconds`
+
+You could also check the [Pod Startup Latency SLI](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md).
+
+Any of these being significantly elevated could indicate an issue with the feature.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
@@ -848,12 +892,27 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+Testing steps:
+
+1. Create test pod with old version of API server and node; expected outcome: `generation` and `observedGeneration` fields are not populated
+1. Upgrade API server
+1. Send an update request to the running pod; expected outcome: `generation` is set to 1 and `observedGeneration` fields are not populated
+1. Create a new pod; expected outcome: `generation` is set to 1 and `observedGeneration` fields are not populated
+1. Create upgraded node
+1. Create second test pod on the upgraded node; expected outcome: `generation` and `observedGeneration` fields are set to 1
+1. Restart the upgraded node with the feature disabled
+1. Send an update request to the second pod; expected outcome: `generation` and `observedGeneration` continue to be updated so are set to 2
+1. Restart the upgraded node with the feature enabled
+1. Send an update request to the second pod; expected outcome: `generation` and `observedGeneration` are set to 3
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No.
+
 ### Monitoring Requirements
 
 <!--
@@ -871,6 +930,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
 
+They can check if `metadata.generation` is set on the pod and that `observedGeneration`
+is being updated.
+
 ###### How can someone using this feature know that it is working for their instance?
 
 <!--
@@ -882,13 +944,16 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->
 
-* [ ] Events
-  + Event Reason: 
-* [ ] API .status
-  + Condition name: 
-  + Other field: 
-* [ ] Other (treat as last resort)
-  + Details:
+* [x] API .status
+  + Other field: `metadata.generation`, `status.observedGeneration`, `status.conditions[].observedGeneration`
+
+Each pod should have its `metadata.generation` set, starting at 1 and incremented by 1 for each update.
+
+Each pod's `status.observedGeneration` should be populated to reflect the `metadata.generation` that was last
+observed by the kubelet.
+
+Each pod's `status.conditions[].observedGeneration` should be populated to reflect the `metadata.generation`
+that was last observed by the component owning the corresponding condition.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
@@ -908,18 +973,15 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
 
+We can reuse the [Pod Startup Latency SLI/SLO](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md) here.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
 Pick one more of these and delete the rest.
 -->
 
-* [ ] Metrics
-  + Metric name:
-  + [Optional] Aggregation method:
-  + Components exposing the metric:
-* [ ] Other (treat as last resort)
-  + Details:
+We can reuse the [Pod Startup Latency SLI/SLO](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md) here.
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
@@ -928,6 +990,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
 implementation difficulties, etc.).
 -->
 
+N/A
+
 ### Dependencies
 
 <!--
@@ -951,6 +1015,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
       - Impact of its degraded performance or high-error rates on the feature:
 -->
 
+No.
+
 ### Scalability
 
 <!--
@@ -1076,6 +1142,9 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+The feature depends on the API server. If the API server is unavailable, the 
+new fields will not be updated. 
+
 ###### What are other known failure modes?
 
 <!--
@@ -1094,8 +1163,21 @@ For each of them, fill in the following information by copying the below templat
     - Testing: Are there any tests for failure mode? If not, describe why.
 -->
 
+Other failure modes are described under Risks and Mitigations.
+
+Detection and mitigation of the infinite status-update loop by a badly-behaving
+admission webhook is covered in these docs: https://kubernetes.io/docs/concepts/cluster-administration/admission-webhooks-good-practices/#why-good-webhook-design-matters.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+One could disable the feature gate and restart the API server. Additionally,
+one could investigate the apiserver and/or kubelet logs errors.
+
+Detection and mitigation of the infinite status-update loop by a badly-behaving
+admission webhook is covered in [these docs](https://kubernetes.io/docs/concepts/cluster-administration/admission-webhooks-good-practices/#why-good-webhook-design-matters). Specifically,
+the section about [detecting loops caused by competing controllers](https://kubernetes.io/docs/concepts/cluster-administration/admission-webhooks-good-practices/#prevent-loops-competing-controllers)
+can be helpful.
+
 ## Implementation History
 
 <!--
@@ -1109,6 +1191,10 @@ Major milestones might include:
 * when the KEP was retired or superseded
 -->
 
+2025-01-21: initial KEP draft created
+2025-02-12: PR feedback addressed, KEP moved to "implementable" and merged
+2025-06-05: proposed promotion to beta
+
 ## Drawbacks
 
 <!--
diff --git a/keps/sig-node/5067-pod-generation/kep.yaml b/keps/sig-node/5067-pod-generation/kep.yaml
@@ -19,12 +19,12 @@ see-also:
 replaces:
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.33"
+latest-milestone: "v1.34"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
@@ -44,3 +44,6 @@ disable-supported: true
 
 # The following PRR answers are required at beta release
 metrics:
+- kubelet_pod_start_total_duration_seconds
+- kubelet_pod_status_sync_duration_seconds
+- kubelet_pod_worker_duration_seconds