
Commit 90b6b90

Updating milestone to Beta
1 parent c695e44 commit 90b6b90


3 files changed: 39 additions, 6 deletions

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 kep-number: 5419
 alpha:
-  approver: "@soltysh"
+  approver: "@soltysh"
+beta:
+  approver: "@soltysh"

keps/sig-node/5419-pod-level-resources-in-place-resize/README.md

Lines changed: 33 additions & 3 deletions
@@ -25,7 +25,7 @@
 - [Integration tests](#integration-tests)
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
-- [Phase 1: Alpha (target 1.35)](#phase-1-alpha-target-135)
+- [Phase 1: Alpha (target 1.35) [DONE]](#phase-1-alpha-target-135-done)
 - [Phase 2: Beta (target 1.36)](#phase-2--beta-target-136)
 - [GA (stable)](#ga-stable)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
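The TOC link has to change along with the heading because the anchor is derived from the heading text. The sketch below approximates that derivation using a simplified model of GitHub's rules. It assumes that the text is lowercased, that punctuation other than hyphens and underscores is dropped, and that each space becomes a hyphen. `slugify` is a hypothetical helper and not GitHub's implementation. Under these assumptions it reproduces both the old and new Phase 1 anchors, and also the double hyphen in `#upgrade--downgrade-strategy`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify approximates a GitHub-style heading anchor (simplified assumption):
// lowercase the text, keep letters, digits, '-' and '_', turn each space
// into '-', and drop every other rune.
func slugify(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(heading) {
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r), r == '-', r == '_':
			b.WriteRune(r)
		case r == ' ':
			b.WriteRune('-')
		}
		// Runes such as ':', '(', ')', '.', '[', ']' and '/' are dropped.
	}
	return b.String()
}

func main() {
	for _, h := range []string{
		"Phase 1: Alpha (target 1.35)",
		"Phase 1: Alpha (target 1.35) [DONE]",
		"Upgrade / Downgrade Strategy",
	} {
		fmt.Printf("%-40s -> #%s\n", h, slugify(h))
	}
}
```

Dropping the `/` in "Upgrade / Downgrade" leaves two adjacent spaces, and these map to the two hyphens seen in the existing TOC entry.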
@@ -493,7 +493,7 @@ Following scenarios need to be covered:
 ### Graduation Criteria
 
 
-#### Phase 1: Alpha (target 1.35)
+#### Phase 1: Alpha (target 1.35) [DONE]
 * Feature is disabled by default. It is an opt-in feature which can be enabled by
 enabling the InPlacePodLevelResourcesVerticalScaling feature gate and by setting
 the new resources fields in PodSpec at Pod level.
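The gate is enabled through each component's `--feature-gates` flag, which takes comma-separated `Name=bool` pairs. The sketch below parses such a value and reports whether `InPlacePodLevelResourcesVerticalScaling` is on. `parseFeatureGates` and `gateEnabled` are hypothetical, illustrative names and are not the component-base API. Because the feature is disabled by default at alpha, an absent entry is treated as false. The per-component flag values in `main` are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates parses a "Name=bool,Name=bool" value as accepted by the
// --feature-gates flag. Illustrative re-implementation only.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate empty values and trailing commas
		}
		key, raw, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(raw))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %s: %w", key, err)
		}
		gates[strings.TrimSpace(key)] = enabled
	}
	return gates, nil
}

// gateEnabled returns the explicit setting for name, or def if it is absent.
func gateEnabled(gates map[string]bool, name string, def bool) bool {
	if v, ok := gates[name]; ok {
		return v
	}
	return def
}

func main() {
	const gate = "InPlacePodLevelResourcesVerticalScaling"
	// Hypothetical flag values; an empty string means the flag was not set.
	flags := map[string]string{
		"kube-apiserver": gate + "=true",
		"kube-scheduler": "",
		"kubelet":        gate + "=true,OtherGate=false",
	}
	for _, component := range []string{"kube-apiserver", "kube-scheduler", "kubelet"} {
		gates, err := parseFeatureGates(flags[component])
		if err != nil {
			fmt.Println(component, "error:", err)
			continue
		}
		// Alpha default is false, so an unset gate reports disabled.
		fmt.Printf("%s: %s=%v\n", component, gate, gateEnabled(gates, gate, false))
	}
}
```

In this example the scheduler reports the gate as disabled. That is the kind of mismatch the opt-in requirement makes easy to overlook.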
@@ -828,6 +828,7 @@ Focusing mostly on:
 - One new PATCH PodStatus API call in response to Pod resize request.
 - No additional overhead unless Pod resize is invoked.
 - estimated throughput
+- Proportional to the number of resize requests issued by users or controllers (e.g., VPA). For a typical cluster this is expected to be < 1% of total Pod update traffic.
 - originating component(s) (e.g. Kubelet, Feature-X-controller)
 - Kubelet
 focusing mostly on:
@@ -866,7 +867,9 @@ Describe them, providing:
 -->
 Negligible.
 - API type(s):
-- Estimated increase in size: (e.g., new annotation of size 32B)
+- Estimated increase in size: Each Pod object will grow by approximately
+200-400 bytes due to the addition of the `Resources` and `AllocatedResources`
+fields in `PodStatus`, plus the `Resources` stanza in `PodSpec`.
 - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 - type PodStatus has 2 new fields of type v1.ResourceRequirements and v1.ResourceList
 
@@ -924,6 +927,10 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+If the API server or etcd is unavailable, existing pods will continue to run with their last
+known resource configurations. No new resize requests can be initiated, and the Kubelet
+will be unable to update the `PodStatus` to reflect any locally completed or failed
+resizes until connectivity is restored.
 
 ###### What are other known failure modes?
 
@@ -940,12 +947,35 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
 
+- **CRI Runtime doesn't support Pod Sandbox Resize**:
+  - Detection: `PodStatus.Resize` will be stuck in `InProgress`, and Kubelet logs will
+    show errors calling `UpdatePodSandboxResources`.
+  - Mitigations: Disable the feature gate or upgrade the container runtime to a
+    compatible version (e.g., latest containerd/CRI-O).
+  - Diagnostics: Kubelet logs (search for `UpdatePodSandboxResources` errors) and
+    `kubectl get pod <name> -o yaml` to check `status.resize`.
+- **Cgroup update failure (OS level)**:
+  - Detection: Kubelet will emit an event indicating failure to update cgroups.
+  - Mitigations: Revert the resize request in the Pod spec to a known-good value.
+  - Diagnostics: Kubelet logs and `dmesg` on the node for potential OOM or cgroup
+    permission issues.
+
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+1. Verify that the `InPlacePodLevelResourcesVerticalScaling` feature gate is enabled
+   on all components (apiserver, scheduler, kubelet).
+2. Check `apiserver_request_total{resource="pods", subresource="resize"}`, broken down by
+   its `code` label, to see if resize requests are being rejected at the API level.
+3. Inspect Kubelet logs for errors related to `UpdatePodSandboxResources` or
+   `ResourceCalculation`.
+4. Monitor `kubelet_evictions` to ensure pod-level limits aren't
+   causing unexpected evictions.
+
 ## Implementation History
 
 - **2025-06-18:** KEP draft split from [KEP#2837](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md)
+- **2026-01-28:** KEP moved to beta for 1.36 release
 
 ## Drawbacks
 
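For the first failure mode, the output of `kubectl get pod <name> -o json` can be checked programmatically. The sketch below decodes only the status fields it needs and reports whether a resize appears to be in progress. It reads the `status.resize` field mentioned above. As an assumption about newer clusters, it also looks for the `PodResizeInProgress` condition that Kubernetes v1.33 and later report, whose `lastTransitionTime` gives a start time. The sample Pod JSON and the fixed "now" are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// pod decodes only the parts of `kubectl get pod <name> -o json` used here.
type pod struct {
	Status struct {
		// Resize is the older PodStatus.Resize field (e.g. "InProgress").
		Resize     string `json:"resize"`
		Conditions []struct {
			Type               string    `json:"type"`
			Status             string    `json:"status"`
			LastTransitionTime time.Time `json:"lastTransitionTime"`
		} `json:"conditions"`
	} `json:"status"`
}

// resizeInProgress reports whether a resize is in progress and, when the
// PodResizeInProgress condition is present, since when (zero if unknown).
func resizeInProgress(podJSON []byte) (bool, time.Time, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return false, time.Time{}, err
	}
	for _, c := range p.Status.Conditions {
		if c.Type == "PodResizeInProgress" && c.Status == "True" {
			return true, c.LastTransitionTime, nil
		}
	}
	// Fall back to the older field, which carries no timestamp.
	return p.Status.Resize == "InProgress", time.Time{}, nil
}

func main() {
	// Trimmed, hypothetical kubectl output.
	podJSON := []byte(`{"status":{"conditions":[
	  {"type":"Ready","status":"True","lastTransitionTime":"2026-01-28T10:00:00Z"},
	  {"type":"PodResizeInProgress","status":"True","lastTransitionTime":"2026-01-28T10:05:00Z"}]}}`)
	inProgress, since, err := resizeInProgress(podJSON)
	if err != nil {
		panic(err)
	}
	now := time.Date(2026, 1, 28, 10, 20, 0, 0, time.UTC)
	switch {
	case !inProgress:
		fmt.Println("no resize in progress")
	case since.IsZero():
		fmt.Println("resize in progress (start time unknown); check Kubelet logs")
	default:
		fmt.Printf("resize in progress for %s; check Kubelet logs for UpdatePodSandboxResources errors\n", now.Sub(since))
	}
}
```

A resize that remains in progress well beyond its expected duration is the signal worth alerting on. The threshold for that is a per-cluster judgment call.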
keps/sig-node/5419-pod-level-resources-in-place-resize/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -22,16 +22,17 @@ see-also: [
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.35"
+latest-milestone: "v1.36"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.35"
+  beta: "v1.36"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
