fix: use container status resources over desired spec resources for cpu/memory resource metrics during in-place pod vertical scaling #1433

Draft
kondracek-nr wants to merge 1 commit into main from kondracek/in-pod-verical-scaling

Conversation

@kondracek-nr
Contributor

@kondracek-nr kondracek-nr commented Mar 9, 2026

Kubernetes 1.33 promoted in-place pod vertical scaling to beta and enabled it by default (the feature was alpha since 1.27 and went GA in 1.35). With this feature, Pod.Spec.Containers[i].Resources becomes the desired state rather than the actual state. The actual applied resources live in Pod.Status.ContainerStatuses[i].Resources, which is only populated once the kubelet has successfully enacted the allocation.

This means that during an active resize, cpuRequestedCores, memoryRequestedBytes, cpuLimitCores, and memoryLimitBytes (and the utilization ratios derived from them) reported the target values before they were applied to the running container.

Description

Change
In fetchContainersData, prefer Pod.Status.ContainerStatuses[i].Resources (actual applied state) over Pod.Spec.Containers[i].Resources (desired state), with a fallback to Spec when Status resources are nil. Sidecar init containers (RestartPolicy: Always) are handled the same way via Pod.Status.InitContainerStatuses.
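The selection rule above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code in this PR: the types are simplified stand-ins for the Kubernetes API types (quantities are plain strings), `effectiveResources` is a hypothetical helper name, and matching statuses by container name rather than by index is an assumption made for the sketch.

```go
package main

import "fmt"

// ResourceList maps a resource name ("cpu", "memory") to a quantity.
// Simplified stand-in for the Kubernetes type; quantities are plain strings.
type ResourceList map[string]string

// ResourceRequirements mirrors the shape of requests and limits.
type ResourceRequirements struct {
	Requests ResourceList
	Limits   ResourceList
}

// Container is the desired state, as found under Pod.Spec.Containers.
type Container struct {
	Name      string
	Resources ResourceRequirements
}

// ContainerStatus is the observed state, as found under
// Pod.Status.ContainerStatuses. Resources stays nil until the kubelet
// reports the resources it has actually applied.
type ContainerStatus struct {
	Name      string
	Resources *ResourceRequirements
}

// effectiveResources (hypothetical helper) prefers the applied resources
// from the matching status entry and falls back to the spec when the
// status is missing or its Resources field is nil.
func effectiveResources(c Container, statuses []ContainerStatus) ResourceRequirements {
	for _, s := range statuses {
		if s.Name == c.Name && s.Resources != nil {
			return *s.Resources
		}
	}
	return c.Resources
}

func main() {
	spec := Container{
		Name:      "app",
		Resources: ResourceRequirements{Requests: ResourceList{"cpu": "2"}},
	}

	// Resize in progress: the spec asks for 2 cores, the kubelet still enforces 1.
	resizing := []ContainerStatus{{
		Name:      "app",
		Resources: &ResourceRequirements{Requests: ResourceList{"cpu": "1"}},
	}}
	fmt.Println(effectiveResources(spec, resizing).Requests["cpu"]) // 1

	// No status resources reported: fall back to the spec.
	fmt.Println(effectiveResources(spec, nil).Requests["cpu"]) // 2
}
```

The same lookup would apply to sidecar init containers by passing the init container statuses instead of the regular ones.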

Backward compatibility

  • Pre-1.33 clusters: ContainerStatus.Resources is not populated unless the alpha InPlacePodVerticalScaling feature gate was explicitly enabled, so the nil fallback fires. No behavior change in the default configuration.
  • 1.33+ clusters, no resize in progress: Spec and Status are identical. No behavior change.
  • 1.33+ clusters, resize in progress: Values differ. We now report the currently-enforced allocation rather than the pending target.

Why change existing metric semantics vs. adding new metrics
Upstream kube-state-metrics is likely taking an additive approach (new kube_pod_container_actual_resource_* metrics) because changing existing metric semantics would silently break dashboards across their entire user base with no path to coordinate consumers.

Our situation is different: we control both the metrics and the dashboards that consume them, and cpuRequestedCores semantically means "what is currently being enforced on the container." Using the desired state for utilization calculations (cpuUsageCores / cpuRequestedCores) produces a ratio that doesn't reflect the container's actual resource envelope during a resize. The value change is also narrow in scope: it only occurs during the window between a resize being requested and the kubelet applying it.
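To make the utilization argument concrete, here is a small worked example with made-up numbers: a container using 0.9 cores that currently has a 1-core request enforced, with a resize to 2 cores requested but not yet applied. The `utilization` helper is hypothetical and only illustrates the cpuUsageCores / cpuRequestedCores ratio.

```go
package main

import "fmt"

// utilization (hypothetical helper) returns usage divided by the requested
// amount. The second return value is false when no request is set, because
// the ratio is undefined in that case.
func utilization(usageCores, requestedCores float64) (float64, bool) {
	if requestedCores <= 0 {
		return 0, false
	}
	return usageCores / requestedCores, true
}

func main() {
	const usage = 0.9   // cores currently consumed by the container
	const applied = 1.0 // request enforced by the kubelet (from Status)
	const desired = 2.0 // pending resize target (from Spec)

	// Ratio against the desired state understates the pressure on the container.
	if r, ok := utilization(usage, desired); ok {
		fmt.Printf("against desired: %.2f\n", r) // against desired: 0.45
	}

	// Ratio against the applied state reflects the envelope actually enforced.
	if r, ok := utilization(usage, applied); ok {
		fmt.Printf("against applied: %.2f\n", r) // against applied: 0.90
	}
}
```

In this illustration the container is at 90% of what it is actually allowed, while the spec-based ratio would show 45% until the kubelet applies the resize.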

Desired-state metrics (cpuRequestedCoresDesired etc.) and resize condition metrics (PodResizePending, PodResizeInProgress) are left as follow-up work, as is parity with the OTel collector chart (blocked on upstream KSM shipping the new actual-resource metrics).

Related

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature / enhancement (non-breaking change which adds functionality)
  • Security fix
  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • Add changelog entry following the contributing guide
  • Documentation has been updated
  • This change requires changes in testing:
    • unit tests
    • E2E tests

@kondracek-nr kondracek-nr requested a review from a team as a code owner March 9, 2026 20:34
…pu/memory resource metrics during in-place pod vertical scaling
@kondracek-nr kondracek-nr force-pushed the kondracek/in-pod-verical-scaling branch from a8ad572 to 9b67ab3 Compare March 9, 2026 22:27
@kondracek-nr kondracek-nr marked this pull request as draft March 9, 2026 23:19
@codecov

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.87%. Comparing base (45be912) to head (9b67ab3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1433      +/-   ##
==========================================
+ Coverage   74.74%   74.87%   +0.13%     
==========================================
  Files          53       53              
  Lines        3694     3706      +12     
==========================================
+ Hits         2761     2775      +14     
+ Misses        762      760       -2     
  Partials      171      171              
