Commit 1cb4e89

KEP-1287: Add back container status allocatedResources

1 parent 4ccfee5 commit 1cb4e89

1 file changed: +24 −54 lines

keps/sig-node/1287-in-place-update-pod-resources/README.md
@@ -28,11 +28,12 @@
   - [Notes](#notes)
   - [Lifecycle Nuances](#lifecycle-nuances)
   - [Atomic Resizes](#atomic-resizes)
+  - [Edge-triggered Resizes](#edge-triggered-resizes)
+  - [Memory Limit Decreases](#memory-limit-decreases)
   - [Sidecars](#sidecars)
   - [QOS Class](#qos-class)
   - [Resource Quota](#resource-quota)
   - [Affected Components](#affected-components)
-  - [Instrumentation](#instrumentation)
   - [Static CPU & Memory Policy](#static-cpu--memory-policy)
   - [Future Enhancements](#future-enhancements)
   - [Mutable QOS Class "Shape"](#mutable-qos-class-shape)
@@ -64,7 +65,7 @@
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-  - [Allocated Resources](#allocated-resources-1)
+  - [Allocated Resource Limits](#allocated-resource-limits)
 <!-- /toc -->
 
 ## Release Signoff Checklist
@@ -216,8 +217,7 @@ PodStatus is extended to show the resources applied to the Pod and its Container
 * Pod.Status.ContainerStatuses[i].Resources (new field, type
   v1.ResourceRequirements) shows the **actual** resources held by the Pod and
   its Containers for running containers, and the allocated resources for non-running containers.
-* Pod.Status.AllocatedResources (new field) reports the aggregate pod-level allocated resources,
-  computed from the container-level allocated resources.
+* Pod.Status.ContainerStatuses[i].AllocatedResources (new field) reports the allocated resource requests.
 * Pod.Status.Resize (new field, type map[string]string) explains what is
   happening for a given resource on a given container.
 
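As a rough illustration of the two container-status fields involved, here is a trimmed-down Go sketch; the field names mirror the API described above, but the types are simplified stand-ins, not the real `k8s.io/api` definitions:

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for v1.ResourceList
// (quantities kept as plain strings here).
type ResourceList map[string]string

// ContainerStatus sketches the two status fields this change touches:
// Resources holds the actual resources for running containers, and
// AllocatedResources holds the allocated requests from the kubelet
// checkpoint.
type ContainerStatus struct {
	Name               string
	Resources          ResourceList
	AllocatedResources ResourceList
}

func main() {
	// Mid-resize: the kubelet has admitted 1.5 CPUs, but the runtime
	// is still running the container with 1 CPU.
	cs := ContainerStatus{
		Name:               "app",
		Resources:          ResourceList{"cpu": "1"},
		AllocatedResources: ResourceList{"cpu": "1500m"},
	}
	fmt.Println(cs.AllocatedResources["cpu"]) // 1500m
}
```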
@@ -234,43 +234,13 @@ Additionally, a new `Pod.Spec.Containers[i].ResizePolicy[]` field (type
 
 When the Kubelet admits a pod initially or admits a resize, all resource requirements from the spec
 are cached and checkpointed locally. When a container is (re)started, these are the requests and
-limits used. The allocated resources are only reported in the API at the pod-level, through the
-`Pod.Status.AllocatedResources` field.
+limits used. Only the allocated requests are reported in the API, through the
+`Pod.Status.ContainerStatuses[i].AllocatedResources` field.
 
-```
-type PodStatus struct {
-  // ...
-
-  // AllocatedResources is the pod-level allocated resources. Only allocated requests are included.
-  // +optional
-  AllocatedResources *PodAllocatedResources `json:"allocatedResources,omitempty"`
-}
-
-// PodAllocatedResources is used for reporting pod-level allocated resources.
-type PodAllocatedResources struct {
-  // Requests is the pod-level allocated resource requests, either directly
-  // from the pod-level resource requirements if specified, or computed from
-  // the total container allocated requests.
-  // +optional
-  Requests v1.ResourceList
-}
-```
-
-The alpha implementation of In-Place Pod Vertical Scaling included `AllocatedResources` in the
-container status, but only included requests. This field will remain in alpha, guarded by the
-separate `InPlacePodVerticalScalingAllocatedStatus` feature gate, and is a candidate for future
-removal. With the allocated status feature gate enabled, Kubelet will continue to populate the field
-with the allocated requests from the checkpoint.
-
-The scheduler uses `max(spec...resources, status.allocatedResources, status...resources)` for fit
+The scheduler uses `max(spec...resources, status...allocatedResources, status...resources)` for fit
 decisions, but since the actual resources are only relevant and reported for running containers, the
 Kubelet sets `status...resources` equal to the allocated resources for non-running containers.
 
-See [`Alternatives: Allocated Resources`](#allocated-resources-1) for alternative APIs considered.
-
-The allocated resources API should be reevaluated prior to GA.
 
 #### Subresource
 
 Resource changes can only be made via the new `/resize` subresource, which accepts Update and Patch
@@ -492,7 +462,7 @@ To compute the Node resources allocated to Pods, pending resizes must be factore
 The scheduler will use the maximum of:
 1. Desired resources, computed from container requests in the pod spec, unless the resize is marked as `Infeasible`
 1. Actual resources, computed from the `.status.containerStatuses[i].resources.requests`
-1. Allocated resources, reported in `.status.allocatedResources.requests`
+1. Allocated resources, reported in `.status.containerStatuses[i].allocatedResources`
 
 ### Flow Control
 
@@ -512,7 +482,7 @@ This is intentionally hitting various edge-cases for demonstration.
 1. kubelet runs the pod and updates the API
    - `spec.containers[0].resources.requests[cpu]` = 1
    - `status.resize` = unset
-   - `status.allocatedResources.requests[cpu]` = 1
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1024
 
@@ -536,67 +506,67 @@ This is intentionally hitting various edge-cases for demonstration.
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1024
 
 1. Container runtime applied cpu=1.5
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1536
 
 1. kubelet syncs the pod, and sees resize #2 (cpu = 2)
    - kubelet decides this is feasible, but currently insufficient available resources
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize[cpu]` = `"Deferred"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536
 
 1. Resize #3: cpu = 1.6
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"Deferred"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536
 
 1. Kubelet syncs the pod, and sees resize #3 and admits it
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536
 
 1. Container runtime applied cpu=1.6
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1638
 
 1. Kubelet syncs the pod
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = unset
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638
 
 1. Resize #4: cpu = 100
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 100
    - `status.resize[cpu]` = unset
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638
 
 1. Kubelet syncs the pod, and sees resize #4
    - this node does not have 100 CPUs, so kubelet cannot admit it
    - `spec.containers[0].resources.requests[cpu]` = 100
    - `status.resize[cpu]` = `"Infeasible"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638
 
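The `actual CPU shares` values in the walkthrough above follow the usual milli-CPU to cgroup `cpu.shares` conversion (1 CPU == 1024 shares, integer-truncated). A small Go sketch of that arithmetic, assuming the conversion Kubernetes applies:

```go
package main

import "fmt"

// milliCPUToShares converts a CPU request in milli-CPUs to cgroup
// cpu.shares: 1 full CPU maps to 1024 shares, truncated to an
// integer, with a kernel-enforced minimum of 2 shares.
func milliCPUToShares(milliCPU int64) int64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		return 2 // kernel minimum
	}
	return shares
}

func main() {
	fmt.Println(milliCPUToShares(1000)) // 1024 (cpu = 1)
	fmt.Println(milliCPUToShares(1500)) // 1536 (cpu = 1.5)
	fmt.Println(milliCPUToShares(1600)) // 1638 (cpu = 1.6, 1638.4 truncated)
}
```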
@@ -789,7 +759,7 @@ With InPlacePodVerticalScaling enabled, resource quota needs to consider pending
 to how this is handled by scheduling, resource quota will use the maximum of:
 1. Desired resources, computed from container requests in the pod spec, unless the resize is marked as `Infeasible`
 1. Actual resources, computed from the `.status.containerStatuses[i].resources.requests`
-1. Allocated resources, reported in `.status.allocatedResources.requests`
+1. Allocated resources, reported in `.status.containerStatuses[i].allocatedResources`
 
 To properly handle scale-down, resource quota controller now needs to evaluate
 pod updates where `.status...resources` changed.
@@ -1101,7 +1071,7 @@ Setup a guaranteed class Pod with two containers (c1 & c2).
 #### Backward Compatibility and Negative Tests
 
 1. Verify that Node is allowed to update only a Pod's AllocatedResources field.
-1. Verify that only Node account is allowed to udate AllocatedResources field.
+1. Verify that only Node account is allowed to update AllocatedResources field.
 1. Verify that updating Pod Resources in workload template spec retains current
    behavior:
    - Updating Pod Resources in Job template is not allowed.
@@ -1329,7 +1299,7 @@ the health of the service?**
 
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
 
-  - Resize requests should succeed (`apiserver_request_total{resource=pods,subresource=resize}` with non-success `code` should be low))
+  - Resize requests should succeed (`apiserver_request_total{resource=pods,subresource=resize}` with non-success `code` should be low)
   - Resource update operations should complete quickly (`runtime_operations_duration_seconds{operation_type=container_update} < X` for 99% of requests)
   - Resource update error rate should be low (`runtime_operations_errors_total{operation_type=container_update}/runtime_operations_total{operation_type=container_update}`)
 
@@ -1472,7 +1442,7 @@ _This section must be completed when targeting beta graduation to a release._
 - Improve memory limit downsize handling
 - Rename ResizeRestartPolicy `NotRequired` to `PreferNoRestart`,
   and update CRI `UpdateContainerResources` contract
-- Add pod-level `AllocatedResources`
+- Add back `AllocatedResources` field to resolve a scheduler corner case
 - Switch to edge-triggered resize actuation
 
 ## Drawbacks
@@ -1494,9 +1464,9 @@ information to express the idea and why it was not acceptable.
 We considered having scheduler approve the resize. We also considered PodSpec as
 the location to checkpoint allocated resources.
 
-### Allocated Resources
+### Allocated Resource Limits
 
-If we need allocated resources & limits in the pod status API, the following options have been
+If we need allocated limits in the pod status API, the following options have been
 considered:
 
 **Option 1: New field "AcceptedResources"**