Commit 3bcbdde

KEP-1287: Replace Resize status with conditions
1 parent 00af4e0 commit 3bcbdde

keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 48 additions & 51 deletions
@@ -295,36 +295,33 @@ The `ResizePolicy` field is immutable.
 
 #### Resize Status
 
-In addition to the above, a new field `Pod.Status.Resize[]`
-will be added. This field indicates whether kubelet has accepted or rejected a
-proposed resize operation for a given resource. Any time the
-`Pod.Spec.Containers[i].Resources.Requests` field differs from the
-`Pod.Status.ContainerStatuses[i].Resources` field, this new field explains why.
-
-This field can be set to one of the following values:
-* `InProgress` - the proposed resize has been accepted and is being actuated. A `Deferred` or
-  `Infeasible` resize will take precedence over `InProgress`.
-  Desired resources == Allocated resources != Actual resources.
+Resize status will be tracked via two new pod conditions: `PodResizePending` and `PodResizing`.
+
+**PodResizePending** will track states where the spec has been resized, but the Kubelet has not yet
+allocated the resources. There are two reasons associated with this condition:
+
 * `Deferred` - the proposed resize is feasible in theory (it fits on this node)
-  but is not possible right now; it will be re-evaluated on every pod sync.
-  Desired resources != Allocated resources.
-* `Infeasible` - the proposed resize is not feasible and is rejected; it will not
-  be re-evaluated. Desired resources != Allocated resources.
-* (no value) - there is no proposed resize.
-  Desired resources == Allocated resources == Actual resources.
-* `Error` - if an error occurs while actuating the resize (see [Memory Limit Decreases](#memory-limit-decreases)
-  for an example), then the resize status will be set to `Error` and an event will report the
-  details. The error state behaves similarly to `InProgress`, and the allocated resize will be
-  retried on the next pod sync.
-
-To make this field future-safe, consumers should assume that any unknown value
-means the same as `Deferred`.
-
-Prior to v1.33, the apiserver would populate an additional `Proposed` state to identify a new resize
-that has not yet been acknowledged by the Kubelet. This state will be deprecated in v1.33 and no
-longer populated (due to a race to set it between the apiserver & kubelet). Instead, the new
-[`ObservedGeneration`](https://github.com/kubernetes/enhancements/pull/5068) feature can be used to
-tell whether the resize status includes the latest resize request.
+  but is not possible right now; it will be regularly re-evaluated.
+* `Infeasible` - the proposed resize is not feasible and is rejected; it may not
+  be re-evaluated.
+
+In either case, the condition's `message` will include details of why the resize has not been
+admitted. `lastTransitionTime` will be populated with the time the condition was added. `status`
+will always be `True` when the condition is present - if there is no longer a pending resize
+(either the resize was allocated or reverted), the condition will be removed.
+
+**PodResizing** will track in-progress resizes, and should be present whenever allocated resources
+!= acknowledged resources (see [Resource States](#resource-states)). For successful synchronous
+resizes, this condition should be short-lived, and `reason` and `message` will be left blank. If an
+error occurs while actuating the resize, the `reason` will be set to `Error`, and `message` will be
+populated with the error message. In the future, this condition will also be used for long-running
+resizing behaviors (see [Memory Limit Decreases](#memory-limit-decreases)).
+
+Note that it is possible for both conditions to be present at the same time, for example if an error
+is encountered while actuating a resize and a new resize comes in that gets deferred.
+
+Prior to v1.33, the resize status was tracked by a dedicated `Pod.Status.Resize` field. This field
+will be deprecated and will not graduate to beta.
 
 #### CRI Changes
 
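To make the new conditions concrete, here is a hypothetical snapshot of what a pod's `status.conditions` might contain while a new resize is deferred and an earlier resize is still being actuated. Only the field shapes come from the core/v1 API; the condition type names follow the hunk above, and the reason/message text and use of plain string literals are illustrative assumptions, not part of the KEP.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical snapshot: a new resize is deferred while an earlier resize
	// is still being actuated, so both conditions are present at once.
	conditions := []corev1.PodCondition{
		{
			Type:               corev1.PodConditionType("PodResizePending"),
			Status:             corev1.ConditionTrue,
			Reason:             "Deferred",
			Message:            "node has insufficient free cpu to admit the resize right now",
			LastTransitionTime: metav1.Now(),
		},
		{
			Type:               corev1.PodConditionType("PodResizing"),
			Status:             corev1.ConditionTrue,
			LastTransitionTime: metav1.Now(),
		},
	}
	for _, c := range conditions {
		fmt.Printf("%s: status=%s reason=%q message=%q\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```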
@@ -458,10 +455,10 @@ Spec.Containers[i].Resources.Requests) to the sum.
 Container resource limits. Once all Containers are successfully updated, it
 updates Status...Resources to reflect new resource values and unsets
 Status.Resize.
-* If new desired resources don't fit, Kubelet will update the Status.Resize
-  field to "Infeasible" and does not act on the resize.
-* If new desired resources fit but are in-use at the moment, Kubelet will
-  update the Status.Resize field to "Deferred".
+* If new desired resources don't fit, Kubelet will add the `PodResizePending` condition with reason
+  `Infeasible` and a message explaining why.
+* If new desired resources fit but are in-use at the moment, Kubelet will add the `PodResizePending`
+  condition with reason `Deferred` and a message explaining why.
 
 In addition to the above, kubelet will generate Events on the Pod whenever a
 resize is accepted or rejected, and if possible at key steps during the resize
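A minimal sketch of the admission outcomes described in the hunk above, assuming hypothetical inputs `fitsOnNode` and `freeRightNow`; this is illustrative pseudologic, not kubelet source.

```go
package main

import "fmt"

// resizeOutcome mirrors the decision above: if the desired resources can never
// fit on the node the resize is rejected as Infeasible; if they fit but are
// currently in use it is Deferred; otherwise it is admitted and actuation
// begins, surfaced via the PodResizing condition.
func resizeOutcome(fitsOnNode, freeRightNow bool) (conditionType, reason string) {
	switch {
	case !fitsOnNode:
		return "PodResizePending", "Infeasible"
	case !freeRightNow:
		return "PodResizePending", "Deferred"
	default:
		return "PodResizing", ""
	}
}

func main() {
	fmt.Println(resizeOutcome(false, false)) // PodResizePending Infeasible
	fmt.Println(resizeOutcome(true, false))  // PodResizePending Deferred
	fmt.Println(resizeOutcome(true, true))   // PodResizing
}
```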
@@ -513,7 +510,6 @@ This is intentionally hitting various edge-cases for demonstration.
 
 1. kubelet runs the pod and updates the API
    - `spec.containers[0].resources.requests[cpu]` = 1
-   - `status.resize` = unset
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1
    - `acknowledged[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
@@ -523,7 +519,6 @@ This is intentionally hitting various edge-cases for demonstration.
    - apiserver validates the request (e.g. `limits` are not below
      `requests`, ResourceQuota not exceeded, etc) and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 1.5
-   - `status.resize` = unset
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1
    - `acknowledged[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
@@ -533,82 +528,82 @@ This is intentionally hitting various edge-cases for demonstration.
    - The allocated & acknowledged resources are read back from checkpoint
    - Pods are resynced from the API server, but admitted based on the allocated resources
    - `spec.containers[0].resources.requests[cpu]` = 1.5
-   - `status.resize` = unset
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1
    - `acknowledged[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1024
 
 1. Kubelet syncs the pod, sees resize #1 and admits it
    - `spec.containers[0].resources.requests[cpu]` = 1.5
-   - `status.resize` = `"InProgress"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `acknowledged[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
+   - `status.conditions[type==PodResizing]` added
    - actual CPU shares = 1024
 
 1. Resize #2: cpu = 2
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 2
-   - `status.resize` = `"InProgress"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
+   - `status.conditions[type==PodResizing]`
    - actual CPU shares = 1024
 
 1. Container runtime applied cpu=1.5
    - `spec.containers[0].resources.requests[cpu]` = 2
-   - `status.resize` = `"InProgress"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `acknowledged[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
+   - `status.conditions[type==PodResizing]`
    - actual CPU shares = 1536
 
 1. kubelet syncs the pod, and sees resize #2 (cpu = 2)
    - kubelet decides this is feasible, but currently insufficient available resources
    - `spec.containers[0].resources.requests[cpu]` = 2
-   - `status.resize[cpu]` = `"Deferred"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `acknowledged[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
+   - `status.conditions[type==PodResizePending].reason` = `"Deferred"`
+   - `status.conditions[type==PodResizing]` removed
    - actual CPU shares = 1536
 
 1. Resize #3: cpu = 1.6
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 1.6
-   - `status.resize[cpu]` = `"Deferred"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `acknowledged[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
+   - `status.conditions[type==PodResizePending].reason` = `"Deferred"`
    - actual CPU shares = 1536
 
 1. Kubelet syncs the pod, and sees resize #3 and admits it
    - `spec.containers[0].resources.requests[cpu]` = 1.6
-   - `status.resize[cpu]` = `"InProgress"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `acknowledged[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
+   - `status.conditions[type==PodResizePending]` removed
+   - `status.conditions[type==PodResizing]` added
    - actual CPU shares = 1536
 
 1. Container runtime applied cpu=1.6
    - `spec.containers[0].resources.requests[cpu]` = 1.6
-   - `status.resize[cpu]` = `"InProgress"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `acknowledged[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
+   - `status.conditions[type==PodResizing]`
    - actual CPU shares = 1638
 
 1. Kubelet syncs the pod
    - `spec.containers[0].resources.requests[cpu]` = 1.6
-   - `status.resize[cpu]` = unset
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `acknowledged[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
+   - `status.conditions[type==PodResizing]` removed
    - actual CPU shares = 1638
 
 1. Resize #4: cpu = 100
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 100
-   - `status.resize[cpu]` = unset
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `acknowledged[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
@@ -617,10 +612,10 @@ This is intentionally hitting various edge-cases for demonstration.
 1. Kubelet syncs the pod, and sees resize #4
    - this node does not have 100 CPUs, so kubelet cannot admit it
    - `spec.containers[0].resources.requests[cpu]` = 100
-   - `status.resize[cpu]` = `"Infeasible"`
    - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `acknowledged[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
+   - `status.conditions[type==PodResizePending].reason` = `"Infeasible"`
    - actual CPU shares = 1638
 
 #### Container resource limit update ordering
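For reference, the "actual CPU shares" values in the walkthrough above follow the usual cgroup v1 conversion from CPU requests (shares = milliCPU × 1024 / 1000, truncated). A quick check of that arithmetic, not text from the KEP:

```go
package main

import "fmt"

// milliCPUToShares reproduces the cgroup v1 conversion the walkthrough's
// "actual CPU shares" values follow: shares = milliCPU * 1024 / 1000,
// truncated. (The kubelet additionally clamps very small values to a minimum.)
func milliCPUToShares(milliCPU int64) int64 {
	return milliCPU * 1024 / 1000
}

func main() {
	for _, m := range []int64{1000, 1500, 1600} {
		fmt.Printf("%dm CPU -> %d shares\n", m, milliCPUToShares(m))
	}
	// 1000m CPU -> 1024 shares
	// 1500m CPU -> 1536 shares
	// 1600m CPU -> 1638 shares
}
```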
@@ -723,18 +718,20 @@ Impacts of a restart outside of resource configuration are out of scope.
    - On restart, Kubelet reads the latest pod from the API and triggers a pod sync, so same effect as
      observing the update.
 1. Updated pod is synced: Check if pod can be admitted
-   - No: resize status is deferred, no change to allocated resources
+   - No: add `PodResizePending` condition with reason `Deferred`, no change to allocated resources
    - Restart: redo admission check, still deferred.
-   - Yes: resize status is in-progress, update allocated checkpoint
+   - Yes: add `PodResizing` condition, update allocated checkpoint
    - Restart before update: readmit, then update allocated
    - Restart after update: allocated != acknowledged --> proceed with resize
 1. Allocated != Acknowledged
    - Trigger an `UpdateContainerResources` CRI call, then update Acknowledged resources on success
    - Restart before CRI call: allocated != acknowledged, will still trigger the update call
    - Restart after CRI call, before acknowledged update: will redo update call
-   - Restart after acknowledged update: allocated == acknowledged, resize status cleared
+   - Restart after acknowledged update: allocated == acknowledged, condition removed
+   - In all restart cases, `LastTransitionTime` is propagated from the old pod status `PodResizing`
+     condition, and remains unchanged.
 1. PLEG updates PodStatus cache, triggers pod sync
-   - Pod status updated with actual resources, resize status cleared
+   - Pod status updated with actual resources, `PodResizing` condition removed
    - Desired == Allocated == Acknowledged, no resize changes needed.
 
 #### Notes
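A sketch of the `LastTransitionTime` propagation described in the restart handling above, assuming a hypothetical helper that rebuilds the `PodResizing` condition after a restart; illustrative only, not kubelet source.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mergeResizingCondition builds the PodResizing condition for a new status,
// carrying over LastTransitionTime from the previous status if the condition
// was already present and unchanged, as described in the restart notes above.
func mergeResizingCondition(old []corev1.PodCondition, now metav1.Time) corev1.PodCondition {
	cond := corev1.PodCondition{
		Type:               corev1.PodConditionType("PodResizing"),
		Status:             corev1.ConditionTrue,
		LastTransitionTime: now,
	}
	for _, c := range old {
		if c.Type == cond.Type && c.Status == cond.Status {
			cond.LastTransitionTime = c.LastTransitionTime // unchanged across restarts
		}
	}
	return cond
}

func main() {
	started := metav1.NewTime(time.Now().Add(-30 * time.Second))
	old := []corev1.PodCondition{{
		Type:               corev1.PodConditionType("PodResizing"),
		Status:             corev1.ConditionTrue,
		LastTransitionTime: started,
	}}
	merged := mergeResizingCondition(old, metav1.Now())
	fmt.Println("LastTransitionTime preserved:", merged.LastTransitionTime.Equal(&started))
}
```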
@@ -1532,7 +1529,7 @@ _This section must be completed when targeting beta graduation to a release._
   - Add ResourceQuota details
   - Heuristic version skew handling in API validation
 - 2025-01-24 - v1.33 updates for planned beta
-  - Remove `Proposed` resize status
+  - Replace ResizeStatus with conditions
   - Improve memory limit downsize handling
   - Rename ResizeRestartPolicy `NotRequired` to `PreferNoRestart`,
     and update CRI `UpdateContainerResources` contract
