Commit 45ec205

KEP-1287: More details on limit resize failures
1 parent 44c6fbd commit 45ec205

1 file changed, +17 -13 lines changed
keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 17 additions & 13 deletions
@@ -591,25 +591,29 @@ When in-place resize is requested for multiple Containers in a Pod, Kubelet
 updates resource limit for the Pod and its Containers in the following manner:
 1. If resource resizing results in net-increase of a resource type (CPU or
 Memory), Kubelet first updates Pod-level cgroup limit for the resource
-type, and then updates the Container resource limit.
-1. If resource resizing results in net-decrease of a resource type, Kubelet
-first updates the Container resource limit, and then updates Pod-level
-cgroup limit.
-1. If resource update results in no net change of a resource type, only the
-Container resource limits are updated.
+type.
+1. All container limit decreases are applied.
+1. If all container limit decreases succeeded and resource resizing results in net-decrease of a
+resource type, Kubelet then updates the Pod-level cgroup limit.
+1. If all previous steps succeeded, container limit increases are applied.

 In all the above cases, Kubelet applies Container resource limit decreases
 before applying limit increases.

 #### Container resource limit update failure handling

-If multiple Containers in a Pod are being updated, and UpdateContainerResources
-CRI API fails for any of the containers, Kubelet will backoff and retry at a
-later time. Kubelet does not attempt to update limits for containers that are
-lined up for update after the failing container. This ensures that sum of the
-container limits does not exceed Pod-level cgroup limit at any point. Once all
-the container limits have been successfully updated, Kubelet updates the Pod's
-Status.ContainerStatuses[i].Resources to match the desired limit values.
+If an `UpdateContainerResources` request fails while container limit decreases are being applied,
+the remainder of the container limit decreases will be attempted, but container limit increases or
+pod limit decreases will not. This ensures that sum of the container limits does not exceed
+Pod-level cgroup limit at any point.
+
+If an `UpdateContainerResources` request fails while container limit increases are being applied,
+the remaining container limit increases will still be attempted.
+
+If any errors are raised during the resize process:
+- An event will be emitted with the error details
+- The ResizeStatus will be set to `Error`
+- The pod will be requeued for sync, and the resize will be retried on the next pod sync.

 #### CRI Changes Flow
