KEP-1287: Don't allow in-place memory limit decreases

tallclair · tallclair · commit 2f5c139ee11d · 2025-02-07T17:30:07.000-08:00
diff --git a/keps/sig-node/1287-in-place-update-pod-resources/README.md b/keps/sig-node/1287-in-place-update-pod-resources/README.md
@@ -773,22 +773,15 @@ pod status resources, but not otherwise acted upon.
 
 ### Memory Limit Decreases
 
-Setting the memory limit below current memory usage can cause problems. With cgroups v1 the change
-will simply be rejected by the kernel, whereas with cgroups v2 it will trigger an oom-kill.
-
-To avoid this situation, when downsizing container memory limits the Kubelet will first check the
-usage via the CRI `ContainerStats` (or maybe `ListContainerStats`) call. This check will be
-performed both at resource allocation time, and right before actuating the resize. If the
-allocation-time check fails, the resize will be deferred. If the actuation-time check fails, the
-resize will be skipped until the next pod sync, an event will report the error, and the resize
-status will be set to `Error`. Even with these protections, there is still the possibility of a
-time-of-check-time-of-use race, so the possibility of oom-kill will be documented, and caution
-recommended.
-
-If a memory limit decrease fails at actuation time, other resources and containers will continue to
-be resized, but the pod-level memory limit will not be decreased until all container limits have
-been successfully adjusted. For guaranteed pods, in the case the limit decrease fails, the memory
-request will be set to the original limit in the pod status.
+Setting the memory limit below current memory usage can cause problems. If the kernel cannot reclaim
+sufficient memory, the outcome depends on the cgroups version. With cgroups v1 the change will
+simply be rejected by the kernel, whereas with cgroups v2 it will trigger an oom-kill.
+
+In the initial beta release of in-place resize, API validation will **disallow** memory limit
+decreases when the memory resize policy is `PreferNoRestart`. We intend to relax this restriction
+later, but the design for handling limit decreases is still undecided.
+
+Memory limit decreases with the `RestartRequired` resize policy are still allowed.
 
 ### Sidecars
 
@@ -857,6 +850,7 @@ This will be reconsidered post-beta as a future enhancement.
 
 ### Future Enhancements
 
+1. Allow memory limits to be decreased, and handle the case where limits are set below usage.
 1. Kubelet (or Scheduler) evicts lower priority Pods from Node to make room for
    resize. Pre-emption by Kubelet may be simpler and offer lower latencies.
 1. Allow ResizePolicy to be set on Pod level, acting as default if (some of)