Commit dfb0ff1

Emphesize that KEP-2400 is about basic swap enablement
Signed-off-by: Itamar Holder <[email protected]>
1 parent 8e392b3 commit dfb0ff1

1 file changed: +39 -5 lines changed

keps/sig-node/2400-node-swap/README.md

Lines changed: 39 additions & 5 deletions
@@ -1,4 +1,4 @@
-# KEP-2400: Node system swap support
+# KEP-2400: Node memory swap support
 
 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
@@ -117,6 +117,31 @@ support to nodes in a controlled, predictable manner so that Kubernetes users
 can perform testing and provide data to continue building cluster capabilities
 on top of swap.
 
+This KEP aims to
+introduce basic swap enablement and leave further extensions to follow-up KEPs.
+This way Kubernetes users / vendors would be able to use swap in a basic manner
+quickly while extensions would be brought to discussion in dedicated KEPs that
+would progress in the meantime.
+
+For example, to achieve this goal, this KEP does not introduce any APIs
+that allow customizing how the feature behaves, but instead only determines
+whether the feature is enabled or disabled.
+From an API perspective, this is being done by presenting the kubelet `swapBehavior`
+configuration field.
+Within the scope of this KEP we will support only two basic behaviors: `NoSwap` and `LimitedSwap`.
+Both do not provide any customizability, as `NoSwap` disables swap for workloads and
+`LimitedSwap`'s behaviour is automatic and implicit that requires minimum user
+intervention (see [proposal below](#steps-to-calculate-swap-limit) for more details).
+As mentioned above, in the very near future, follow-up KEPs would bring API extension
+and customizability, supporting zswap, and many other extensions to discussion.
+These customization capabilities will probably be introduced as additional
+"swap behaviors" which will probably bring some API changes, perhaps at the pod level,
+that will extend this feature and make it more suitable for advanced usage.
+
+While this KEP sets the ground for extending the API in follow-ups through "swap behaviors",
+changing APIs, especially at the pod-level, is highly complex and controversial.
+Therefore, it is out of scope for this KEP.
+
 ## Motivation
 
 There are two distinct types of user for swap, who may overlap:
@@ -161,9 +186,11 @@ will be necessary to implement the third scenario.
 - Setting [swappiness]. This can already be set on a system-wide level outside
   of Kubernetes.
 - Allocating swap on a per-workload basis with accounting (e.g. pod-level
-  specification of swap). If desired, this should be designed and implemented
-  as part of a follow-up KEP. This KEP is a prerequisite for that work. Hence,
-  swap will be an overcommitted resource in the context of this KEP.
+  specification of swap), and/or APIs to customize and control the way kubelet
+  calculates swap limits, grants swap access, etc. If desired, this should be
+  designed and implemented as part of a follow-up KEP. This KEP is a
+  prerequisite for that work. Hence, swap will be an overcommitted resource
+  in the context of this KEP.
 - Supporting zram, zswap, or other memory types like SGX EPC. These could be
   addressed in a follow-up KEP, and are out of scope.
 - Use of swap for cgroupsv1.
@@ -194,7 +221,10 @@ Allocate the swap limit equal to the requested memory for each container and adj
 
 #### Set Aside Swap for System Critical Daemons
 
-**Note** In Beta2, we found that having system critical daemons swapping memory could cause degration of services.
+**Note** In Beta2, we found that having system-critical daemons swapping memory could cause degradation of services.
+Therefore, Kubelet will not automatically configure this, although the admin can still manually configure it
+this way. In the near future, when a follow-up KEP regarding customizability is presented, this will be considered
+to automatically be configured under a dedicated configuration.
 
 System critical daemons (such as Kubelet) are essential for node health. Usually, an appropriate portion of system resources (e.g., memory, CPU) is reserved as system reserved. However, swap doesn't inherently support reserving a portion out of the total available. For instance, in the case of memory, we set `memory.min` on the node-level cgroup to ensure an adequate amount of memory is set aside, away from the pods, and for system critical daemons. But there is no equivalent for swap; i.e., no `memory.swap.min` is supported in the kernel.
 
@@ -290,6 +320,10 @@ nodes could improve better resource pressure handling and recovery.
 
 This user story is addressed by scenario 1 and 2, and could benefit from 3.
 
+Note: critical / high-priority pods would not be able to access swap, but can
+still be configured otherwise to gain swap access. In the future, APIs / more
+swap behaviors would be able to be used to control swap in a more customized way.
+
 #### Long-running applications that swap out startup memory
 
 - Applications such as the Java and Node runtimes rely on swap for optimal
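
For readers unfamiliar with the kubelet `swapBehavior` field that the added README text refers to, the following is a minimal sketch of a kubelet configuration file that selects one of the two behaviors named in the diff (`NoSwap` or `LimitedSwap`). Only `swapBehavior` and its two values come from the diff itself. The surrounding `memorySwap`, `failSwapOn`, and `featureGates` entries are taken from the upstream kubelet configuration format and are assumptions with respect to this commit; whether the `NodeSwap` feature gate must be set explicitly depends on the Kubernetes version in use.

```yaml
# Sketch of a KubeletConfiguration enabling basic swap; not part of this commit.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Assumption: allow the kubelet to start on a node that has swap enabled.
failSwapOn: false
# Assumption: explicit gate, which may be unnecessary on newer releases.
featureGates:
  NodeSwap: true
memorySwap:
  # One of the two behaviors in scope for KEP-2400: NoSwap or LimitedSwap.
  swapBehavior: LimitedSwap
```

Consistent with the text of the diff, the field exposes no tuning knobs: it selects whether workloads may use swap, and under `LimitedSwap` the limit is derived automatically by the kubelet.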
