Commit 3577d17

ike-ma and ehashman committed
Implementation details from Ike Ma
Co-Authored-By: Elana Hashman <[email protected]>
1 parent a1941aa commit 3577d17

2 files changed: +103 -7 lines changed

keps/sig-node/2400-node-swap/README.md

Lines changed: 102 additions & 7 deletions
@@ -220,13 +220,108 @@ Since swap provisioning is out of scope of this proposal, this enhancement poses

## Design Details

-\[In progress\]
-
-Need to add specifics here for:
-
-- Changes to `--fail-on-swap` flag
-- CRI config details
-- Where changes will need to be made so that dockershim and the CRI are consistent with swap control
### TL;DR

In a nutshell, the following implementation steps are planned for Memory Swap Support
in 1.22 GKE alpha:

1. Add a feature gate, `SupportNodeMemorySwap`, guarding the memory
   swap support feature
2. Keep the default value of the kubelet flag `--fail-on-swap` as `true` in order
   to minimize the blast radius
3. Introduce two new kubelet configuration parameters, `MemorySwapLimit` and `MemorySwappiness`
4. Introduce two new CRI parameters, `memory_swap_limit_in_bytes` and `memory_swappiness`
5. Wire the settings end to end, from the kubelet config file to the CRI

### Expected User Behaviour

For alpha, the feature gate `SupportNodeMemorySwap` is disabled by default, and
the `--fail-on-swap` flag keeps the same value as in 1.21. Therefore, from a Kubernetes
user’s perspective, there are no behavior changes out of the box.

Users who are ready to explore the Memory Swap feature in 1.22 alpha
will need to complete the following steps:

1. Provision swap and enable the `SupportNodeMemorySwap` feature gate, AND
2. Set the `--fail-on-swap` flag to `false`

Then, the user can start experimenting with and fine-tuning the kubelet configuration parameters
`MemorySwapLimit` and/or `MemorySwappiness` and observe the changes.
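
The opt-in conditions above can be summarized as a small decision function. This is only an illustrative sketch, not kubelet code: the function name `kubelet_swap_mode` and its return strings are hypothetical, and it assumes that `--fail-on-swap=true` makes the kubelet refuse to start on a node where swap is on.

```python
def kubelet_swap_mode(swap_on_node: bool,
                      fail_on_swap: bool = True,
                      gate_enabled: bool = False) -> str:
    """Hypothetical helper describing how the kubelet treats node swap.

    Defaults mirror the alpha defaults described above:
    --fail-on-swap=true and SupportNodeMemorySwap disabled.
    """
    if swap_on_node and fail_on_swap:
        # Assumption: with --fail-on-swap=true the kubelet does not start
        # on a node that has swap enabled.
        return "kubelet fails to start"
    if swap_on_node and gate_enabled:
        # Both opt-in steps are complete, so the new settings apply.
        return "swap settings take effect"
    # Either no swap is provisioned or the feature gate is off.
    return "unchanged from 1.21"


if __name__ == "__main__":
    print(kubelet_swap_mode(swap_on_node=True))
    print(kubelet_swap_mode(swap_on_node=True, fail_on_swap=False, gate_enabled=True))
```

Only the combination of provisioned swap, `--fail-on-swap=false`, and the enabled feature gate reaches the branch where `MemorySwapLimit` and `MemorySwappiness` matter.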

### New Kubelet Configuration

We will be introducing two new parameters to the `KubeletConfiguration` struct
defined in
[https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/types.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/types.go).
These two parameters, if set, will apply to every container on the Node
where the kubelet is running.

|Name|Description|Default Value|Feature Gate|
|--- |--- |--- |--- |
|MemorySwapLimit|Sets the total memory limit (memory + swap), which bounds the amount of memory the container is allowed to swap to disk.|-2, which disables swap|SupportNodeMemorySwap|
|MemorySwappiness|Sets how aggressively the kernel will swap memory pages. By default, the host kernel can swap out a percentage of anonymous pages used by a container. Users can set a value between 0 and 100 to tune this percentage.|Unset, which uses the host value|SupportNodeMemorySwap|

#### MemorySwapLimit details

The MemorySwapLimit kubelet configuration only takes effect on a
container that has a memory limit set, either explicitly from the
[PodSpec](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits)
or implicitly from a [Resource
Quota](https://kubernetes.io/docs/concepts/policy/resource-quotas/).

For a container with a memory limit set, the MemorySwapLimit setting will have the
following effects, [similar to
docker](https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details):

* If MemorySwapLimit is set to a positive integer:
  * If the memory limit of the container is greater than or equal to
    MemorySwapLimit, then no swap is allowed; the container does not have
    access to swap.
  * If the memory limit of the container is less than MemorySwapLimit, then
    MemorySwapLimit represents the total amount of memory and swap that can be
    used. For example, for a container with its memory limit set to 300m and
    `MemorySwapLimit` set to 1g, the container can use 300m of memory and
    700m (1g - 300m) of swap.
* If MemorySwapLimit is set to 0, a container with a memory limit set
  can use as much swap as its memory limit, if the host
  has swap memory configured. For instance, if a container sets
  memory="300m" and MemorySwapLimit is 0, the container can use 600m in
  total of memory and swap.
* If MemorySwapLimit is explicitly set to -1, the container is allowed to use
  unlimited swap, up to the amount available on the host system.
* If MemorySwapLimit is set to -2, which is the default, the container does not have
  access to swap. This value effectively prevents a container from using swap.

In summary, for users experimenting with this feature:

|MemorySwapLimit|Container memory limit (explicit or implicit)|Expected Behavior|Comment|
|--- |--- |--- |--- |
|Any|not set|N/A|Same as docker|
|-2|N|no swap allowed; this is the default value||
|-1|N|unlimited swap|Same as docker|
|0|N|container can use up to N swap (i.e. 2N memory+swap)|Same as docker|
|X where X > 0|N where N < X|container can use up to X-N swap (i.e. X memory+swap)|Same as docker|
|X where X > 0|N where N >= X|no swap allowed (i.e. N memory only)|Same as docker|
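
The rows of the summary table can be expressed as one function, which makes the special values easier to check. This is a sketch of the rules as written in this section, not the kubelet implementation; `swap_allowance` and the `UNLIMITED` sentinel are hypothetical names, and the amounts use the same loose "m" units as the prose (so 1g is treated as 1000m).

```python
from typing import Optional

UNLIMITED = -1  # hypothetical sentinel returned for "unlimited swap"


def swap_allowance(memory_swap_limit: int,
                   memory_limit: Optional[int]) -> Optional[int]:
    """Return how much swap a container may use, per the table above.

    Returns None when the container has no memory limit (the setting
    does not apply), UNLIMITED for unlimited swap, otherwise an amount
    in the same unit as the inputs.
    """
    if memory_limit is None:
        return None                    # row: Any / not set -> N/A
    if memory_swap_limit == -2:
        return 0                       # default: no swap allowed
    if memory_swap_limit == -1:
        return UNLIMITED               # bounded only by the host's swap
    if memory_swap_limit == 0:
        return memory_limit            # up to N swap, 2N in total
    if memory_swap_limit > 0:
        # X > N gives X - N swap; X <= N gives no swap.
        return max(memory_swap_limit - memory_limit, 0)
    raise ValueError(f"unsupported MemorySwapLimit: {memory_swap_limit}")


if __name__ == "__main__":
    # The example from the text: memory limit 300m, MemorySwapLimit 1g.
    print(swap_allowance(1000, 300))
```

Keeping "no memory limit" (`None`) separate from "no swap" (`0`) mirrors the distinction the table draws between N/A and "no swap allowed".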

#### MemorySwappiness details

* A value of 0 turns off anonymous page swapping.
* A value of 100 sets all anonymous pages as swappable.
* By default, if you do not set MemorySwappiness, the value is inherited from
  the host machine.
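
As a minimal sketch of these rules, the following function resolves the value a container would end up with. The name `effective_swappiness` and the choice to raise `ValueError` for out-of-range input are assumptions made for illustration; the proposal does not say how invalid values are handled.

```python
from typing import Optional


def effective_swappiness(configured: Optional[int], host_value: int) -> int:
    """Resolve swappiness for a container (hypothetical helper).

    configured: the MemorySwappiness setting, or None when unset.
    host_value: the swappiness currently configured on the host.
    """
    if configured is None:
        # Unset: inherit the value from the host machine.
        return host_value
    if not 0 <= configured <= 100:
        # Assumption: values outside 0..100 are rejected.
        raise ValueError("MemorySwappiness must be between 0 and 100")
    return configured


if __name__ == "__main__":
    print(effective_swappiness(None, host_value=60))
    print(effective_swappiness(0, host_value=60))
```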

### CRI Changes

We will be introducing the following two parameters,
`memory_swap_limit_in_bytes` and `memory_swappiness`, to the
`LinuxContainerResources` message defined in
[https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto#L563-L580](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto#L563-L580)

|Name|Type|Description|Default Value|Feature Gate|
|--- |--- |--- |--- |--- |
|`memory_swap_limit_in_bytes`|int64|Set/show the limit on memory+swap usage|0, which means unspecified|SupportNodeMemorySwap|
|`memory_swappiness`|int64|Set/show the swappiness parameter|0, which means unspecified|SupportNodeMemorySwap|
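
To illustrate how the two new fields might be populated only when the feature gate is on, here is a small model of the message. It is a sketch under stated assumptions: the Python class is an illustrative subset of `LinuxContainerResources`, not generated CRI code, and `with_swap_settings` is a hypothetical helper that expects the swap limit to be already resolved to a byte count.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LinuxContainerResources:
    """Illustrative subset of the CRI message; not the generated type."""
    memory_limit_in_bytes: int = 0
    memory_swap_limit_in_bytes: int = 0  # 0 means unspecified
    memory_swappiness: int = 0           # 0 means unspecified


def with_swap_settings(resources: LinuxContainerResources,
                       gate_enabled: bool,
                       swap_limit_bytes: Optional[int] = None,
                       swappiness: Optional[int] = None) -> LinuxContainerResources:
    """Return a copy with swap fields set, or the input unchanged if gated off."""
    if not gate_enabled:
        # SupportNodeMemorySwap disabled: leave both fields unspecified.
        return resources
    updates = {}
    if swap_limit_bytes is not None:
        updates["memory_swap_limit_in_bytes"] = swap_limit_bytes
    if swappiness is not None:
        updates["memory_swappiness"] = swappiness
    return replace(resources, **updates)


if __name__ == "__main__":
    base = LinuxContainerResources(memory_limit_in_bytes=300_000_000)
    print(with_swap_settings(base, True, swap_limit_bytes=1_000_000_000, swappiness=40))
```

With the gate disabled the fields keep their zero defaults, which the table above defines as unspecified.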

### Test Plan

keps/sig-node/2400-node-swap/kep.yaml

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ title: Node system swap support
kep-number: 2400
authors:
- "@ehashman"
- "@ike-ma"
owning-sig: sig-node
participating-sigs:
- sig-node
