Commit d97e2bf

drop unlimitedswap from kep
1 parent 3f4b5e4 commit d97e2bf

1 file changed: +27 -13 lines changed

keps/sig-node/2400-node-swap/README.md

Lines changed: 27 additions & 13 deletions
@@ -27,7 +27,9 @@
 - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Existing use cases of Swap](#existing-use-cases-of-swap)
+- [Exhausting swap resource](#exhausting-swap-resource)
 - [Security risk](#security-risk)
+- [Cgroupv1 support](#cgroupv1-support)
 - [Design Details](#design-details)
 - [Enabling swap as an end user](#enabling-swap-as-an-end-user)
 - [API Changes](#api-changes)
@@ -225,7 +227,7 @@ The main concern would be swapping in the critical services on the control plane

 ##### Use of a dedicated disk for swap

-We recommend using a separate disk for your swap partition.
+We recommend using a separate disk for your swap partition.

 ### Steps to Calculate Swap Limit

@@ -317,12 +319,16 @@ clusters).

 This user story is addressed by scenarios 1 and 2, and could benefit from 3.

+We discovered use cases where users set `--fail-swap-on=false`
+to allow swap-enabled nodes for local development.
+
 #### Low footprint systems

 For example, edge devices with limited memory.

 - Edge compute systems/devices with small memory footprints (\<2Gi)
 https://github.com/kubernetes/kubernetes/issues/53533#issuecomment-751398086
+https://github.com/k0sproject/k0s/issues/3830
 - Clusters with nodes \<4Gi memory
 https://github.com/kubernetes/kubernetes/issues/53533#issuecomment-751404417

@@ -402,6 +408,13 @@ To address this, we will propose a new field to `MemorySwap` called `NoSwap`. Th

 This can address existing use cases where `--fail-swap-on=false` in cgroupv1 and still allow us to turn this feature on.

+#### Exhausting swap resource
+
+In previous releases of Swap, we had an `UnlimitedSwap` option for workloads.
+This can cause problems where workloads can use up all swap.
+If all swap is used up on a node, it can make the node go unhealthy.
+To avoid exhausting swap on a node, `UnlimitedSwap` was dropped from the API in beta2.
+
 #### Security risk

 Enabling swap on a system without encryption poses a security risk, as critical information, such as Kubernetes secrets, may be swapped out to the disk. If an unauthorized individual gains access to the disk, they could potentially obtain these secrets. To mitigate this risk, it is recommended to use encrypted swap. However, handling encrypted swap is not within the scope of kubelet; rather, it is a general OS configuration concern and should be addressed at that level. Nevertheless, it is essential to provide documentation that warns users of this potential issue, ensuring they are aware of the potential security implications and can take appropriate steps to safeguard their system.
@@ -410,6 +423,11 @@ To guarantee that system daemons are not swapped, the kubelet must configure the

 Additionally, end user may decide to disable swap completely for a Pod or a container in beta 1 by making Pod guaranteed or set request == limit for a container. This way, there will be no swap enabled for the corresponding containers and there will be no information exposure risks.

+#### Cgroupv1 support
+
+In the early release of this feature, there was a goal to support cgroup v1. As the feature progressed, sig-node realized that supporting swap with cgroup v1 would be very difficult.
+Therefore, this feature is limited to cgroupv2 only. The main goal is to deprecate cgroupv1 eventually so this should not be a major inconvenience.
+
 ## Design Details

 We summarize the implementation plan as following:
We summarize the implementation plan as following:
@@ -435,7 +453,7 @@ Swap can be enabled as follows:
 1. Enable the `NodeSwap` feature flag on the kubelet,
 1. Set `--fail-on-swap` flag to `false`, and
 1. (Optional) Allow Kubernetes workloads to use swap by setting
-`MemorySwap.SwapBehavior` to either `LimitedSwap` or `UnlimitedSwap` in the kubelet config.
+`MemorySwap.SwapBehavior` to `LimitedSwap` in the kubelet config.

 ### API Changes

@@ -461,7 +479,6 @@ type MemorySwapConfiguration struct {
 // Configure swap memory available to container workloads. May be one of
 // "", "NoSwap": workload will not use swap
 // "LimitedSwap": workload combined memory and swap usage cannot exceed pod memory limit
-// "UnlimitedSwap": workloads can use unlimited swap, up to the allocatable limit.
 SwapBehavior string
 }
 ```
@@ -477,12 +494,9 @@ container specification for the `--memory-swap` flag. Thus, the
 * With cgroups v2, swap is configured independently from memory. Thus, the
 container runtimes can set [`memory.swap.max`] to 0 in this case, and _no_ swap
 usage will be permitted.
-* If `SwapBehavior` is set to `"UnlimitedSwap"`, the container is allowed to
-use unlimited swap, up to the maximum amount available on the host system.
 * If `SwapBehavior` is set to `""` or `"NoSwap"`, no workloads will utilize swap.

 [docker]: https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details
-[`memory.memsw.limit_in_bytes`]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html
 [`memory.swap.max`]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory

 #### CRI Changes
#### CRI Changes
@@ -643,8 +657,8 @@ For beta 1:

 #### Alpha

-- Kubelet can be started with swap enabled and will support three configurations
-for Kubernetes workloads: `LimitedSwap`, `UnlimitedSwap` and `NoSwap`.
+- Kubelet can be started with swap enabled and will support two configurations
+for Kubernetes workloads: `LimitedSwap` and `NoSwap`.
 - Kubelet can configure CRI to allocate swap to Kubernetes workloads. By
 default, workloads will not be allocated any swap.
 - e2e test jobs are configured for Linux systems with swap enabled.
@@ -688,15 +702,15 @@ Here are specific improvements to be made:

 #### Beta 2

-- Publish a Kubernetes doc page encoring user to use encrypted swap if they wish to enable this feature.
+- Publish a Kubernetes doc page encouraging users to use encrypted swap if they wish to enable this feature.
 - Add [swap specific tests](https://github.com/kubernetes/kubernetes/issues/120798) such as, handling the usage of
 swap during container restart boundaries for writes to tmpfs (which may require pod cgroup change beyond what
 container runtime will do at (container cgroup boundary).
 - Fix flaking/failing swap node e2e jobs.
 - Address eviction related [issue](https://github.com/kubernetes/kubernetes/issues/120800) in swap implementation.
 - Add `NoSwap` as the default setting.
 - Add e2e test confirming that `NoSwap` will actually not swap
-- Add e2e test confirming that swap is used for `LimitedSwap` and `UnlimitedSwap`
+- Add e2e test confirming that swap is used for `LimitedSwap`.
 - Document [best practices](#best-practices) for setting up Kubernetes with swap

 [via cgroups]: #restrict-swap-usage-at-the-cgroup-level
@@ -830,7 +844,7 @@ been used over time. We propose a configuration in `MemorySwap` called `NoSwap`.
 Users could also set `NoSwap` in `MemorySwap` to disable all workloads from
 using swap without requiring the user to disable swap if that is needed.

-In Beta releases of this feature, one could use turn off `NodeSwap` feature toggle
+In Beta releases of this feature, one could use turn off `NodeSwap` feature toggle
 but once this feature is GA, users could use another option to disable swap
 for workloads.

@@ -1083,8 +1097,8 @@ It is possible for this feature to affect performance of some worker node-level
 SLIs/SLOs. We will need to monitor for differences, particularly during beta
 testing, when evaluating this feature for beta and graduation.

-We discovered degration of system critical daemons like Kubelet and systemd if swap is enabled.
-We recommend disabling swap for the `system.slice` and setting `io.latency` for the `system.slice`.
+We discovered degration of system critical daemons like Kubelet and systemd if swap is enabled.
+We recommend disabling swap and setting `io.latency` for the `system.slice`.

 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

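
Reviewer note: the net effect of this commit on the `SwapBehavior` values can be sketched in Go as below. This is only an illustration under stated assumptions, not kubelet code. The `MemorySwapConfiguration` struct and the strings `""`, `"NoSwap"`, `"LimitedSwap"` and `"UnlimitedSwap"` come from the diff; the helper `validateSwapBehavior` and its error messages are hypothetical.

```go
package main

import "fmt"

// MemorySwapConfiguration mirrors the struct shown in the diff.
type MemorySwapConfiguration struct {
	SwapBehavior string
}

// validateSwapBehavior is a hypothetical helper (not part of the kubelet)
// that accepts only the values the KEP still lists after this commit.
func validateSwapBehavior(cfg MemorySwapConfiguration) error {
	switch cfg.SwapBehavior {
	case "", "NoSwap", "LimitedSwap":
		// "" and "NoSwap": workloads will not use swap.
		// "LimitedSwap": swap usage is limited for workloads.
		return nil
	case "UnlimitedSwap":
		// Dropped by this commit to avoid exhausting swap on a node.
		return fmt.Errorf("SwapBehavior %q is no longer supported", cfg.SwapBehavior)
	default:
		return fmt.Errorf("unknown SwapBehavior %q", cfg.SwapBehavior)
	}
}

func main() {
	for _, b := range []string{"", "NoSwap", "LimitedSwap", "UnlimitedSwap"} {
		err := validateSwapBehavior(MemorySwapConfiguration{SwapBehavior: b})
		fmt.Printf("%q accepted: %t\n", b, err == nil)
	}
}
```

Rejecting `UnlimitedSwap` explicitly, rather than letting it fall through to the default case, is a design choice in this sketch so that configurations written for earlier releases get a specific error.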