@@ -164,6 +164,7 @@ cgroupsv2 improved memory management algos, such as oomd, currently require
164
164
swap. Hence, having a small amount of swap available on nodes could improve
165
165
better resource pressure handling and recovery.
166
166
167
+ - https://man7.org/linux/man-pages/man8/systemd-oomd.service.8.html
167
168
- https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#id1
168
169
- https://chrisdown.name/2018/01/02/in-defence-of-swap.html
169
170
- https://media.ccc.de/v/ASG2018-175-oomd
@@ -258,10 +259,10 @@ per-workload parameter, and `swappiness` is optional and can be set globally,
258
259
we are choosing to only expose ` memory-swap ` which will adjust swap available
259
260
to workloads.
260
261
261
- Since we are not currently setting ` memory-swap ` in the CRI, the default
262
- behaviour is to allocate the same amount of swap for a workload as memory
263
- requested. We will update the default to not permit the use of swap by setting
264
- ` memory-swap ` equal to ` limit ` .
262
+ Since we are not currently setting ` memory-swap ` in the CRI, the current
263
+ default behaviour when ` --fail-swap-on=false ` is set is to allocate the same
264
+ amount of swap for a workload as memory requested. We will update the default
265
+ to not permit the use of swap by setting ` memory-swap ` equal to ` limit ` .
265
266
266
267
[ runtime specification ] : https://github.com/opencontainers/runtime-spec/blob/1c3f411f041711bbeecf35ff7e93461ea6789220/config-linux.md#memory
267
268
@@ -347,9 +348,9 @@ type LimitedSwapConfiguration struct {
347
348
}
348
349
```
349
350
350
- The ` MemorySwapConfiguration.SwapBehavior ` setting will have the following
351
- effects, based on the [ Docker ] and open container specification for the
352
- ` --memory-swap ` flag :
351
+ We want to expose all possible swap settings based on the [ Docker ] and open
352
+ container specification for the ` --memory-swap ` flag. Thus, the
353
+ ` MemorySwapConfiguration.SwapBehavior ` setting will have the following effects :
353
354
354
355
* If ` SwapBehavior ` is not set or set to ` "NoSwap" ` , containers do not have
355
356
access to swap. This value effectively prevents a container from using swap,
@@ -361,7 +362,7 @@ effects, based on the [Docker] and open container specification for the
361
362
and swap.
362
363
* If ` SwapBehavior ` is set to ` "UnlimitedSwap" ` , the container is allowed to
363
364
use unlimited swap, up to the maximum amount available on the host system.
364
- * If ` SwapBehavior ` is set to a ` "LimitedSwap" ` , then the ` LimitedSwap `
365
+ * If ` SwapBehavior ` is set to ` "LimitedSwap" ` , then the ` LimitedSwap `
365
366
configuration must also be set. ` LimitedSwap.PerWorkloadMemorySwapLimit `
366
367
represents the system-wide maximum limit for combined memory and swap usage
367
368
of a container. For example, if the limit is set to ` 1Gi ` :
@@ -387,13 +388,9 @@ swap usage in container runtimes. We will introduce a parameter
387
388
// resources.
388
389
message LinuxContainerResources {
389
390
...
390
- // Memory limit in bytes. Default: 0 (not specified).
391
- int64 memory_limit_in_bytes = 4 ;
392
391
// Memory + swap limit in bytes. Default: 0 (not specified).
393
392
int64 memory_swap_limit_in_bytes = 9 ;
394
393
...
395
- // List of HugepageLimits to limit the HugeTLB usage of container per page size. Default: nil (not specified).
396
- repeated HugepageLimit hugepage_limits = 8 ;
397
394
}
398
395
```
399
396
@@ -511,12 +508,16 @@ Pick one of these and delete the rest.
511
508
- [x] Feature gate (also fill in values in ` kep.yaml ` )
512
509
- Feature gate name: NodeSwapEnabled
513
510
- Components depending on the feature gate: API Server, Kubelet
514
- - [ ] Other
515
- - Describe the mechanism:
511
+ - [x] Other
512
+ - Describe the mechanism: ` --fail-swap-on=false ` flag for kubelet must also
513
+ be set at kubelet start
516
514
- Will enabling / disabling the feature require downtime of the control
517
- plane?
515
+ plane? Yes. Flag must be set on kubelet start. To disable, kubelet must be
516
+ restarted. Hence, there would be brief control component downtime on a
517
+ given node.
518
518
- Will enabling / disabling the feature require downtime or reprovisioning
519
519
of a node? (Do not assume ` Dynamic Kubelet Config ` feature is enabled).
520
+ Yes. See above; disabling would require brief node downtime.
520
521
521
522
###### Does enabling the feature change any default behavior?
522
523
@@ -541,10 +542,14 @@ feature, can it break the existing applications?).
541
542
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
542
543
-->
543
544
544
- No. I don’t think it makes much sense to be able to provide users a meaningful
545
- ability to disable the feature flag at runtime, as this would be highly
546
- disruptive to workloads and difficult to implement. To turn this off, the
547
- kubelet would need to be restarted.
545
+ No. The feature flag can be disabled while the ` --fail-swap-on=false ` flag is
546
+ set, but this would result in undefined behaviour.
547
+
548
+ To turn this off, the kubelet would need to be restarted. If a cluster admin
549
+ wants to disable swap on the node without repartitioning the node, they could
550
+ stop the kubelet, set ` swapoff ` on the node, and restart the kubelet with
551
+ ` --fail-swap-on=true ` . The setting of the feature flag will be ignored in this
552
+ case.
548
553
549
554
###### What happens if we reenable the feature if it was previously rolled back?
550
555
0 commit comments