| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: 'New in Kubernetes v1.22: alpha support for using swap memory' |
| 4 | +date: 2021-08-09 |
| 5 | +slug: run-nodes-with-swap-alpha |
| 6 | +--- |
| 7 | + |
| 8 | +**Author:** Elana Hashman (Red Hat) |
| 9 | + |
| 10 | +The 1.22 release introduced alpha support for configuring swap memory usage for |
| 11 | +Kubernetes workloads on a per-node basis. |
| 12 | + |
| 13 | +In prior releases, Kubernetes did not support the use of swap memory on Linux, |
| 14 | +as it is difficult to provide guarantees and account for pod memory utilization |
| 15 | +when swap is involved. As part of Kubernetes' earlier design, swap support was |
| 16 | +considered out of scope, and a kubelet would by default fail to start if swap |
| 17 | +was detected on a node. |
| 18 | + |
| 19 | +However, there are a number of [use cases](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#user-stories) |
| 20 | +that would benefit from Kubernetes nodes supporting swap, including improved |
| 21 | +node stability, better support for applications with high memory overhead but |
| 22 | +smaller working sets, the use of memory-constrained devices, and memory |
| 23 | +flexibility. |
| 24 | + |
| 25 | +Hence, over the past two releases, [SIG Node](https://github.com/kubernetes/community/tree/master/sig-node#readme) has |
| 26 | +been working to gather appropriate use cases and feedback, and propose a design |
| 27 | +for adding swap support to nodes in a controlled, predictable manner so that |
| 28 | +Kubernetes users can perform testing and provide data to continue building |
| 29 | +cluster capabilities on top of swap. The alpha graduation of swap memory |
| 30 | +support for nodes is our first milestone towards this goal! |
| 31 | + |
| 32 | +## How does it work? |
| 33 | + |
| 34 | +There are a number of possible ways that one could envision swap use on a node. |
| 35 | +To keep the scope manageable for this initial implementation, when swap is |
| 36 | +already provisioned and available on a node, [we have proposed](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#proposal) |
| 37 | +that the kubelet should be configurable such that: |
| 38 | + |
| 39 | +- It can start with swap on. |
| 40 | +- It will direct the Container Runtime Interface to allocate zero swap memory |
| 41 | + to Kubernetes workloads by default. |
| 42 | +- You can configure the kubelet to specify swap utilization for the entire |
| 43 | + node. |
| 44 | + |
| 45 | +Swap configuration on a node is exposed to a cluster admin via the |
| 46 | +[`memorySwap` field in the KubeletConfiguration](/docs/reference/config-api/kubelet-config.v1beta1/). |
| 47 | +As a cluster administrator, you can specify the node's behaviour in the |
| 48 | +presence of swap memory by setting `memorySwap.swapBehavior`. |
| 49 | + |
| 50 | +This is possible through the addition of a `memory_swap_limit_in_bytes` field |
| 51 | +to the container runtime interface (CRI). The kubelet's config will control how |
| 52 | +much swap memory the kubelet instructs the container runtime to allocate to |
| 53 | +each container via the CRI. The container runtime will then write the swap |
| 54 | +settings to the container level cgroup. |
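| | + |
| | +To make this concrete, the fragment below renders, as YAML for readability |
| | +only, the resource values the kubelet might pass for a container with a 1 GiB |
| | +memory limit. The CRI is a gRPC API rather than YAML, and the numeric values |
| | +are assumptions based on the cgroups v1 `LimitedSwap` behaviour described later |
| | +in this post (memory plus swap capped at the memory limit), not captured output: |
| | + |
| | +```yaml |
| | +# Hypothetical view of the Linux container resources sent over the CRI. |
| | +linux: |
| | +  resources: |
| | +    memory_limit_in_bytes: 1073741824        # 1 GiB memory limit |
| | +    memory_swap_limit_in_bytes: 1073741824   # assumed: memory + swap cap equals the memory limit |
| | +``` |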
| 55 | + |
| 56 | +## How do I use it? |
| 57 | + |
| 58 | +On a node where swap memory is already provisioned, you can let Kubernetes |
| 59 | +workloads use swap by enabling the `NodeSwap` feature gate on the kubelet and |
| 60 | +disabling `failSwapOn`, either through the [configuration setting](/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration) |
| 61 | +or the `--fail-swap-on` command line flag. |
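| | + |
| | +As a minimal sketch of the steps above, a kubelet configuration file could |
| | +combine both settings. This assumes the kubelet is started with a |
| | +`KubeletConfiguration` file; the surrounding scaffolding is illustrative, so |
| | +merge the two settings into your existing configuration rather than copying it: |
| | + |
| | +```yaml |
| | +# Illustrative KubeletConfiguration fragment (not a complete file). |
| | +apiVersion: kubelet.config.k8s.io/v1beta1 |
| | +kind: KubeletConfiguration |
| | +featureGates: |
| | +  NodeSwap: true     # enable the alpha swap feature gate |
| | +failSwapOn: false    # let the kubelet start when swap is detected |
| | +``` |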
| 62 | + |
| 63 | +You can optionally configure `memorySwap.swapBehavior` to specify how a |
| 64 | +node will use swap memory. For example: |
| 65 | + |
| 66 | +```yaml |
| 67 | +memorySwap: |
| 68 | + swapBehavior: LimitedSwap |
| 69 | +``` |
| 70 | + |
| 71 | +The available configuration options for `swapBehavior` are: |
| 72 | + |
| 73 | +- `LimitedSwap` (default): Kubernetes workloads are limited in how much swap |
| 74 | + they can use. Workloads on the node not managed by Kubernetes can still swap. |
| 75 | +- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they |
| 76 | + request, up to the system limit. |
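| | + |
| | +For comparison with the `LimitedSwap` example earlier, a sketch of the same |
| | +fragment selecting the second option might look like this (only the value of |
| | +`swapBehavior` changes): |
| | + |
| | +```yaml |
| | +# Illustrative fragment: allow workloads to swap up to the system limit. |
| | +memorySwap: |
| | +  swapBehavior: UnlimitedSwap |
| | +``` |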
| 77 | + |
| 78 | +If configuration for `memorySwap` is not specified and the feature gate is |
| 79 | +enabled, by default the kubelet will apply the same behaviour as the |
| 80 | +`LimitedSwap` setting. |
| 81 | + |
| 82 | +The behaviour of the `LimitedSwap` setting depends on whether the node is |
| 83 | +running with v1 or v2 of control groups (also known as "cgroups"): |
| 84 | + |
| 85 | +- **cgroups v1:** Kubernetes workloads can use any combination of memory and |
| 86 | + swap, up to the pod's memory limit, if set. |
| 87 | +- **cgroups v2:** Kubernetes workloads cannot use swap memory. |
| 88 | + |
| 89 | +### Caveats |
| 90 | + |
| 91 | +Having swap available on a system reduces predictability. Swap's performance is |
| 92 | +worse than regular memory, sometimes by many orders of magnitude, which can |
| 93 | +cause unexpected performance regressions. Furthermore, swap changes a system's |
| 94 | +behaviour under memory pressure, and applications cannot directly control what |
| 95 | +portions of their memory usage are swapped out. Since enabling swap permits |
| 96 | +greater memory usage for workloads in Kubernetes that cannot be predictably |
| 97 | +accounted for, it also increases the risk of noisy neighbours and unexpected |
| 98 | +packing configurations, as the scheduler cannot account for swap memory usage. |
| 99 | + |
| 100 | +The performance of a node with swap memory enabled depends on the underlying |
| 101 | +physical storage. When swap memory is in use, performance will be significantly |
| 102 | +worse in an environment constrained in I/O operations per second (IOPS), such |
| 103 | +as a cloud VM with I/O throttling, than on faster storage media like |
| 104 | +solid-state drives or NVMe. |
| 105 | + |
| 106 | +Hence, we do not recommend the use of swap for certain performance-constrained |
| 107 | +workloads or environments. Cluster administrators and developers should |
| 108 | +benchmark their nodes and applications before using swap in production |
| 109 | +scenarios, and [we need your help](#how-do-i-get-involved) with that! |
| 110 | + |
| 111 | +## Looking ahead |
| 112 | + |
| 113 | +The Kubernetes 1.22 release introduces alpha support for swap memory on nodes, |
| 114 | +and we will continue to work towards beta graduation in the 1.23 release. This |
| 115 | +will include: |
| 116 | + |
| 117 | +* Adding support for controlling swap consumption at the Pod level via cgroups. |
| 118 | + * This will include the ability to set a system-reserved quantity of swap |
| 119 | + from what kubelet detects on the host. |
| 120 | +* Determining a set of metrics for node QoS in order to evaluate the |
| 121 | + performance and stability of nodes with and without swap enabled. |
| 122 | +* Collecting feedback from test use cases. |
| 123 | + * We will consider introducing new configuration modes for swap, such as a |
| 124 | + node-wide swap limit for workloads. |
| 125 | + |
| 126 | +## How can I learn more? |
| 127 | + |
| 128 | +You can review the current [documentation](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory) |
| 129 | +on the Kubernetes website. |
| 130 | + |
| 131 | +For more information, and to assist with testing and provide feedback, please |
| 132 | +see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its |
| 133 | +[design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md). |
| 134 | + |
| 135 | +## How do I get involved? |
| 136 | + |
| 137 | +Your feedback is always welcome! SIG Node [meets regularly](https://github.com/kubernetes/community/tree/master/sig-node#meetings) |
| 138 | +and [can be reached](https://github.com/kubernetes/community/tree/master/sig-node#contact) |
| 139 | +via [Slack](https://slack.k8s.io/) (channel **#sig-node**), or the SIG's |
| 140 | +[mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node). |
| 141 | +Feel free to reach out to me, Elana Hashman (**@ehashman** on Slack and GitHub) |
| 142 | +if you'd like to help. |