Commit 9f89239

Swap graduates to Beta1: add a blog-post (#42009)
* Add a blog-post for Swap graduating to Beta
  Signed-off-by: Itamar Holder <[email protected]>
* Add instructions to deploy a minimal cluster with swap enabled
  Signed-off-by: Itamar Holder <[email protected]>
* Update publish date and filename
  Signed-off-by: Itamar Holder <[email protected]>
---------
Signed-off-by: Itamar Holder <[email protected]>
1 parent 1914f44 commit 9f89239

1 file changed

+248
-0
lines changed

@@ -0,0 +1,248 @@
---
layout: blog
title: "Kubernetes 1.28: Beta support for using swap on Linux"
date: 2023-08-24T10:00:00-08:00
slug: swap-linux-beta
---

**Author:** Itamar Holder (Red Hat)

The 1.22 release [introduced Alpha support](/blog/2021/08/09/run-nodes-with-swap-alpha/)
for configuring swap memory usage for Kubernetes workloads running on Linux on a per-node basis.
Now, in release 1.28, support for swap on Linux nodes has graduated to Beta, along with many
new improvements.

Prior to version 1.22, Kubernetes did not provide support for swap memory on Linux systems.
This was due to the inherent difficulty in guaranteeing and accounting for pod memory utilization
when swap memory was involved. As a result, swap support was deemed out of scope in the initial
design of Kubernetes, and the default behavior of a kubelet was to fail to start if swap memory
was detected on a node.

In version 1.22, the swap feature for Linux was initially introduced in its Alpha stage. This represented
a significant advancement, providing Linux users with the opportunity to experiment with the swap
feature for the first time. However, as an Alpha version, it was not fully developed and had
several issues, including inadequate support for cgroup v2, insufficient metrics and summary
API statistics, inadequate testing, and more.

Swap in Kubernetes has numerous [use cases](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories)
for a wide range of users. As a result, the node special interest group within the Kubernetes project
has invested significant effort into supporting swap on Linux nodes for beta.
Compared to the alpha, the kubelet's support for running with swap enabled is more stable and
robust, more user-friendly, and addresses many known shortcomings. This graduation to beta
represents a crucial step towards achieving the goal of fully supporting swap in Kubernetes.

## How do I use it?

To use swap memory on a node where it has already been provisioned, enable the
`NodeSwap` feature gate on the kubelet.
Additionally, you must set the `failSwapOn` configuration setting to `false`, or turn off the
deprecated `--fail-swap-on` command line flag.

You can configure the `memorySwap.swapBehavior` option to define how a node uses swap memory. For instance,

```yaml
# this fragment goes into the kubelet's configuration file
memorySwap:
  swapBehavior: UnlimitedSwap
```

The available configuration options for `swapBehavior` are:
- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
  request, up to the system limit.
- `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations.
  Only Pods of [Burstable](/docs/concepts/workloads/pods/pod-qos/#burstable) QoS are permitted to employ swap.

If configuration for `memorySwap` is not specified and the feature gate is
enabled, by default the kubelet will apply the same behaviour as the
`UnlimitedSwap` setting.

Note that `NodeSwap` is supported for **cgroup v2** only. For Kubernetes v1.28,
using swap along with cgroup v1 is no longer supported.
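Because `NodeSwap` requires cgroup v2, it can help to check which cgroup version a node runs before enabling the feature gate. The following is a minimal sketch, assuming the cgroup hierarchy is mounted at `/sys/fs/cgroup` (the usual location) and that GNU `stat` is available. The helper name `cgroup_version_from_fstype` is illustrative and not part of any Kubernetes tool.

```bash
# Map the filesystem type reported by `stat -fc %T` to a cgroup version.
# A cgroup v2 mount reports "cgroup2fs"; a cgroup v1 mount reports "tmpfs".
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Assumption: the cgroup hierarchy is mounted at /sys/fs/cgroup.
fstype="$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unavailable)"
echo "cgroup version: $(cgroup_version_from_fstype "$fstype")"
```

If this reports `v1` or `unknown`, the node is likely not suitable for the swap feature as described here.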

## Install a swap-enabled cluster with kubeadm

### Before you begin

This demo requires the kubeadm tool to be installed, following the steps outlined in the
[kubeadm installation guide](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm).
If swap is already enabled on the node, cluster creation may
proceed. If swap is not enabled, follow the instructions below to enable it.

### Create a swap file and turn swap on

I'll demonstrate creating 4GiB of unencrypted swap.

```bash
dd if=/dev/zero of=/swapfile bs=128M count=32 # write 32 blocks of 128MiB (4GiB in total)
chmod 600 /swapfile # restrict access to root
mkswap /swapfile # format the file as swap space
swapon /swapfile # enable the swap file only until this node is rebooted
swapon -s # list active swap areas to confirm the swap file is in use
```

To start the swap file at boot time, add a line like `/swapfile swap swap defaults 0 0` to the `/etc/fstab` file.
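To confirm that the swap file is active, you can also read the `SwapTotal` field that the Linux kernel reports in `/proc/meminfo`. The following is a minimal sketch. The helper name `swap_total_kib` is hypothetical, and it accepts an alternative file path only to make it easy to try out.

```bash
# Print the SwapTotal value (in KiB) from a meminfo-format file.
# $1: optional path; defaults to /proc/meminfo.
swap_total_kib() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  total="$(swap_total_kib)"
  if [ "${total:-0}" -gt 0 ]; then
    echo "swap is active: ${total} KiB"
  else
    echo "no active swap found"
  fi
fi
```

A value of zero means that no swap area is currently enabled, in which case the `swapon` step above needs to be repeated.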

### Set up a Kubernetes cluster that uses swap-enabled nodes

To make things clearer, here is an example kubeadm configuration file `kubeadm-config.yaml` for the swap-enabled cluster.

```yaml
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: LimitedSwap
```

Then create a single-node cluster using `kubeadm init --config kubeadm-config.yaml`.
During init, kubeadm warns that swap is enabled on the node, and also warns if the kubelet's
`failSwapOn` is set to true. We plan to remove this warning in a future release.

## How is the swap limit being determined with LimitedSwap?

The configuration of swap memory, including its limitations, presents a significant
challenge. Not only is it prone to misconfiguration, but as a system-level property, any
misconfiguration could potentially compromise the entire node rather than just a specific
workload. To mitigate this risk and ensure the health of the node, we have implemented
Swap in Beta with automatic configuration of limitations.

With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
`BestEffort`/`Guaranteed` QoS Pods) are prohibited from utilizing swap memory.
`BestEffort` QoS Pods exhibit unpredictable memory consumption patterns and lack
information regarding their memory usage, making it difficult to determine a safe
allocation of swap memory. Conversely, `Guaranteed` QoS Pods are typically employed for
applications that rely on the precise allocation of resources specified by the workload,
with memory being immediately available. To maintain the aforementioned security and node
health guarantees, these Pods are not permitted to use swap memory when `LimitedSwap` is
in effect.

Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:
* `nodeTotalMemory`: The total amount of physical memory available on the node.
* `totalPodsSwapAvailable`: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
* `containerMemoryRequest`: The container's memory request.

Swap limitation is configured as:
`(containerMemoryRequest / nodeTotalMemory) × totalPodsSwapAvailable`

In other words, a container may use the same fraction of the swap memory available to Pods
as the fraction of the node's total physical memory that it requests.

It is important to note that, for containers within Burstable QoS Pods, it is possible to
opt-out of swap usage by specifying memory requests that are equal to memory limits.
Containers configured in this manner will not have access to swap memory.
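As a worked example of the formula above, consider a hypothetical node with 16 GiB of physical memory and 4 GiB of swap available to Pods, running a Burstable container that requests 2 GiB of memory. The sketch below does the arithmetic in MiB using integer division. The function name `swap_limit_mib` and the figures are illustrative assumptions, and the kubelet's own calculation may round differently.

```bash
# Compute (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable.
# All three arguments are in MiB. Multiplying before dividing avoids
# losing precision with integer arithmetic.
swap_limit_mib() {
  container_memory_request=$1
  node_total_memory=$2
  total_pods_swap_available=$3
  echo $(( container_memory_request * total_pods_swap_available / node_total_memory ))
}

swap_limit_mib 2048 16384 4096   # prints 512
```

Here the container requests one eighth of the node's memory, so it may use one eighth of the 4 GiB of swap, which is 512 MiB.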

## How does it work?

There are a number of possible ways that one could envision swap use on a node.
When swap is already provisioned and available on a node,
SIG Node have [proposed](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#proposal)
that the kubelet should be able to be configured so that:
- It can start with swap on.
- It will direct the Container Runtime Interface to allocate zero swap memory
  to Kubernetes workloads by default.

Swap configuration on a node is exposed to a cluster admin via the
[`memorySwap` in the KubeletConfiguration](/docs/reference/config-api/kubelet-config.v1beta1).
As a cluster administrator, you can specify the node's behaviour in the
presence of swap memory by setting `memorySwap.swapBehavior`.

The kubelet [employs the CRI](https://kubernetes.io/docs/concepts/architecture/cri/)
(container runtime interface) API to direct the container runtime to
configure specific cgroup v2 parameters (such as `memory.swap.max`) in a manner that will
enable the desired swap configuration for a container. The container runtime is then responsible for
writing these settings to the container-level cgroup.
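To see the result on a node, you can read `memory.swap.max` from a container's cgroup directory. In cgroup v2 this file holds either a byte count or the literal `max`, meaning no limit. The sketch below is only an illustration. The helper name `describe_swap_max` is hypothetical, and because the cgroup path of a given container depends on the container runtime and cgroup driver, the path is passed in as an argument rather than guessed.

```bash
# Describe the swap limit stored in a cgroup v2 directory.
# $1: path to a cgroup directory that contains a memory.swap.max file.
describe_swap_max() {
  limit="$(cat "$1/memory.swap.max")" || return 1
  case "$limit" in
    max) echo "swap unlimited" ;;
    0)   echo "swap disabled" ;;
    *)   echo "swap limited to ${limit} bytes" ;;
  esac
}
```

Under these assumptions, a container that is not allowed to swap would show `swap disabled`, and a Burstable container under `LimitedSwap` would show a byte count.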

## How can I monitor swap?

A notable deficiency in the Alpha version was the inability to monitor and introspect swap
usage. This issue has been addressed in the Beta version introduced in Kubernetes 1.28, which now
provides the capability to monitor swap usage through several different methods.

The beta version of the kubelet now collects
[node-level metric statistics](/docs/reference/instrumentation/node-metrics/),
which can be accessed at the `/metrics/resource` and `/stats/summary` kubelet HTTP endpoints.
This allows clients who can directly interrogate the kubelet to
monitor swap usage and remaining swap memory when using `LimitedSwap`. Additionally, a
`machine_swap_bytes` metric has been added to cAdvisor to show the total physical swap capacity of the
machine.
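One way to try these endpoints without direct network access to the kubelet is to go through the API server's node proxy with `kubectl get --raw`. The sketch below assumes that `kubectl` is configured with permission to use the node proxy, and it takes the node name as an argument. The helpers `filter_swap_metrics` and `node_swap_metrics` are hypothetical conveniences. The filter simply matches the word "swap" rather than specific metric names, since exact names may differ between releases.

```bash
# Keep swap-related samples from Prometheus text format on stdin,
# dropping "# HELP" and "# TYPE" comment lines.
filter_swap_metrics() {
  grep -v '^#' | grep -i 'swap'
}

# Fetch the kubelet's /metrics/resource endpoint for a node through the
# API server proxy and keep only the swap-related samples.
# $1: node name, as shown by `kubectl get nodes`.
node_swap_metrics() {
  kubectl get --raw "/api/v1/nodes/$1/proxy/metrics/resource" | filter_swap_metrics
}
```

Calling `node_swap_metrics <node-name>` prints the matching samples, if any. The same proxy path ending in `/proxy/stats/summary` returns the summary API as JSON.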

## Caveats

Having swap available on a system reduces predictability. Swap's performance is
worse than regular memory, sometimes by many orders of magnitude, which can
cause unexpected performance regressions. Furthermore, swap changes a system's
behaviour under memory pressure. Since enabling swap permits
greater memory usage for workloads in Kubernetes that cannot be predictably
accounted for, it also increases the risk of noisy neighbours and unexpected
packing configurations, as the scheduler cannot account for swap memory usage.

The performance of a node with swap memory enabled depends on the underlying
physical storage. When swap memory is in use, performance will be significantly
worse in an I/O operations per second (IOPS) constrained environment, such as a
cloud VM with I/O throttling, when compared to faster storage media like
solid-state drives or NVMe.

As such, we do not recommend using swap memory for workloads or
environments that are subject to performance constraints. Furthermore, we
recommend using `LimitedSwap`, as this significantly mitigates the risks
posed to the node.

Cluster administrators and developers should benchmark their nodes and applications
before using swap in production scenarios, and [we need your help](#how-do-i-get-involved) with that!

### Security risk

Enabling swap on a system without encryption poses a security risk, as critical information,
such as volumes that represent Kubernetes Secrets, [may be swapped out to the disk](/docs/concepts/configuration/secret/#information-security-for-secrets).
If an unauthorized individual gains
access to the disk, they could potentially obtain this confidential data. To mitigate this risk, the
Kubernetes project strongly recommends that you encrypt your swap space.
However, handling encrypted swap is not within the scope of
kubelet; rather, it is a general OS configuration concern and should be addressed at that level.
It is the administrator's responsibility to provision encrypted swap to mitigate this risk.

Furthermore, as previously mentioned, with `LimitedSwap` the user has the option to completely
disable swap usage for a container by specifying memory requests that are equal to memory limits.
This will prevent the corresponding containers from accessing swap memory.

## Looking ahead

The Kubernetes 1.28 release introduced Beta support for swap memory on Linux nodes,
and we will continue to work towards [general availability](/docs/reference/command-line-tools-reference/feature-gates/#feature-stages)
for this feature. I hope that this will include:

* Adding the ability to set a system-reserved quantity of swap from what kubelet detects on the host.
* Adding support for controlling swap consumption at the Pod level via cgroups.
  * This point is still under discussion.
* Collecting feedback from test use cases.
  * We will consider introducing new configuration modes for swap, such as a
    node-wide swap limit for workloads.

## How can I learn more?

You can review the current [documentation](/docs/concepts/architecture/nodes/#swap-memory)
for using swap with Kubernetes.

For more information, and to assist with testing and provide feedback, please
see [KEP-2400](https://github.com/kubernetes/enhancements/issues/4128) and its
[design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md).

## How do I get involved?

Your feedback is always welcome! SIG Node [meets regularly](https://github.com/kubernetes/community/tree/master/sig-node#meetings)
and [can be reached](https://github.com/kubernetes/community/tree/master/sig-node#contact)
via [Slack](https://slack.k8s.io/) (channel **#sig-node**), or the SIG's
[mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node). A Slack
channel dedicated to swap is also available at **#sig-node-swap**.

Feel free to reach out to me, Itamar Holder (**@iholder101** on Slack and GitHub)
if you'd like to help or ask further questions.
