Commit 7e5e13d — Move ephemeral storage contents out of container resource page (2 files changed: +305 −290)
---
title: Local ephemeral storage
content_type: concept
weight: 50
---

## Local ephemeral storage

Nodes have local ephemeral storage, backed by
locally-attached writeable devices or, sometimes, by RAM.
"Ephemeral" means that there is no long-term guarantee about durability.

Pods use ephemeral local storage for scratch space, caching, and logs.
The kubelet can provide scratch space to Pods using local ephemeral storage to
mount [`emptyDir`](/docs/concepts/storage/volumes/#emptydir)
{{< glossary_tooltip term_id="volume" text="volumes" >}} into containers.

The kubelet also uses this kind of storage to hold
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level),
container images, and the writable layers of running containers.

{{< caution >}}
If a node fails, the data in its ephemeral storage can be lost.
Your applications cannot expect any performance SLAs (disk IOPS for example)
from local ephemeral storage.
{{< /caution >}}

{{< note >}}
For a resource quota on ephemeral-storage to take effect, two things are needed:

* An admin sets the resource quota for ephemeral-storage in a namespace.
* A user specifies limits for the ephemeral-storage resource in the Pod spec.

If the user doesn't specify the ephemeral-storage resource limit in the Pod spec,
the resource quota is not enforced on ephemeral-storage.
{{< /note >}}

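As an illustration of the note above, here is a minimal sketch of a namespace quota that caps ephemeral storage; the object name, namespace, and amounts are placeholders chosen for this example:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota   # placeholder name
  namespace: quota-demo           # placeholder namespace
spec:
  hard:
    # Caps on the total ephemeral storage that Pods in this namespace
    # may request and limit, respectively.
    requests.ephemeral-storage: 10Gi
    limits.ephemeral-storage: 20Gi
```

Pods in that namespace then declare their own `ephemeral-storage` requests and limits, as in the Pod example later on this page, and the quota accounts for them.
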
Kubernetes lets you track, reserve, and limit the amount
of ephemeral local storage a Pod can consume.

### Configurations for local ephemeral storage {#configurations}

Kubernetes supports two ways to configure local ephemeral storage on a node:
{{< tabs name="local_storage_configurations" >}}
{{% tab name="Single filesystem" %}}
In this configuration, you place all different kinds of ephemeral local data
(`emptyDir` volumes, writeable layers, container images, logs) into one filesystem.
The most effective way to configure the kubelet is to dedicate this filesystem
to Kubernetes (kubelet) data.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
and treats these similarly to ephemeral local storage.

The kubelet writes logs to files inside its configured log directory (`/var/log`
by default), and has a base directory for other locally stored data
(`/var/lib/kubelet` by default).

Typically, both `/var/lib/kubelet` and `/var/log` are on the system root filesystem,
and the kubelet is designed with that layout in mind.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{% tab name="Two filesystems" %}}
You have a filesystem on the node that you're using for ephemeral data that
comes from running Pods: logs and `emptyDir` volumes. You can use this filesystem
for other data (for example: system logs not related to Kubernetes); it can even
be the root filesystem.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
into the first filesystem, and treats these similarly to ephemeral local storage.

You also use a separate filesystem, backed by a different logical storage device.
In this configuration, the directory where you tell the kubelet to place
container image layers and writeable layers is on this second filesystem.

The first filesystem does not hold any image layers or writeable layers.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{< /tabs >}}

The kubelet can measure how much local storage it is using. It does this provided
that you have set up the node using one of the supported configurations for local
ephemeral storage.

If you have a different configuration, then the kubelet does not apply resource
limits for ephemeral local storage.

{{< note >}}
The kubelet tracks `tmpfs` emptyDir volumes as container memory use, rather
than as local ephemeral storage.
{{< /note >}}

{{< note >}}
The kubelet only tracks the root filesystem for ephemeral storage. OS layouts
that mount a separate disk to `/var/lib/kubelet` or `/var/lib/containers` will
not report ephemeral storage correctly.
{{< /note >}}

### Setting requests and limits for local ephemeral storage {#requests-limits}

You can specify `ephemeral-storage` for managing local ephemeral storage. Each
container of a Pod can specify either or both of the following:

* `spec.containers[].resources.limits.ephemeral-storage`
* `spec.containers[].resources.requests.ephemeral-storage`

Limits and requests for `ephemeral-storage` are measured in byte quantities.
You can express storage as a plain integer or as a fixed-point number using one of these suffixes:
E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi,
Mi, Ki. For example, the following quantities all represent roughly the same value:

- `128974848`
- `129e6`
- `129M`
- `123Mi`

(`123Mi` is exactly 128974848 bytes, while `129e6` and `129M` both mean
129,000,000 bytes.)

Pay attention to the case of the suffixes. If you request `400m` of ephemeral-storage, this is a request
for 0.4 bytes. Someone who types that probably meant to ask for 400 mebibytes (`400Mi`)
or 400 megabytes (`400M`).

In the following example, each of the Pod's two containers has a request of 2GiB
and a limit of 4GiB of local ephemeral storage. Therefore, the Pod has a request of
4GiB and a limit of 8GiB of local ephemeral storage. 500Mi of that limit could be
consumed by the `emptyDir` volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir:
        sizeLimit: 500Mi
```

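As a usage sketch (assuming you saved the manifest above as `frontend.yaml`; the filename is arbitrary), you can create the Pod and check what was recorded:

```bash
# Create the Pod from the example manifest (hypothetical filename).
kubectl apply -f frontend.yaml

# Each container's Limits and Requests sections in the output include an
# ephemeral-storage line matching the values above.
kubectl describe pod frontend | grep -i -B 1 "ephemeral-storage"
```
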
### How Pods with ephemeral-storage requests are scheduled

When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods.
For more information, see
[Node Allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).

The scheduler ensures that the sum of the resource requests of the scheduled
containers does not exceed the capacity of the node.

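One way to see how much ephemeral storage a node advertises to the scheduler is to inspect its allocatable resources; `<node-name>` below is a placeholder:

```bash
# Print just the allocatable ephemeral storage of one node.
kubectl get node <node-name> -o jsonpath="{.status.allocatable['ephemeral-storage']}"

# The same figure appears in the Capacity and Allocatable sections of:
kubectl describe node <node-name>
```
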
### Ephemeral storage consumption management {#resource-emphemeralstorage-consumption}

If the kubelet is managing local ephemeral storage as a resource, then the
kubelet measures storage use in the following (one way to inspect these
measurements is sketched after this list):

- `emptyDir` volumes, except _tmpfs_ `emptyDir` volumes
- directories holding node-level logs
- writeable container layers

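One way to inspect these measurements is the kubelet Summary API, reached here through the API server proxy; this is a sketch where `<node-name>` is a placeholder and `jq` is used only to trim the output:

```bash
# Per-Pod ephemeral storage usage as reported by the kubelet on one node.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
  | jq '.pods[] | {pod: .podRef.name, usedBytes: .["ephemeral-storage"].usedBytes}'
```
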
If a Pod is using more ephemeral storage than you allow it to, the kubelet
sets an eviction signal that triggers Pod eviction.

For container-level isolation, if a container's writable layer and log
usage exceeds its storage limit, the kubelet marks the Pod for eviction.

For pod-level isolation, the kubelet works out an overall Pod storage limit by
summing the limits for the containers in that Pod. In this case, if the sum of
the local ephemeral storage usage from all containers and also the Pod's `emptyDir`
volumes exceeds the overall Pod storage limit, then the kubelet also marks the Pod
for eviction.

{{< caution >}}
If the kubelet is not measuring local ephemeral storage, then a Pod
that exceeds its local storage limit will not be evicted for breaching
local storage resource limits.

However, if the filesystem space for writeable container layers, node-level logs,
or `emptyDir` volumes runs low, the node
{{< glossary_tooltip text="taints" term_id="taint" >}} itself as short on local storage,
and this taint triggers eviction for any Pods that don't specifically tolerate the taint.

See the supported [configurations](#configurations) for ephemeral local storage.
{{< /caution >}}

The kubelet supports different ways to measure Pod storage use:

{{< tabs name="resource-emphemeralstorage-measurement" >}}

{{% tab name="Periodic scanning" %}}

The kubelet performs regular, scheduled checks that scan each `emptyDir` volume,
container log directory, and writeable container layer.

The scan measures how much space is used.

{{< note >}}
In this mode, the kubelet does not track open file descriptors
for deleted files.

If you (or a container) create a file inside an `emptyDir` volume,
something then opens that file, and you delete the file while it is still open,
then the inode for the deleted file stays around until you close that file,
but the kubelet does not categorize the space as in use.
{{< /note >}}

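The following minimal demonstration (run in an arbitrary temporary directory rather than a real `emptyDir` path) shows why a directory scan misses that space:

```bash
cd "$(mktemp -d)"
fallocate -l 100M scratch.bin       # create a 100 MiB file
tail -f scratch.bin > /dev/null &   # hold an open file descriptor on it
rm scratch.bin                      # delete it while it is still open
du -sh .                            # directory scan: the 100 MiB is not counted
df -h .                             # filesystem view: the 100 MiB is still in use
kill %1                             # close the descriptor; the space is released
```
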
{{% /tab %}}

{{% tab name="Filesystem project quota" %}}

{{< feature-state feature_gate_name="LocalStorageCapacityIsolationFSQuotaMonitoring" >}}

Project quotas are an operating-system level feature for managing
storage use on filesystems. With Kubernetes, you can enable project
quotas for monitoring storage use. Make sure that the filesystem
backing the `emptyDir` volumes on the node provides project quota support;
for example, XFS and ext4 offer project quotas.

{{< note >}}
Project quotas let you monitor storage use; they do not enforce limits.
{{< /note >}}

Kubernetes uses project IDs starting from `1048576`. The IDs in use are
registered in `/etc/projects` and `/etc/projid`. If project IDs in
this range are used for other purposes on the system, those project
IDs must be registered in `/etc/projects` and `/etc/projid` so that
Kubernetes does not use them.

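As a sketch of how such a registration looks (the project ID, name, and path below are made up for illustration), each line of `/etc/projects` maps a project ID to a directory tree, and each line of `/etc/projid` gives that ID a name:

```bash
# Reserve project ID 1048580 for a non-Kubernetes use, so Kubernetes skips it.
echo "1048580:/srv/other-quota-tree" | sudo tee -a /etc/projects
echo "other-app:1048580" | sudo tee -a /etc/projid
```
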
Quotas are faster and more accurate than directory scanning.
When a directory is assigned to a project, all files created under that directory
are created in that project, and the kernel merely has to keep track of
how many blocks are in use by files in that project.
If a file is created and deleted, but has an open file descriptor,
it continues to consume space. Quota tracking records that space accurately,
whereas directory scans overlook the storage used by deleted files.

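On an XFS-backed node you can see this per-project block accounting directly; a hypothetical inspection (replace the path with the mount point of the XFS filesystem that holds the kubelet data):

```bash
# Report per-project block usage in human-readable units.
sudo xfs_quota -x -c 'report -p -h' /
```
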
To use quotas to track a pod's resource usage, the pod must be in
a user namespace. Within user namespaces, the kernel restricts changes
to project IDs on the filesystem, ensuring the reliability of storage
metrics calculated by quotas.

If you want to use project quotas, you should:

* Enable the `LocalStorageCapacityIsolationFSQuotaMonitoring=true`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  using the `featureGates` field in the
  [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/)
  (a minimal configuration sketch follows this list).

* Ensure the `UserNamespacesSupport`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  is enabled, and that the kernel, CRI implementation, and OCI runtime support user namespaces.

* Ensure that the root filesystem (or optional runtime filesystem)
  has project quotas enabled. All XFS filesystems support project quotas.
  For ext4 filesystems, you need to enable the project quota tracking feature
  while the filesystem is not mounted.

  ```bash
  # For ext4, with /dev/block-device not mounted
  sudo tune2fs -O project -Q prjquota /dev/block-device
  ```

* Ensure that the root filesystem (or optional runtime filesystem) is
  mounted with project quotas enabled. For both XFS and ext4, the
  mount option is named `prjquota`.

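A minimal sketch of the kubelet configuration change mentioned in the first bullet (only the relevant fields are shown; merge them into your existing configuration file, and `UserNamespacesSupport` may also need to be enabled on other components):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Enables project-quota-based monitoring of ephemeral storage.
  LocalStorageCapacityIsolationFSQuotaMonitoring: true
  # Quota-based tracking requires Pods to run in user namespaces.
  UserNamespacesSupport: true
```
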
If you don't want to use project quotas, you should:

* Disable the `LocalStorageCapacityIsolationFSQuotaMonitoring`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  using the `featureGates` field in the
  [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/).
{{% /tab %}}
{{< /tabs >}}


## {{% heading "whatsnext" %}}

* Read about [project quotas](https://www.linux.org/docs/man8/xfs_quota.html) in XFS
