---
title: Local ephemeral storage
content_type: concept
weight: 50
---

## Local ephemeral storage

Nodes have local ephemeral storage, backed by
locally-attached writeable devices or, sometimes, by RAM.
"Ephemeral" means that there is no long-term guarantee about durability.

Pods use ephemeral local storage for scratch space, caching, and logs.
The kubelet can provide scratch space to Pods using local ephemeral storage to
mount [`emptyDir`](/docs/concepts/storage/volumes/#emptydir)
{{< glossary_tooltip term_id="volume" text="volumes" >}} into containers.

The kubelet also uses this kind of storage to hold
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level),
container images, and the writable layers of running containers.

{{< caution >}}
If a node fails, the data in its ephemeral storage can be lost.
Your applications cannot expect any performance SLAs (disk IOPS for example)
from local ephemeral storage.
{{< /caution >}}

{{< note >}}
For a resource quota on ephemeral-storage to take effect, two things are required:

* An administrator sets the resource quota for ephemeral-storage in a namespace.
* A user specifies limits for the ephemeral-storage resource in the Pod spec.

If the user doesn't specify the ephemeral-storage resource limit in the Pod spec,
the resource quota is not enforced on ephemeral-storage.
{{< /note >}}
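
For example, an administrator could create a ResourceQuota like the following
sketch (the namespace name and quota values here are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota
  namespace: quota-example        # illustrative namespace
spec:
  hard:
    requests.ephemeral-storage: 2Gi
    limits.ephemeral-storage: 4Gi
```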

Kubernetes lets you track, reserve and limit the amount
of ephemeral local storage a Pod can consume.

### Configurations for local ephemeral storage {#configurations}

Kubernetes supports two ways to configure local ephemeral storage on a node:
{{< tabs name="local_storage_configurations" >}}
{{% tab name="Single filesystem" %}}
In this configuration, you place all different kinds of ephemeral local data
(`emptyDir` volumes, writeable layers, container images, logs) into one filesystem.
The most effective way to configure the kubelet is to dedicate this filesystem
to Kubernetes (kubelet) data.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
and treats these similarly to ephemeral local storage.

The kubelet writes logs to files inside its configured log directory (`/var/log`
by default), and has a base directory for other locally stored data
(`/var/lib/kubelet` by default).

Typically, both `/var/lib/kubelet` and `/var/log` are on the system root filesystem,
and the kubelet is designed with that layout in mind.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{% tab name="Two filesystems" %}}
You have a filesystem on the node that you're using for ephemeral data that
comes from running Pods: logs, and `emptyDir` volumes. You can use this filesystem
for other data (for example: system logs not related to Kubernetes); it can even
be the root filesystem.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
into the first filesystem, and treats these similarly to ephemeral local storage.

You also use a separate filesystem, backed by a different logical storage device.
In this configuration, the directory where you tell the kubelet to place
container image layers and writeable layers is on this second filesystem.

The first filesystem does not hold any image layers or writeable layers.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{< /tabs >}}

The kubelet can measure how much local storage it is using. It does this provided
that you have set up the node using one of the supported configurations for local
ephemeral storage.

If you have a different configuration, then the kubelet does not apply resource
limits for ephemeral local storage.

{{< note >}}
The kubelet tracks `tmpfs` emptyDir volumes as container memory use, rather
than as local ephemeral storage.
{{< /note >}}
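
For example, an `emptyDir` backed by RAM is declared with `medium: Memory`; its
usage counts against the container's memory limit rather than against
`ephemeral-storage` (the volume name and size here are illustrative):

```yaml
volumes:
  - name: ram-cache           # illustrative name
    emptyDir:
      medium: Memory          # tmpfs-backed: tracked as memory use
      sizeLimit: 64Mi
```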

{{< note >}}
The kubelet only tracks the root filesystem for ephemeral storage. OS layouts
that mount a separate disk to `/var/lib/kubelet` or `/var/lib/containers` will
not report ephemeral storage correctly.
{{< /note >}}

### Setting requests and limits for local ephemeral storage {#requests-limits}

You can specify `ephemeral-storage` for managing local ephemeral storage. Each
container of a Pod can specify either or both of the following:

* `spec.containers[].resources.limits.ephemeral-storage`
* `spec.containers[].resources.requests.ephemeral-storage`

Limits and requests for `ephemeral-storage` are measured in byte quantities.
You can express storage as a plain integer or as a fixed-point number using one of these suffixes:
E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi,
Mi, Ki. For example, the following quantities all represent roughly the same value:

- `128974848`
- `129e6`
- `129M`
- `123Mi`

Pay attention to the case of the suffixes. If you request `400m` of ephemeral-storage, this is a request
for 0.4 bytes. Someone who types that probably meant to ask for 400 mebibytes (`400Mi`)
or 400 megabytes (`400M`).

In the following example, the Pod has two containers. Each container has a request of
2GiB of local ephemeral storage. Each container has a limit of 4GiB of local ephemeral
storage. Therefore, the Pod has a request of 4GiB of local ephemeral storage, and
a limit of 8GiB of local ephemeral storage. 500Mi of that limit could be
consumed by the `emptyDir` volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
  - name: ephemeral
    emptyDir:
      sizeLimit: 500Mi
```
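
Assuming you have saved the manifest above to a file and have `kubectl` access
to a cluster, you could apply it and read back the limits like this sketch (the
file name is illustrative):

```shell
kubectl apply -f frontend-pod.yaml   # illustrative file name
# Print each container's ephemeral-storage limit
kubectl get pod frontend -o jsonpath='{.spec.containers[*].resources.limits.ephemeral-storage}'
```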

### How Pods with ephemeral-storage requests are scheduled

When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods.
For more information, see
[Node Allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).

The scheduler ensures that the sum of the resource requests of the scheduled containers is less than the capacity of the node.
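
To see how much local ephemeral storage each node offers for scheduling, you
can query the node status (a sketch; assumes `kubectl` access to the cluster):

```shell
# List each node's allocatable local ephemeral storage
kubectl get nodes -o custom-columns='NAME:.metadata.name,EPHEMERAL-STORAGE:.status.allocatable.ephemeral-storage'
```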

### Ephemeral storage consumption management {#resource-emphemeralstorage-consumption}

If the kubelet is managing local ephemeral storage as a resource, then the
kubelet measures storage use in:

- `emptyDir` volumes, except _tmpfs_ `emptyDir` volumes
- directories holding node-level logs
- writeable container layers

If a Pod is using more ephemeral storage than you allow it to, the kubelet
sets an eviction signal that triggers Pod eviction.

For container-level isolation, if a container's writable layer and log
usage exceeds its storage limit, the kubelet marks the Pod for eviction.

For pod-level isolation the kubelet works out an overall Pod storage limit by
summing the limits for the containers in that Pod. In this case, if the sum of
the local ephemeral storage usage from all containers and also the Pod's `emptyDir`
volumes exceeds the overall Pod storage limit, then the kubelet also marks the Pod
for eviction.

{{< caution >}}
If the kubelet is not measuring local ephemeral storage, then a Pod
that exceeds its local storage limit will not be evicted for breaching
local storage resource limits.

However, if the filesystem space for writeable container layers, node-level logs,
or `emptyDir` volumes falls low, the node
{{< glossary_tooltip text="taints" term_id="taint" >}} itself as short on local storage
and this taint triggers eviction for any Pods that don't specifically tolerate the taint.

See the supported [configurations](#configurations) for ephemeral local storage.
{{< /caution >}}

The kubelet supports different ways to measure Pod storage use:

{{< tabs name="resource-emphemeralstorage-measurement" >}}

{{% tab name="Periodic scanning" %}}

The kubelet performs regular, scheduled checks that scan each `emptyDir` volume,
container log directory, and writeable container layer.

The scan measures how much space is used.

{{< note >}}
In this mode, the kubelet does not track open file descriptors
for deleted files.

If you (or a container) create a file inside an `emptyDir` volume,
something then opens that file, and you delete the file while it is still open,
then the inode for the deleted file stays until you close that file,
but the kubelet does not categorize the space as in use.
{{< /note >}}
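
You can reproduce this effect outside Kubernetes. In the sketch below, `du` (a
directory scan, like the kubelet's) misses space that the filesystem still
considers allocated; the temporary directory stands in for an `emptyDir` volume:

```shell
dir=$(mktemp -d)                  # stand-in for an emptyDir mount
exec 3>"$dir/scratch.dat"         # keep a file descriptor open on the file
dd if=/dev/zero of="$dir/scratch.dat" bs=1M count=50 status=none
rm "$dir/scratch.dat"             # the directory entry is gone...
du -sh "$dir"                     # ...so a scan reports almost nothing,
                                  # yet ~50M stays allocated until fd 3 closes
exec 3>&-                         # close the descriptor; the space is freed
```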

{{% /tab %}}

{{% tab name="Filesystem project quota" %}}

{{< feature-state feature_gate_name="LocalStorageCapacityIsolationFSQuotaMonitoring" >}}

Project quotas are an operating-system level feature for managing
storage use on filesystems. With Kubernetes, you can enable project
quotas for monitoring storage use. Make sure that the filesystem
backing the `emptyDir` volumes, on the node, provides project quota support.
For example, XFS and ext4fs offer project quotas.

{{< note >}}
Project quotas let you monitor storage use; they do not enforce limits.
{{< /note >}}

Kubernetes uses project IDs starting from `1048576`. The IDs in use are
registered in `/etc/projects` and `/etc/projid`. If project IDs in
this range are used for other purposes on the system, those project
IDs must be registered in `/etc/projects` and `/etc/projid` so that
Kubernetes does not use them.
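
As a sketch of the registration file formats (the project ID, name, and path
here are illustrative): `/etc/projects` maps a numeric project ID to a
directory, and `/etc/projid` maps a project name to that ID:

```
# /etc/projects (format: projectID:pathname)
1048580:/srv/legacy-app

# /etc/projid (format: projectname:projectID)
legacy-app:1048580
```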

Quotas are faster and more accurate than directory scanning.
When a directory is assigned to a project, all files created under that directory
are created in that project, and the kernel merely has to keep track of
how many blocks are in use by files in that project.
If a file is created and deleted, but has an open file descriptor,
it continues to consume space. Quota tracking records that space accurately,
whereas directory scans overlook the storage used by deleted files.

To use quotas to track a pod's resource usage, the pod must be in
a user namespace. Within user namespaces, the kernel restricts changes
to projectIDs on the filesystem, ensuring the reliability of storage
metrics calculated by quotas.

If you want to use project quotas, you should:

* Enable the `LocalStorageCapacityIsolationFSQuotaMonitoring=true`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  using the `featureGates` field in the
  [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/).

* Ensure the `UserNamespacesSupport`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  is enabled, and that the kernel, CRI implementation and OCI runtime support user namespaces.

* Ensure that the root filesystem (or optional runtime filesystem)
  has project quotas enabled. All XFS filesystems support project quotas.
  For ext4 filesystems, you need to enable the project quota tracking feature
  while the filesystem is not mounted.

  ```bash
  # For ext4, with /dev/block-device not mounted
  sudo tune2fs -O project -Q prjquota /dev/block-device
  ```

* Ensure that the root filesystem (or optional runtime filesystem) is
  mounted with project quotas enabled. For both XFS and ext4fs, the
  mount option is named `prjquota`.
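
For example, mounting with project quotas enabled could be expressed in
`/etc/fstab` like this sketch (the device name and filesystem type are
illustrative):

```
# /etc/fstab entry: root filesystem mounted with project quotas
/dev/sda1  /  ext4  defaults,prjquota  0  1
```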

If you don't want to use project quotas, you should:

* Disable the `LocalStorageCapacityIsolationFSQuotaMonitoring`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  using the `featureGates` field in the
  [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/).
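
Putting the feature gates together, the relevant fragment of a kubelet
configuration file might look like this sketch (set the values to match your
choice above):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  LocalStorageCapacityIsolationFSQuotaMonitoring: true  # set false to opt out
  UserNamespacesSupport: true                           # required for the quota path
```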
{{% /tab %}}
{{< /tabs >}}

## {{% heading "whatsnext" %}}

* Read about [project quotas](https://www.linux.org/docs/man8/xfs_quota.html) in XFS