Commit 3cff9bd

Simonx Xu authored
Merge pull request #8737 from AmandaAZ/Branch-CI5420
AB#5420: Private version of PR#1852
2 parents 8f44580 + 03252c4 commit 3cff9bd

File tree

2 files changed: +179 -0 lines changed

support/azure/azure-kubernetes/availability-performance/high-memory-consumption-disk-intensive-applications.md

Lines changed: 177 additions & 0 deletions

@@ -0,0 +1,177 @@
---
title: Troubleshoot High Memory Consumption in Disk-Intensive Applications
description: Helps identify and resolve excessive memory usage due to Linux kernel behaviors on Kubernetes pods.
ms.date: 04/30/2025
ms.reviewer: claudiogodoy, v-weizhu
ms.service: azure-kubernetes-service
ms.custom: sap:Node/node pool availability and performance
---

# Troubleshoot high memory consumption in disk-intensive applications

Disk input and output operations are costly, and most operating systems implement caching strategies for reading and writing data to the filesystem. The [Linux kernel](https://www.kernel.org/doc) commonly uses strategies such as the [page cache](https://www.kernel.org/doc/gorman/html/understand/understand013.html) to improve overall performance. The primary goal of the page cache is to keep data read from the filesystem in memory so that it's available for future read operations.

This article helps you identify and avoid high memory consumption in disk-intensive applications due to Linux kernel behaviors on Kubernetes pods.

## Prerequisites

A tool to connect to the Kubernetes cluster, such as the `kubectl` tool. To install `kubectl` by using the [Azure CLI](/cli/azure/install-azure-cli), run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.

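For example, after you install the Azure CLI, run:

```console
$ az aks install-cli
```
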
## Symptoms

When a disk-intensive application running on a pod performs frequent filesystem operations, high memory consumption might occur.

The following table outlines common symptoms of high memory consumption:

| Symptom | Description |
| --- | --- |
| The [working set](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#memory) metric is too high. | There's a significant difference between the [working set](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#memory) metric reported by the [Kubernetes Metrics API](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server) and the actual memory that the application consumes. |
| Out-of-memory (OOM) kill. | A container in the pod is terminated because its memory usage exceeds the configured limit, which indicates memory pressure on the pod. |
| Increased memory usage after heavy disk activity. | After operations such as backups, large file reads/writes, or data imports, memory consumption rises. |
| Memory usage grows indefinitely. | The pod's memory consumption increases steadily over time without decreasing, resembling a memory leak, even if the application itself isn't leaking memory. |

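For the OOM kill symptom in the preceding table, a quick way to confirm that a container in the pod was OOM-killed is to inspect its last state. The following check is illustrative, and the output is similar to what `kubectl describe` typically reports:

```console
$ kubectl describe pod <POD_NAME> | grep -A 2 "Last State"
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
```

A `Reason` of `OOMKilled` confirms that the container exceeded its memory limit.
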
## Troubleshooting checklist

### Step 1: Inspect the pod working set

To inspect the working set of pods reported by the Kubernetes Metrics API, run the following [kubectl top pods](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_top/) command:

```console
$ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
NAME                            CPU(cores)   MEMORY(bytes)
my-deployment-fc94b7f98-m9z2l   1m           344Mi
```

For detailed steps about how to identify which pod is consuming excessive memory, see [Troubleshoot memory saturation in AKS clusters](identify-memory-saturation-aks.md#step-1-identify-nodes-that-have-memory-saturation).

### Step 2: Inspect pod memory statistics

To inspect the memory statistics of the [cgroups](https://kubernetes.io/docs/concepts/architecture/cgroups/) on the pod that's consuming excessive memory, follow these steps:

> [!NOTE]
> [Cgroups](https://kubernetes.io/docs/concepts/architecture/cgroups/) help enforce resource management for pods and containers, including CPU/memory requests and limits for containerized workloads.

1. Connect to the pod:

    ```console
    $ kubectl exec <POD_NAME> -it -- bash
    ```

2. Navigate to the `cgroup` statistics directory and list the memory-related files:

    ```console
    $ ls /sys/fs/cgroup | grep -e memory.stat -e memory.current
    memory.current  memory.stat
    ```

    - `memory.current`: The total amount of memory currently used by the `cgroup` and its descendants.
    - `memory.stat`: Breaks down the `cgroup`'s memory footprint into different memory types, type-specific details, and other information about the state and past events of the memory management system.

    All the values listed in these files are in bytes.

3. Get an overview of how memory consumption is distributed on the pod:

    ```console
    $ cat /sys/fs/cgroup/memory.current
    10645012480
    $ cat /sys/fs/cgroup/memory.stat
    anon 5197824
    inactive_anon 5152768
    active_anon 8192
    ...
    file 10256240640
    active_file 32768
    inactive_file 10256207872
    ...
    slab 354682456
    slab_reclaimable 354554400
    slab_unreclaimable 128056
    ...
    ```

`cAdvisor` uses `memory.current` and `inactive_file` to compute the working set metric. You can replicate the calculation by using the following formula:

`working_set = (memory.current - inactive_file) / 1048576 = (10645012480 - 10256207872) / 1048576 ≈ 370 MiB`

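As a quick cross-check, you can compute the same value from inside the pod with a one-line `awk` command. This is a minimal sketch that assumes the cgroup v2 files shown in [Step 2](#step-2-inspect-pod-memory-statistics):

```console
$ awk -v cur="$(cat /sys/fs/cgroup/memory.current)" '/^inactive_file /{printf "%d MiB\n", (cur-$2)/1048576}' /sys/fs/cgroup/memory.stat
370 MiB
```
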
### Step 3: Determine kernel and application memory consumption

The following table describes some memory segments:

| Segment | Description |
|---|---|
| `anon` | The amount of memory used in anonymous mappings. Most languages use this segment to allocate memory. |
| `file` | The amount of memory used to cache filesystem data, including tmpfs and shared memory. |
| `slab` | The amount of memory used to store data structures in the Linux kernel. |

In the output from [Step 2](#step-2-inspect-pod-memory-statistics), the `anon` segment accounts for 5,197,824 bytes, which is nowhere near the total reported by the working set metric. However, the `slab` segment used by the Linux kernel accounts for 354,682,456 bytes, which is almost all the memory that the working set metric reports for the pod.

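To pull only these segments out of `memory.stat` on the pod, you can filter the file directly. For example, using the same sample data as in [Step 2](#step-2-inspect-pod-memory-statistics):

```console
$ grep -E "^(anon|file|slab) " /sys/fs/cgroup/memory.stat
anon 5197824
file 10256240640
slab 354682456
```
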
### Step 4: Drop the kernel cache on a debugger pod

> [!NOTE]
> This step might lead to availability and performance issues. Avoid running it in a production environment.

1. Get the node that's running the pod:

    ```console
    $ kubectl get pod -A -o wide | grep "<POD_NAME>"
    NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
    my-deployment-fc94b7f98-m9z2l   1/1     Running   0          37m   10.244.1.17   aks-agentpool-26052128-vmss000004   <none>           <none>
    ```

2. Create a debugger pod on the node by using the [kubectl debug](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_debug/) command, and then switch to the host root in the interactive session:

    ```console
    $ kubectl debug node/<NODE_NAME> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
    $ chroot /host
    ```

3. Drop the kernel cache:

    ```console
    echo 1 > /proc/sys/vm/drop_caches
    ```

4. Verify whether the command in the previous step has any effect by repeating [Step 1](#step-1-inspect-the-pod-working-set) and [Step 2](#step-2-inspect-pod-memory-statistics):

    ```console
    $ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
    NAME                            CPU(cores)   MEMORY(bytes)
    my-deployment-fc94b7f98-m9z2l   1m           4Mi

    $ kubectl exec <POD_NAME> -it -- cat /sys/fs/cgroup/memory.stat
    anon 4632576
    file 1781760
    ...
    slab_reclaimable 219312
    slab_unreclaimable 173456
    slab 392768
    ```

If you observe a significant decrease in both the working set and the `slab` memory segment, you're experiencing an issue in which the Linux kernel uses a large amount of memory on the pod.

## Workaround: Configure appropriate memory limits and requests

The only effective workaround for high memory consumption on Kubernetes pods is to set realistic resource [limits and requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits). For example:

```yaml
resources:
  requests:
    memory: 30Mi
  limits:
    memory: 60Mi
```

By configuring appropriate memory limits and requests in the pod specification, you ensure that Kubernetes manages memory allocation more efficiently, which mitigates the impact of excessive kernel-level caching on pod memory usage.

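For context, the following sketch shows where the `resources` block sits in a Deployment manifest. The deployment name, labels, and image are placeholders, not values taken from this article:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: <YOUR_IMAGE>   # placeholder image
        resources:
          requests:
            memory: 30Mi
          limits:
            memory: 60Mi
```
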
> [!CAUTION]
> Misconfigured pod memory limits can lead to problems such as OOMKilled errors.

## References

- [Learn more about Azure Kubernetes Service (AKS) best practices](/azure/aks/best-practices)
- [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze)

[!INCLUDE [Third-party information disclaimer](../../../includes/third-party-disclaimer.md)]

[!INCLUDE [Third-party contact information disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 0 deletions

@@ -168,6 +168,8 @@
   href: availability-performance/identify-high-cpu-consuming-containers-aks.md
 - name: Identify memory saturation in AKS clusters
   href: availability-performance/identify-memory-saturation-aks.md
+- name: Troubleshoot high memory consumption due to Linux kernel behaviors
+  href: availability-performance/high-memory-consumption-disk-intensive-applications.md
 - name: Troubleshoot cluster service health probe mode issues
   href: availability-performance/cluster-service-health-probe-mode-issues.md
 - name: Troubleshoot node not ready
