
Commit 6f38ae1

AB#5420: Private version of PR#1852
1 parent b677d30 commit 6f38ae1

File tree

2 files changed: +175 -0 lines changed

support/azure/azure-kubernetes/availability-performance/high-memory-consumption-disk-intensive-applications.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
---
title: Troubleshoot High Memory Consumption in Disk-Intensive Applications
description: Helps identify and resolve excessive memory usage due to Linux kernel behaviors on Kubernetes pods.
ms.date: 04/16/2025
ms.reviewer: claudiogodoy
ms.service: azure-kubernetes-service
ms.custom: sap:Node/node pool availability and performance
---

# Troubleshoot high memory consumption in disk-intensive applications

Disk input and output operations are costly, and most operating systems implement caching strategies for reading and writing data to the filesystem. The [Linux kernel](https://www.kernel.org/doc) typically uses strategies such as the [page cache](https://www.kernel.org/doc/gorman/html/understand/understand013.html) to improve overall performance. The primary goal of the page cache is to keep data that's read from the filesystem in memory so that it's available for future read operations.

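You can observe the page cache on any Linux machine by using standard tools. The following sketch uses `free` and `/proc/meminfo` (the values are illustrative); the `buff/cache` column and the `Cached` field represent memory that the kernel uses for caching and can reclaim when applications need it:

```console
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       1.3Gi       0.5Gi        17Mi       6.0Gi       6.2Gi
$ grep -E '^(Buffers|Cached)' /proc/meminfo
Buffers:          204800 kB
Cached:          6103040 kB
```
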
When disk-intensive applications perform frequent filesystem operations, high memory consumption might occur. This article helps you identify and resolve excessive memory usage that's caused by Linux kernel behaviors on Kubernetes pods.

## Prerequisites

- A tool to connect to the Kubernetes cluster, such as the kubectl tool. To install kubectl using the [Azure CLI](/cli/azure/install-azure-cli), run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.

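If you haven't installed kubectl or connected to the cluster yet, the following sketch shows a typical sequence. The resource group and cluster names are placeholders:

```console
$ az aks install-cli
$ az aks get-credentials --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME>
$ kubectl get nodes
```
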
## Symptoms

The following table outlines the common symptoms of memory saturation:

| Symptom | Description |
| --- | --- |
| [Working set](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#memory) metric too high | This issue occurs when there's a significant difference between the working set metric that's reported by the [Kubernetes Metrics API](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server) and the amount of memory that the application actually consumes. |
| Out-of-memory (OOM) kill | This issue indicates that memory issues exist on your pod. |

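A quick way to check for the first symptom is to compare the working set that the Metrics API reports with the resident set size (RSS) of the application's main process. The following sketch assumes that the main process is PID 1 and that the container image includes a shell and `grep`; the output values are illustrative:

```console
$ kubectl top pod <POD_NAME>
NAME                            CPU(cores)   MEMORY(bytes)
my-deployment-fc94b7f98-m9z2l   1m           344Mi
$ kubectl exec <POD_NAME> -- grep VmRSS /proc/1/status
VmRSS:      5120 kB
```

If the reported working set is far larger than the RSS of your application processes, kernel-managed memory such as the page cache or slab is the likely cause.
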
## Troubleshooting checklist

### Step 1: Inspect pod working set

1. Identify which pod is consuming excessive memory by following the guide [Troubleshoot memory saturation in AKS clusters](identify-memory-saturation-aks.md).
2. Use the following [kubectl top pods](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_top/) command to show the actual [working set](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#memory) that's reported by the [Kubernetes Metrics API](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server):

    ```console
    $ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
    NAME                            CPU(cores)   MEMORY(bytes)
    my-deployment-fc94b7f98-m9z2l   1m           344Mi
    ```

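If you want to see the raw values behind `kubectl top`, you can also query the Metrics API directly. This is a sketch; the namespace, pod name, and output are illustrative, and the exact fields depend on your metrics-server version:

```console
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/pods/<POD_NAME>"
{"kind":"PodMetrics","apiVersion":"metrics.k8s.io/v1beta1", ... ,"containers":[{"name":"my-app","usage":{"cpu":"1m","memory":"352256Ki"}}]}
```
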
### Step 2: Inspect pod memory statistics

Inspect the memory statistics of the [cgroup](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html) of the pod by following these steps:

1. Connect to the pod:

    ```console
    $ kubectl exec <POD_NAME> -it -- bash
    ```

2. List the memory-related files in the cgroup statistics directory:

    ```console
    $ ls /sys/fs/cgroup | grep -e memory.stat -e memory.current
    memory.current memory.stat
    ```

    - `memory.current`: The total amount of memory currently used by the cgroup and its descendants.
    - `memory.stat`: A breakdown of the cgroup's memory footprint into different types of memory, type-specific details, and other information about the state and past events of the memory management system.

3. All the values listed in those files are in bytes. Get an overview of how memory consumption is distributed on the pod:

    ```console
    $ cat /sys/fs/cgroup/memory.current
    10645012480
    $ cat /sys/fs/cgroup/memory.stat
    anon 5197824
    inactive_anon 5152768
    active_anon 8192
    ...
    file 10256240640
    active_file 32768
    inactive_file 10256207872
    ...
    slab 354682456
    slab_reclaimable 354554400
    slab_unreclaimable 128056
    ...
    ```

`cAdvisor` uses `memory.current` and `inactive_file` to compute the working set metric. You can replicate the calculation using the following formula:

```sh
# working_set = (memory.current - inactive_file) / 1048576
#             = (10645012480 - 10256207872) / 1048576
#             = 370 MiB
echo $(( (10645012480 - 10256207872) / 1048576 ))
```

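You can also reproduce the same calculation directly against the live values in the pod. This is a minimal sketch that assumes the container image provides `sh`, `grep`, and `cut`:

```console
$ kubectl exec <POD_NAME> -- sh -c 'current=$(cat /sys/fs/cgroup/memory.current); inactive_file=$(grep -w inactive_file /sys/fs/cgroup/memory.stat | cut -d" " -f2); echo "$(( (current - inactive_file) / 1048576 )) MiB"'
370 MiB
```
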
### Step 3: Determine kernel vs. application memory consumption

The following table describes some memory segments:

| Segment | Description |
|---|---|
| anon | Amount of memory used in anonymous mappings. Most language runtimes use this segment to allocate memory. |
| file | Amount of memory used to cache filesystem data, including tmpfs and shared memory. |
| slab | Amount of memory used for storing in-kernel data structures. |

Most language runtimes allocate memory from the anon segment. In this example, `anon` accounts for only 5197824 bytes (roughly 5 MiB), which is nowhere near the total that the working set metric reports.

On the other hand, the `slab` segment, which the kernel uses for its own data structures, accounts for 354682456 bytes (roughly 338 MiB). That's almost all the memory that the working set metric reports for this pod.

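To quickly compare these segments on the pod, you can filter `memory.stat` for the top-level keys. This is a sketch that reuses the example values from Step 2:

```console
$ kubectl exec <POD_NAME> -- grep -w -e anon -e file -e slab /sys/fs/cgroup/memory.stat
anon 5197824
file 10256240640
slab 354682456
```
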
### Step 4: Drop the kernel cache on the node

> [!NOTE]
> This step might lead to availability and performance issues. Avoid running it in a production environment.

1. Get the node running the pod:

    ```console
    $ kubectl get pod -A -o wide | grep "<POD_NAME>"
    NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
    my-deployment-fc94b7f98-m9z2l   1/1     Running   0          37m   10.244.1.17   aks-agentpool-26052128-vmss000004   <none>           <none>
    ```

2. Create a debugger pod on the node by using the [kubectl debug](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_debug/) command, start an interactive session, and then change the root directory to the host filesystem:

    ```console
    $ kubectl debug node/<NODE_NAME> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
    $ chroot /host
    ```

3. Drop the kernel cache:

    ```console
    # Optionally, run `sync` first so that dirty pages are written back and can also be reclaimed.
    $ echo 1 > /proc/sys/vm/drop_caches
    ```

4. Verify whether the command in the previous step had an effect by repeating [Step 1](#step-1-inspect-pod-working-set) and [Step 2](#step-2-inspect-pod-memory-statistics):

    ```console
    $ kubectl top pods -A | grep -i "<DEPLOYMENT_NAME>"
    NAME                            CPU(cores)   MEMORY(bytes)
    my-deployment-fc94b7f98-m9z2l   1m           4Mi

    $ kubectl exec <POD_NAME> -it -- cat /sys/fs/cgroup/memory.stat
    anon 4632576
    file 1781760
    ...
    slab_reclaimable 219312
    slab_unreclaimable 173456
    slab 392768
    ```

If you observe a significant decrease in both the working set and the slab memory segment, you're experiencing the issue in which a large amount of the pod's memory is used by the kernel.

## Workaround: Set appropriate memory limits and requests

The only effective workaround for high memory consumption on Kubernetes pods is to set realistic resource [limits and requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits). For example:

```yaml
resources:
  requests:
    memory: 30Mi
  limits:
    memory: 60Mi
```

By configuring appropriate memory limits and requests in the Kubernetes pod specification, you can ensure that Kubernetes manages memory allocation more efficiently, mitigating the impact of excessive kernel-level caching on pod memory usage.

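If you prefer to apply the values from the command line instead of editing the manifest, `kubectl set resources` can patch an existing deployment. The following sketch uses placeholder deployment and container names with the limits from the preceding example:

```console
$ kubectl set resources deployment <DEPLOYMENT_NAME> -c <CONTAINER_NAME> --requests=memory=30Mi --limits=memory=60Mi
```

Because this command changes the pod template, it triggers a rolling update of the deployment's pods.
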
> [!NOTE]
> Misconfigured pod memory limits can lead to problems such as OOMKilled errors.

## References

- [Learn more about Azure Kubernetes Service (AKS) best practices](/azure/aks/best-practices)
- [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze)

[!INCLUDE [Third-party information disclaimer](../../../includes/third-party-disclaimer.md)]

[!INCLUDE [Third-party contact information disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -168,6 +168,8 @@
   href: availability-performance/identify-high-cpu-consuming-containers-aks.md
 - name: Identify memory saturation in AKS clusters
   href: availability-performance/identify-memory-saturation-aks.md
+- name: Troubleshoot high memory consumption in disk-intensive applications
+  href: availability-performance/high-memory-consumption-disk-intensive-applications.md
 - name: Troubleshoot cluster service health probe mode issues
   href: availability-performance/cluster-service-health-probe-mode-issues.md
 - name: Troubleshoot node not ready
