
Commit 3e59bf1

committed
Republish UDP packet drops article with updates
1 parent 50e2eb0 commit 3e59bf1


3 files changed: +142 -5 lines changed


articles/aks/.openpublishing.redirection.aks.json

Lines changed: 0 additions & 5 deletions
@@ -16,11 +16,6 @@
 "redirect_url":"/azure/aks/generation-2-vm",
 "redirect_document_id":false
 },
-{
-"source_path_from_root":"/articles/aks/troubleshoot-udp-packet-drops.md",
-"redirect_url":"/troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes",
-"redirect_document_id":false
-},
 {
 "source_path_from_root": "/articles/aks/operator-best-practices-multi-region.md",
 "redirect_url": "/azure/aks/ha-dr-overview",

articles/aks/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -581,6 +581,8 @@
 href: api-server-vnet-integration.md
 - name: Private Endpoint
 href: private-clusters.md#use-a-private-endpoint-connection
+- name: Diagnose and solve UDP packet drops
+href: troubleshoot-udp-packet-drops.md
 - name: Storage
 items:
 - name: CSI storage drivers
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
title: Diagnose and solve UDP packet drops in Azure Kubernetes Service (AKS)
description: Learn how to diagnose and solve UDP packet drops in Azure Kubernetes Service (AKS).
ms.topic: how-to
ms.date: 05/09/2024
author: schaffererin
ms.author: schaffererin
ms.service: azure-kubernetes-service
---

# Diagnose and solve UDP packet drops in Azure Kubernetes Service (AKS)

User Datagram Protocol (UDP) is a connectionless protocol used by applications in AKS clusters. Because UDP doesn't establish a connection before transferring data, it provides no guarantee of delivery, reliability, or order. As a result, UDP packets can be lost, duplicated, or arrive out of order at the destination for multiple reasons.

This article describes how to diagnose and solve UDP packet drops caused by a small receive buffer, which can overflow when your application receives high network traffic.

## Prerequisites

* An AKS cluster with at least one node pool and one pod running a UDP-based application.
* Azure CLI installed and configured. For more information, see [Install the Azure CLI](/cli/azure/install-azure-cli).
* kubectl installed and configured to connect to your AKS cluster. For more information, see [Install kubectl](/cli/azure/aks#az-aks-install-cli).
* A client machine that can send and receive UDP packets to and from your AKS cluster.

## Issue: UDP connections have a high packet drop rate

One possible cause of UDP packet loss is a UDP buffer that's too small to handle the incoming traffic. The UDP buffer size determines how much data the kernel can hold for a socket before the application processes it. If the buffer is too small, the kernel drops packets that exceed its capacity. This setting is managed at the virtual machine (VM) level for your nodes, and the default value is *212992 bytes* (about *0.2 MB*).

There are two variables at the VM level that apply to buffer sizes:

* `net.core.rmem_max = 212992 bytes`: The largest receive buffer size an application can explicitly set for a socket, for example with the `SO_RCVBUF` socket option.
* `net.core.rmem_default = 212992 bytes`: The default receive buffer size the kernel assigns to a socket when the application doesn't explicitly set one.

To allow the buffer to grow to absorb high bursts of traffic, you need to increase these values.

> [!NOTE]
> This article focuses on Ubuntu Linux kernel buffer sizes. If you want to see other configurations for Linux and Windows, see [Customize node configuration for AKS node pools](./custom-node-configuration.md).

## Diagnose

### Check current UDP buffer settings

1. Get a list of your nodes using the `kubectl get nodes` command and pick a node you want to check the buffer settings for.

    ```bash
    kubectl get nodes
    ```

2. Set up a debug pod on the node you selected using the `kubectl debug` command. Replace `<node-name>` with the name of the node you want to debug.

    ```bash
    kubectl debug -it node/<node-name> --image=ubuntu --share-processes -- bash
    ```

3. Get the values of the `net.core.rmem_max` and `net.core.rmem_default` variables using the following `sysctl` command:

    ```bash
    sysctl net.core.rmem_max net.core.rmem_default
    ```

### Measure incoming UDP traffic

To check whether your buffer is too small for your application and is dropping packets, start by simulating realistic network traffic on your pods and setting up a debug pod to monitor the incoming traffic. Then, use the following commands to measure the incoming UDP traffic.

1. While the test is running, check the `/proc/net/udp` file using the following `cat` command:

    ```bash
    cat /proc/net/udp
    ```

    This file lists the currently open UDP sockets. The `rx_queue` value (the second half of the `tx_queue:rx_queue` field) shows how much data, in bytes and as a hexadecimal number, is waiting in each socket's receive queue. It doesn't show historical data.

2. Check the `/proc/net/snmp` file and compare the `RcvbufErrors` value before and after the test using the following `cat` command:

    ```bash
    cat /proc/net/snmp
    ```

    This file shows cumulative UDP statistics, including the number of packets dropped because a socket's receive buffer was full, under the `RcvbufErrors` column.
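
The `local_address` and `tx_queue:rx_queue` fields in `/proc/net/udp` are hexadecimal, which makes them hard to read at a glance. The following is a minimal Bash sketch, not an official tool, that decodes them. It assumes the standard `/proc/net/udp` layout, where the fifth field is `tx_queue:rx_queue` and the last field on each row is the per-socket `drops` counter. The function name `udp_rx_queues` is hypothetical. You can pass `/proc/net/udp6` as the argument to inspect IPv6 sockets, which use the same layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode /proc/net/udp into readable port, rx_queue, and drops values.
# Assumes the standard layout: sl, local_address, rem_address, st, tx_queue:rx_queue, ..., drops.
udp_rx_queues() {
  local file="${1:-/proc/net/udp}"
  # Skip the header row; each remaining row describes one open UDP socket.
  tail -n +2 "$file" | while read -r _sl laddr _raddr _st queues rest; do
    port_hex="${laddr##*:}"   # local port, hexadecimal
    rx_hex="${queues##*:}"    # rx_queue, hexadecimal
    drops="${rest##* }"       # last field: per-socket drop counter
    printf 'port=%d rx_queue=%d drops=%s\n' "$((16#$port_hex))" "$((16#$rx_hex))" "$drops"
  done
}
```

For example, run `udp_rx_queues` in the debug pod while your load test is running, and look for sockets whose `rx_queue` value approaches the buffer size.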

If the `rx_queue` values approach your buffer size, or the `RcvbufErrors` value increases during the test, you need to increase your buffer size.
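
To compare `RcvbufErrors` before and after a test without reading the whole file by eye, you can extract the counter with a short script. This is a minimal Bash sketch that assumes `/proc/net/snmp` uses its usual layout: a `Udp:` line of field names followed by a `Udp:` line of values. The function names `udp_rcvbuf_errors` and `measure_rcvbuf_drops` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cumulative UDP RcvbufErrors counter.
# /proc/net/snmp lists UDP statistics as two "Udp:" lines: field names, then values.
udp_rcvbuf_errors() {
  awk '/^Udp:/ {
         if (!col) { for (i = 2; i <= NF; i++) if ($i == "RcvbufErrors") col = i }
         else { print $col; exit }
       }' "${1:-/proc/net/snmp}"
}

# Hypothetical helper: run a command (for example, "sleep 60" while you send test
# traffic) and report how much RcvbufErrors grew during it.
measure_rcvbuf_drops() {
  local before after
  before="$(udp_rcvbuf_errors)"
  "$@"
  after="$(udp_rcvbuf_errors)"
  echo "RcvbufErrors increased by $((after - before))"
}
```

For example, in the debug pod, run `measure_rcvbuf_drops sleep 60` while you send test traffic from your client machine. Any nonzero increase means packets were dropped because a receive buffer was full.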

> [!NOTE]
> Increasing the system buffer size might not be effective if your application can't keep up with the incoming packet rate. In that case, a larger buffer only delays packet drops, so you should examine and improve how your application processes UDP packets. A larger buffer is useful only when occasional traffic spikes fill the buffer, because it gives the system more time to handle the surge before packets are dropped.
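
A rough estimate shows why a larger buffer only delays drops when the application can't keep up: if packets arrive faster than the application reads them, the receive queue grows at the difference between the two rates. The following Bash sketch estimates how long a buffer lasts under that assumption. It's a simplification, because the kernel also charges per-packet memory overhead against the buffer, so real buffers fill sooner than this payload-only estimate suggests. The function name and the example rates are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical estimate: milliseconds until a UDP receive buffer fills during a burst.
# Arguments: buffer size (bytes), arrival rate (bytes/s), application read rate (bytes/s).
# Simplification: ignores per-packet kernel overhead, so real buffers fill sooner.
ms_until_buffer_full() {
  local buffer="$1" arrival="$2" drain="$3"
  if (( arrival <= drain )); then
    echo "never"   # the application keeps up, so the queue doesn't grow
  else
    echo $(( buffer * 1000 / (arrival - drain) ))
  fi
}

# Example: 1,000,000 B/s arriving, 800,000 B/s processed (200,000 B/s excess).
ms_until_buffer_full 212992 1000000 800000    # default buffer: 1064
ms_until_buffer_full 1048576 1000000 800000   # 1 MiB buffer: 5242
```

With these hypothetical rates, the default buffer absorbs about one second of excess traffic and a 1 MiB buffer about five seconds. If the burst lasts longer than that, packets are dropped regardless of the buffer size.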

## Mitigate

> [!NOTE]
> The kernel allocates read buffer memory for each socket dynamically as packets arrive, rather than in advance. The `rmem_default` and `rmem_max` settings determine how large each socket's buffer can grow before packet loss occurs.

You can set buffer size values at the node pool level only when you create a node pool; you can't add this setting to an existing node pool. The steps in this section show you how to configure the Linux OS settings and apply them to all nodes in a new node pool.

1. Create a `linuxosconfig.json` file on your local machine with the following contents. You can modify the values based on your application requirements and node SKU. The minimum value is *212992 bytes*, and the maximum is *134217728 bytes*.

    ```json
    {
      "sysctls": {
        "netCoreRmemMax": 1048576,
        "netCoreRmemDefault": 1048576
      }
    }
    ```

2. From the directory that contains the `linuxosconfig.json` file, create a new node pool with the buffer size configuration using the [`az aks nodepool add`][az-aks-nodepool-add] command.

    ```azurecli-interactive
    az aks nodepool add --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name $NODE_POOL_NAME --linux-os-config ./linuxosconfig.json
    ```

    This configuration sets both the maximum and the default receive buffer size to *1048576 bytes* (1 MiB) for each socket on the nodes in the new node pool. You can adjust these values in the `linuxosconfig.json` file based on your application requirements.
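
If you script node pool creation, you might want to generate `linuxosconfig.json` and check the value against the documented range before you run `az aks nodepool add`. The following is a minimal Bash sketch, not part of the Azure CLI. The function name `write_linuxosconfig` is hypothetical, and it assumes you want `netCoreRmemMax` and `netCoreRmemDefault` set to the same value, as in the example in step 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write linuxosconfig.json with one receive buffer size (in bytes).
# Rejects values outside the documented range of 212992 to 134217728 bytes.
write_linuxosconfig() {
  local size="$1" out="${2:-linuxosconfig.json}"
  local min=212992 max=134217728
  # Require a plain decimal number (no leading zeros, which Bash would treat as octal).
  if ! [[ "$size" =~ ^[1-9][0-9]*$ ]] || (( size < min || size > max )); then
    echo "error: buffer size must be between ${min} and ${max} bytes" >&2
    return 1
  fi
  # Use the same value for the maximum and the default receive buffer size.
  cat > "$out" <<EOF
{
  "sysctls": {
    "netCoreRmemMax": ${size},
    "netCoreRmemDefault": ${size}
  }
}
EOF
}
```

For example, run `write_linuxosconfig 1048576`, and then run the `az aks nodepool add` command from step 2 in the same directory.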

## Validate

After you apply the new values, you can connect to a node in the new node pool to verify that the new values are in effect.

1. Get a list of your nodes using the `kubectl get nodes` command and pick a node from the new node pool.

    ```bash
    kubectl get nodes
    ```

2. Set up a debug pod on the node you selected using the `kubectl debug` command. Replace `<node-name>` with the name of the node you want to debug.

    ```bash
    kubectl debug -it node/<node-name> --image=ubuntu --share-processes -- bash
    ```

3. Get the values of the `net.core.rmem_max` and `net.core.rmem_default` variables using the following `sysctl` command:

    ```bash
    sysctl net.core.rmem_max net.core.rmem_default
    ```

Both values should now match the values you set in `linuxosconfig.json`.
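
To check both values in one step instead of comparing them by eye, you can read them from `/proc/sys/net/core`, which is where `sysctl` reads `net.core.*` settings. This minimal Bash sketch assumes you set both values to the same size, as in the earlier example. The function name `verify_rmem` and its optional second argument (an alternative root directory, useful for testing) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare net.core.rmem_max and net.core.rmem_default to an expected value.
# Reads the same files that `sysctl net.core.rmem_max net.core.rmem_default` reads.
verify_rmem() {
  local expected="$1" root="${2:-/proc/sys}" status=0 key actual
  for key in rmem_max rmem_default; do
    actual="$(cat "${root}/net/core/${key}")"
    if [[ "$actual" == "$expected" ]]; then
      echo "OK   net.core.${key} = ${actual}"
    else
      echo "FAIL net.core.${key} = ${actual} (expected ${expected})"
      status=1
    fi
  done
  return "$status"
}
```

For example, paste the function into the debug pod's shell and run `verify_rmem 1048576`. A nonzero exit status means at least one value doesn't match.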

## Next steps

In this article, you learned how to diagnose and solve UDP packet drops in Azure Kubernetes Service (AKS). For more information on how to troubleshoot issues in AKS, see the [Azure Kubernetes Service troubleshooting documentation](/troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes).

<!-- LINKS -->
[az-aks-nodepool-add]: /cli/azure/aks/nodepool#az-aks-nodepool-
