support/azure/azure-kubernetes/availability-performance/identify-memory-saturation-aks.md
-1 Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ ms.date: 08/30/2024
5
5
editor: v-jsitser
6
6
ms.reviewer: chiragpa, aritraghosh, v-leedennis
7
7
ms.service: azure-kubernetes-service
8
-
#Customer intent: As an Azure Kubernetes user, I want to understand how to identify memory saturation in my Azure Kubernetes Service (AKS) clusters so that I don't experience service interruption or other memory saturation issues.
9
8
ms.custom: sap:Node/node pool availability and performance
support/azure/azure-kubernetes/connectivity/basic-troubleshooting-outbound-connections.md
+2 -2 Lines changed: 2 additions & 2 deletions
@@ -116,7 +116,7 @@ For basic troubleshooting for egress traffic from an AKS cluster, follow these s
116
116
117
117
1. [Check whether the cluster can reach any other external endpoint](./troubleshoot-connections-endpoints-outside-virtual-network.md).
118
118
119
-
1. [Check whether a network policy is blocking the traffic](./troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
119
+
1. [Check whether a network policy is blocking the traffic](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
120
120
121
121
1. [Check whether an NSG is blocking the traffic](./traffic-between-node-pools-is-blocked.md).
122
122
@@ -278,7 +278,7 @@ To verify that the endpoint is reachable from the node where the problematic pod
278
278
IP4Address : 23.200.197.152
279
279
```
280
280
281
-
In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
281
+
In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](./dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
282
282
283
283
If the DNS resolution is successful, continue to the network tests. Otherwise, verify the DNS configuration for the cluster.
#Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix DNS resolution problems in Azure Kubernetes Service (AKS).
13
13
---
14
-
# Basic troubleshooting of DNS resolution problems in AKS
14
+
# Troubleshoot DNS resolution problems in AKS
15
15
16
16
This article discusses how to create a troubleshooting workflow to fix Domain Name System (DNS) resolution problems in Microsoft Azure Kubernetes Service (AKS).
17
17
@@ -82,9 +82,9 @@ To start the process, run tests from a test pod against each layer.
82
82
spec:
83
83
containers:
84
84
- name: aks-test
85
-
image: contoso/debian-ssh
85
+
image: debian:stable
86
86
command: ["/bin/sh"]
87
-
args: ["-c", "while true; do sleep 1000; done"]
87
+
args: ["-c", "apt-get update && apt-get install -y dnsutils && while true; do sleep 1000; done"]
88
88
EOF
89
89
```
90
90
@@ -94,7 +94,7 @@ To start the process, run tests from a test pod against each layer.
94
94
kubectl get pod --namespace kube-system --selector k8s-app=kube-dns --output wide
95
95
```
96
96
97
-
1. Connect to the test pod and test the DNS resolution against each CoreDNS pod IP address by running the following commands:
97
+
1. Connect to the test pod by using the `kubectl exec -it aks-test -- bash` command, and then test the DNS resolution against each CoreDNS pod IP address by running the following commands:
98
98
99
99
```bash
100
100
# Placeholder values
@@ -109,6 +109,8 @@ To start the process, run tests from a test pod against each layer.
109
109
done
110
110
```
111
111
112
+
For more information about troubleshooting DNS resolution problems from the pod level, see [Troubleshoot DNS resolution failures from inside the pod](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
113
+
112
114
##### Test the DNS resolution at CoreDNS service level
113
115
114
116
1. Retrieve the CoreDNS service IP address by running the following `kubectl get` command:
@@ -161,7 +163,52 @@ To start the process, run tests from a test pod against each layer.
161
163
162
164
Examine the DNS server configuration of the virtual network, and determine whether the servers can resolve the record in question.
163
165
164
-
#### Part 2: Review the health and performance of nodes
166
+
#### Part 2: Review the health and performance of CoreDNS pods and nodes
167
+
168
+
##### Review the health and performance of CoreDNS pods
169
+
170
+
You can use `kubectl` commands to check the health and performance of CoreDNS pods. To do so, follow these steps:
171
+
172
+
1. Verify that the CoreDNS pods are running:
173
+
174
+
```bash
175
+
kubectl get pods -l k8s-app=kube-dns -n kube-system
176
+
```
177
+
178
+
2. Check if the CoreDNS pods are overused:
179
+
180
+
```bash
181
+
kubectl top pods -n kube-system -l k8s-app=kube-dns
182
+
```
183
+
184
+
```output
185
+
NAME CPU(cores) MEMORY(bytes)
186
+
coredns-dc97c5f55-424f7 3m 23Mi
187
+
coredns-dc97c5f55-wbh4q 3m 25Mi
188
+
```
189
+
190
+
3. Get the nodes that host the CoreDNS pods:
191
+
192
+
```bash
193
+
kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].spec.nodeName}'
194
+
```
195
+
196
+
4. Verify that the nodes aren't overused:
197
+
198
+
```bash
199
+
kubectl top nodes
200
+
```
201
+
202
+
5. Verify the logs for the CoreDNS pods:
203
+
204
+
```bash
205
+
kubectl logs -l k8s-app=kube-dns -n kube-system
206
+
```
207
+
208
+
> [!NOTE]
209
+
> To get more debugging information, enable verbose logs in CoreDNS. To do so, see [Troubleshooting CoreDNS customization in AKS](/azure/aks/coredns-custom#troubleshooting).
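
Steps 3 and 4 above can be combined so that only the nodes that host CoreDNS pods are checked. The following is a minimal sketch, not part of the original article: the function names `unique_nodes` and `coredns_node_usage` are hypothetical helpers, and the sketch assumes that `kubectl` is configured for the cluster and that the metrics API is available for `kubectl top`.

```bash
# Hypothetical helper: read whitespace-separated node names from stdin
# and print each distinct name on its own line.
unique_nodes() {
  awk '{ for (i = 1; i <= NF; i++) print $i }' | sort -u
}

# Hypothetical helper: show resource usage only for the nodes that
# currently host CoreDNS pods (label k8s-app=kube-dns in kube-system).
coredns_node_usage() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns \
    -o jsonpath='{.items[*].spec.nodeName}' |
    unique_nodes |
    while read -r node; do
      kubectl top node "$node"
    done
}
```

After the functions are defined in the current shell, running `coredns_node_usage` prints one `kubectl top node` report per distinct node. Deduplicating first avoids repeated reports when several CoreDNS pods share a node.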
210
+
211
+
##### Review the health and performance of nodes
165
212
166
213
You might first notice DNS resolution performance problems as intermittent errors, such as time-outs. The main causes of this problem include resource exhaustion and I/O throttling within nodes that host the CoreDNS pods or the client pod.
167
214
@@ -213,23 +260,61 @@ Allocated resources:
213
260
214
261
To get a better picture of resource usage at the pod and node level, you can also use Container insights and other cloud-native tools in Azure. For more information, see [Monitor Kubernetes clusters using Azure services and cloud native tools](/azure/azure-monitor/containers/monitor-kubernetes).
215
262
216
-
#### Part 3: Capture DNS traffic and review DNS resolution performance
263
+
#### Part 3: Analyze DNS traffic and review DNS resolution performance
264
+
265
+
Analyzing DNS traffic can help you understand how your AKS cluster handles the DNS queries. Ideally, you should reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
266
+
267
+
There are two main ways to analyze DNS traffic:
268
+
269
+
- Using real-time DNS analysis tools, such as [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget), to analyze the DNS traffic in real time.
270
+
- Using traffic capture tools, such as [Retina Capture](https://retina.sh/docs/Troubleshooting/capture) and [Dumpy](https://github.com/larryTheSlap/dumpy), to collect the DNS traffic and analyze it with a network packet analyzer tool, such as Wireshark.
217
271
218
-
A network traffic capture can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
272
+
Both approaches aim to understand the health and performance of DNS responses using DNS response codes, response times, and other metrics. Choose the one that fits your needs best.
219
273
220
-
Many traffic-capturing tools are available to assist this process, including the following tools:
You can use [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget) to analyze the DNS traffic in real time. To install Inspektor Gadget to your cluster, see [How to install Inspektor Gadget in an AKS cluster](../../logs/capture-system-insights-from-aks.md#how-to-install-inspektor-gadget-in-an-aks-cluster).
223
277
224
-
- [Dumpy](https://github.com/larryTheSlap/dumpy) - an open source traffic capture plug-in for Kubernetes
278
+
To trace DNS traffic across all namespaces, use the following command:
279
+
280
+
```bash
281
+
# Get the version of Gadget
282
+
GADGET_VERSION=$(kubectl gadget version | grep Server | awk '{print $3}')
283
+
# Run the trace_dns gadget
284
+
kubectl gadget run trace_dns:$GADGET_VERSION --all-namespaces --fields "src,dst,name,qr,qtype,id,rcode,latency_ns"
285
+
```
286
+
287
+
Where `--fields` is a comma-separated list of fields to be displayed. The following list describes the fields that are used in the command:
288
+
289
+
- `src`: The source of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
290
+
- `dst`: The destination of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
291
+
- `name`: The name of the DNS request.
292
+
- `qr`: The query/response flag.
293
+
- `qtype`: The type of the DNS request.
294
+
- `id`: The ID of the DNS request, which is used to match the request and response.
295
+
- `rcode`: The response code of the DNS request.
296
+
- `latency_ns`: The latency of the DNS request.
297
+
298
+
The command output looks like the following:
299
+
300
+
```output
301
+
SRC DST NAME QR QTYPE ID RCODE LATENCY_NS
302
+
p/default/aks-test:33141 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A c215 0ns
303
+
p/kube-system/coredns-57d886c994-r2… 168.63.129.16:53 db.contoso.com. Q A 323c 0ns
304
+
168.63.129.16:53 p/kube-system/coredns-57d886c994-r2… db.contoso.com. R A 323c NameErr… 13.64ms
305
+
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:33141 db.contoso.com. R A c215 NameErr… 0ns
306
+
p/default/aks-test:56921 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A 6574 0ns
307
+
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:56921 db.contoso.com. R A 6574 NameErr… 0ns
308
+
```
225
309
226
-
- [Inspektor Gadget](https://go.microsoft.com/fwlink/?linkid=2260072) - allows checking DNS problems in real time. For more information, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
310
+
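You can use the `ID` field to identify whether a query has a response: a query (`Q`) and its response (`R`) share the same ID. The `RCODE` field shows the response code of the DNS request. The `LATENCY_NS` field shows the latency of the DNS request; the field name refers to nanoseconds, although the sample output displays values with a unit suffix such as `ns` or `ms`. These fields can help you understand the health and performance of DNS responses.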
311
+
For more information about real-time DNS analysis, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
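
As a rough sketch of the ID matching described above, the following shell functions post-process trace output that was saved to a text file. They assume the whitespace-separated column layout shown in the sample output (`QR` in the fourth column, `ID` in the sixth, and `RCODE` in the seventh column of response lines). The function names `unanswered_dns_ids` and `dns_rcode_counts` are hypothetical helpers and aren't part of Inspektor Gadget. DNS IDs can repeat over a long capture, so treat the results as a hint rather than as proof.

```bash
# Hypothetical helper: print the IDs of queries (QR = Q) that have no
# matching response (QR = R) in the saved trace output.
# Reads the files given as arguments, or stdin if none are given.
unanswered_dns_ids() {
  awk '
    $4 == "Q" { pending[$6] = 1 }     # remember each query ID
    $4 == "R" { delete pending[$6] }  # a response clears that ID
    END { for (id in pending) print id }
  ' "$@" | sort
}

# Hypothetical helper: count responses per response code.
# Only lines that have all eight columns are counted.
dns_rcode_counts() {
  awk '
    $4 == "R" && NF >= 8 { count[$7]++ }
    END { for (rcode in count) print rcode, count[rcode] }
  ' "$@" | sort
}
```

For example, if the trace output is saved to a file such as `trace.txt` (an assumed name), `unanswered_dns_ids trace.txt` lists query IDs that never received a response, and `dns_rcode_counts trace.txt` shows how many responses returned each code. The header row is skipped automatically because its fourth column is `QR`, not `Q` or `R`.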
227
312
228
-
In this article, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
313
+
##### Capture DNS traffic
229
314
230
-
##### Network traffic capture commands
315
+
This section demonstrates how to use Dumpy to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
231
316
232
-
To collect the captures from the test client pod, run the following Dumpy command:
317
+
To collect the captures from the test client pod, run the following command:
233
318
234
319
```bash
235
320
kubectl dumpy capture pod aks-test -f "-i any port 53" --name dns-cap1-aks-test
@@ -511,9 +596,9 @@ Observe the results of implementing your action plan. At this point, your action
511
596
512
597
If these troubleshooting steps don't resolve the problem, repeat the troubleshooting steps as necessary.
support/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md
+2 -2 Lines changed: 2 additions & 2 deletions
@@ -261,6 +261,6 @@ We recommend that you don't combine Azure DNS with custom DNS servers in the vir
261
261
262
262
For more information, see [Name resolution that uses your own DNS server](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances#name-resolution-that-uses-your-own-dns-server).