Commit e980e6d

Merge branch 'MicrosoftDocs:main' into CI_5811
2 parents: 674df63 + 8d485ad

129 files changed, +2502 -981 lines changed

Some content is hidden

.openpublishing.redirection.json

Lines changed: 12 additions & 0 deletions
@@ -13745,6 +13745,18 @@
 {
 "source_path": "support/dynamics-365/commerce/ecommerce-storefront/pickup-store-link-missing.md",
 "redirect_url": "/previous-versions/troubleshoot/dynamics-365/commerce/ecommerce-storefront/pickup-store-link-missing"
+},
+{
+"source_path": "support/azure/azure-kubernetes/connectivity/basic-troubleshooting-dns-resolution-problems.md",
+"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/basic-troubleshooting-dns-resolution-problems"
+},
+{
+"source_path": "support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md",
+"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time"
+},
+{
+"source_path": "support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md",
+"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node"
 }
 ]
 }
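Each added entry maps the old repository path of a moved DNS article to the published URL of its new location under `connectivity/dns/`. The pattern can be captured in a small Python sketch. `redirect_entry` is a hypothetical helper, and the rule it encodes (take the new path below `support/`, drop `.md`, prepend `/troubleshoot`) is inferred from these entries only, not taken from the docs build tooling. The optional arguments are meant for the Azure-scoped redirection file, whose paths appear to be relative to `support/azure`.

```python
import json
from pathlib import PurePosixPath


def redirect_entry(old_path: str, new_path: str,
                   content_root: str = "support",
                   url_root: str = "/troubleshoot") -> dict:
    """Build a source_path/redirect_url pair for a Markdown file that moved.

    Assumption (inferred from the entries in this diff): the published URL is
    the new path below content_root, without ".md", prefixed with url_root.
    """
    relative = PurePosixPath(new_path).relative_to(content_root).with_suffix("")
    return {"source_path": old_path,
            "redirect_url": f"{url_root}/{relative.as_posix()}"}


if __name__ == "__main__":
    old = "support/azure/azure-kubernetes/connectivity/basic-troubleshooting-dns-resolution-problems.md"
    new = "support/azure/azure-kubernetes/connectivity/dns/basic-troubleshooting-dns-resolution-problems.md"
    print(json.dumps(redirect_entry(old, new), indent=2))
```

Generating entries from the old and new paths, rather than typing URLs by hand, keeps the `redirect_url` in step with where the file actually moved.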

support/azure/.openpublishing.redirection.azure.json

Lines changed: 34 additions & 0 deletions
@@ -6320,6 +6320,40 @@
 {
 "source_path": "azure-kubernetes/error-codes/zonalallocation-allocatonfailed-error.md",
 "redirect_url": "/troubleshoot/azure/azure-kubernetes/error-codes/zonalallocation-allocationfailed-error"
+},
+{
+
+"source_path": "kubernetes-fleet/troubleshoot-clusterresourceplacement-api-issues.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/troubleshoot-clusterresourceplacement-api-issues"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementscheduled-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementscheduled-false"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementrolloutstarted-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementrolloutstarted-false"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementoverridden-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementoverridden-false"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementworksynchronized-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementworksynchronized-false"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementapplied-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementapplied-false"
+},
+{
+"source_path": "kubernetes-fleet/crp-clusterresourceplacementavailable-false.md",
+"redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementavailable-false"
+},
+{
+"source_path": "hpc/batch/error-accountencryptionkeyunavailable.md",
+"redirect_url": "/troubleshoot/azure/hpc/batch/welcome-hpc-batch"
+
 }
 ]
 }
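Hand-edited redirection JSON like this is easy to get subtly wrong, for example by pasting the same `source_path` twice. A minimal Python sketch of a pre-commit check follows. Because the diff does not show the top-level key that holds the entry list, `find_redirect_problems` (a hypothetical helper) takes the list of entries itself. The checks (both keys present, a `.md` source, a site-relative URL, no duplicate sources) are assumptions drawn from the entries shown here, not a documented schema.

```python
from collections import Counter


def find_redirect_problems(entries: list[dict]) -> list[str]:
    """Return readable problems found in a list of redirection entries.

    The checks are assumptions drawn from the entries in this diff, not a
    documented schema for .openpublishing redirection files.
    """
    problems = []
    for index, entry in enumerate(entries):
        # Every entry in the diff has exactly these two keys.
        for key in ("source_path", "redirect_url"):
            if not entry.get(key):
                problems.append(f"entry {index}: missing {key}")
        source = entry.get("source_path", "")
        if source and not source.endswith(".md"):
            problems.append(f"entry {index}: source_path {source!r} is not a .md file")
        url = entry.get("redirect_url", "")
        if url and not url.startswith("/"):
            problems.append(f"entry {index}: redirect_url {url!r} is not site-relative")
    # A source path listed twice makes the redirect target ambiguous.
    counts = Counter(e["source_path"] for e in entries if e.get("source_path"))
    for source, count in counts.items():
        if count > 1:
            problems.append(f"duplicate source_path {source!r} ({count} entries)")
    return problems


if __name__ == "__main__":
    new_entries = [
        {"source_path": "kubernetes-fleet/crp-clusterresourceplacementapplied-false.md",
         "redirect_url": "/troubleshoot/azure/kubernetes-fleet/cluster-resource-placement/crp-clusterresourceplacementapplied-false"},
        {"source_path": "hpc/batch/error-accountencryptionkeyunavailable.md",
         "redirect_url": "/troubleshoot/azure/hpc/batch/welcome-hpc-batch"},
    ]
    print(find_redirect_problems(new_entries))  # prints []
```

Not every entry is a move: the `hpc/batch` entry sends a retired article to a landing page, so the sketch deliberately does not try to check that the URL mirrors the source path.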

support/azure/azure-kubernetes/availability-performance/identify-memory-saturation-aks.md

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ ms.date: 08/30/2024
 editor: v-jsitser
 ms.reviewer: chiragpa, aritraghosh, v-leedennis
 ms.service: azure-kubernetes-service
-#Customer intent: As an Azure Kubernetes user, I want to understand how to identify memory saturation in my Azure Kubernetes Service (AKS) clusters so that I don't experience service interruption or other memory saturation issues.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot memory saturation in AKS clusters

support/azure/azure-kubernetes/connectivity/basic-troubleshooting-outbound-connections.md

Lines changed: 2 additions & 2 deletions
@@ -116,7 +116,7 @@ For basic troubleshooting for egress traffic from an AKS cluster, follow these s

 1. [Check whether the cluster can reach any other external endpoint](./troubleshoot-connections-endpoints-outside-virtual-network.md).

-1. [Check whether a network policy is blocking the traffic](./troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
+1. [Check whether a network policy is blocking the traffic](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).

 1. [Check whether an NSG is blocking the traffic](./traffic-between-node-pools-is-blocked.md).

@@ -278,7 +278,7 @@ To verify that the endpoint is reachable from the node where the problematic pod
 IP4Address : 23.200.197.152
 ```

-In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
+In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](./dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).

 If the DNS resolution is successful, continue to the network tests. Otherwise, verify the DNS configuration for the cluster.
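The link and include fixes in this commit follow from moving the DNS articles one folder deeper, into `connectivity/dns/`. A minimal Python sketch shows how the new relative paths can be computed with the standard library's `posixpath.relpath`. `relative_link` is a hypothetical helper, and `support/includes/` as the includes folder is inferred from the old `../../../includes/` paths in the diff.

```python
import posixpath


def relative_link(from_file: str, to_file: str) -> str:
    """Relative Markdown link from one repo file to another (POSIX paths)."""
    link = posixpath.relpath(to_file, start=posixpath.dirname(from_file))
    # Follow the "./" style the outbound-connections article uses for
    # same-folder and child-folder links; upward links already start with "../".
    if not link.startswith("../"):
        link = "./" + link
    return link


if __name__ == "__main__":
    connectivity = "support/azure/azure-kubernetes/connectivity"
    # A file that stayed in connectivity/ linking to a file moved into dns/.
    print(relative_link(f"{connectivity}/basic-troubleshooting-outbound-connections.md",
                        f"{connectivity}/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md"))
    # ./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md

    # An include from a moved file needs one more "../" than before the move.
    print(relative_link(f"{connectivity}/dns/basic-troubleshooting-dns-resolution-problems.md",
                        "support/includes/azure-help-support.md"))
    # ../../../../includes/azure-help-support.md
```

Computing links from the two file locations, instead of adjusting `../` counts by hand, is what makes a folder move like this one safe to repeat.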

support/azure/azure-kubernetes/connectivity/basic-troubleshooting-dns-resolution-problems.md renamed to support/azure/azure-kubernetes/connectivity/dns/basic-troubleshooting-dns-resolution-problems.md

Lines changed: 104 additions & 19 deletions
@@ -3,15 +3,15 @@ title: Basic troubleshooting of DNS resolution problems in AKS
 description: Learn how to create a troubleshooting workflow to fix DNS resolution problems in Azure Kubernetes Service (AKS).
 author: sturrent
 ms.author: seturren
-ms.date: 08/09/2024
-ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu
+ms.date: 05/29/2025
+ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu, qasimsarfraz
 editor: v-jsitser
 ms.service: azure-kubernetes-service
 ms.custom: sap:Connectivity
 ms.topic: troubleshooting-general
 #Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix DNS resolution problems in Azure Kubernetes Service (AKS).
 ---
-# Basic troubleshooting of DNS resolution problems in AKS
+# Troubleshoot DNS resolution problems in AKS

 This article discusses how to create a troubleshooting workflow to fix Domain Name System (DNS) resolution problems in Microsoft Azure Kubernetes Service (AKS).

@@ -82,9 +82,9 @@ To start the process, run tests from a test pod against each layer.
 spec:
 containers:
 - name: aks-test
-image: contoso/debian-ssh
+image: debian:stable
 command: ["/bin/sh"]
-args: ["-c", "while true; do sleep 1000; done"]
+args: ["-c", "apt-get update && apt-get install -y dnsutils && while true; do sleep 1000; done"]
 EOF
 ```
@@ -94,7 +94,7 @@ To start the process, run tests from a test pod against each layer.
 kubectl get pod --namespace kube-system --selector k8s-app=kube-dns --output wide
 ```

-1. Connect to the test pod and test the DNS resolution against each CoreDNS pod IP address by running the following commands:
+1. Connect to the test pod using the `kubectl exec -it aks-test -- bash` command and test the DNS resolution against each CoreDNS pod IP address by running the following commands:

 ```bash
 # Placeholder values
@@ -109,6 +109,8 @@ To start the process, run tests from a test pod against each layer.
 done
 ```

+For more information about troubleshooting DNS resolution problems from the pod level, see [Troubleshoot DNS resolution failures from inside the pod](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
+
 ##### Test the DNS resolution at CoreDNS service level

 1. Retrieve the CoreDNS service IP address by running the following `kubectl get` command:
@@ -161,7 +163,52 @@ To start the process, run tests from a test pod against each layer.

 Examine the DNS server configuration of the virtual network, and determine whether the servers can resolve the record in question.

-#### Part 2: Review the health and performance of nodes
+#### Part 2: Review the health and performance of CoreDNS pods and nodes
+
+##### Review the health and performance of CoreDNS pods
+
+You can use `kubectl` commands to check the health and performance of CoreDNS pods. To do so, follow these steps:
+
+1. Verify that the CoreDNS pods are running:
+
+```bash
+kubectl get pods -l k8s-app=kube-dns -n kube-system
+```
+
+2. Check if the CoreDNS pods are overused:
+
+```bash
+kubectl top pods -n kube-system -l k8s-app=kube-dns
+```
+
+```output
+NAME CPU(cores) MEMORY(bytes)
+coredns-dc97c5f55-424f7 3m 23Mi
+coredns-dc97c5f55-wbh4q 3m 25Mi
+```
+
+3. Get the nodes that host the CoreDNS pods:
+
+```bash
+kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].spec.nodeName}'
+```
+
+4. Verify that the nodes aren't overused:
+
+```bash
+kubectl top nodes
+```
+
+5. Verify the logs for the CoreDNS pods:
+
+```bash
+kubectl logs -l k8s-app=kube-dns -n kube-system
+```
+
+> [!NOTE]
+> To get more debugging information, enable verbose logs in CoreDNS. To do so, see [Troubleshooting CoreDNS customization in AKS](/azure/aks/coredns-custom#troubleshooting).
+
+##### Review the health and performance of nodes

 You might first notice DNS resolution performance problems as intermittent errors, such as time-outs. The main causes of this problem include resource exhaustion and I/O throttling within nodes that host the CoreDNS pods or the client pod.
@@ -213,23 +260,61 @@ Allocated resources:

 To get a better picture of resource usage at the pod and node level, you can also use Container insights and other cloud-native tools in Azure. For more information, see [Monitor Kubernetes clusters using Azure services and cloud native tools](/azure/azure-monitor/containers/monitor-kubernetes).

-#### Part 3: Capture DNS traffic and review DNS resolution performance
+#### Part 3: Analyze DNS traffic and review DNS resolution performance
+
+Analyzing DNS traffic can help you understand how your AKS cluster handles the DNS queries. Ideally, you should reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
+
+There are two main ways to analyze DNS traffic:
+
+- Using real-time DNS analysis tools, such as [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget), to analyze the DNS traffic in real time.
+- Using traffic capture tools, such as [Retina Capture](https://retina.sh/docs/Troubleshooting/capture) and [Dumpy](https://github.com/larryTheSlap/dumpy), to collect the DNS traffic and analyze it with a network packet analyzer tool, such as Wireshark.

-A network traffic capture can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
+Both approaches aim to understand the health and performance of DNS responses using DNS response codes, response times, and other metrics. Choose the one that fits your needs best.

-Many traffic-capturing tools are available to assist this process, including the following tools:
+##### Real-time DNS traffic analysis

-- [Retina Capture](https://retina.sh/docs/Troubleshooting/capture)
+You can use [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget) to analyze the DNS traffic in real time. To install Inspektor Gadget to your cluster, see [How to install Inspektor Gadget in an AKS cluster](../../logs/capture-system-insights-from-aks.md#how-to-install-inspektor-gadget-in-an-aks-cluster).

-- [Dumpy](https://github.com/larryTheSlap/dumpy) - an open source traffic capture plug-in for Kubernetes
+To trace DNS traffic across all namespaces, use the following command:
+
+```bash
+# Get the version of Gadget
+GADGET_VERSION=$(kubectl gadget version | grep Server | awk '{print $3}')
+# Run the trace_dns gadget
+kubectl gadget run trace_dns:$GADGET_VERSION --all-namespaces --fields "src,dst,name,qr,qtype,id,rcode,latency_ns"
+```
+
+Where `--fields` is a comma-separated list of fields to be displayed. The following list describes the fields that are used in the command:
+
+- `src`: The source of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
+- `dst`: The destination of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
+- `name`: The name of the DNS request.
+- `qr`: The query/response flag.
+- `qtype`: The type of the DNS request.
+- `id`: The ID of the DNS request, which is used to match the request and response.
+- `rcode`: The response code of the DNS request.
+- `latency_ns`: The latency of the DNS request.
+
+The command output looks like the following:
+
+```output
+SRC DST NAME QR QTYPE ID RCODE LATENCY_NS
+p/default/aks-test:33141 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A c215 0ns
+p/kube-system/coredns-57d886c994-r2… 168.63.129.16:53 db.contoso.com. Q A 323c 0ns
+168.63.129.16:53 p/kube-system/coredns-57d886c994-r2… db.contoso.com. R A 323c NameErr… 13.64ms
+p/kube-system/coredns-57d886c994-r2… p/default/aks-test:33141 db.contoso.com. R A c215 NameErr… 0ns
+p/default/aks-test:56921 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A 6574 0ns
+p/kube-system/coredns-57d886c994-r2… p/default/aks-test:56921 db.contoso.com. R A 6574 NameErr… 0ns
+```

-- [Inspektor Gadget](https://go.microsoft.com/fwlink/?linkid=2260072) - allows checking DNS problems in real time. For more information, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
+You can use the `ID` field to identify whether a query has a response. The `RCODE` field shows you the response code of the DNS request. The `LATENCY_NS` field shows you the latency of the DNS request in nanoseconds. These fields can help you understand the health and performance of DNS responses.
+For more information about real-time DNS analysis, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).

-In this article, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
+##### Capture DNS traffic

-##### Network traffic capture commands
+This section demonstrates how to use Dumpy to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).

-To collect the captures from the test client pod, run the following Dumpy command:
+To collect the captures from the test client pod, run the following command:

 ```bash
 kubectl dumpy capture pod aks-test -f "-i any port 53" --name dns-cap1-aks-test
@@ -511,9 +596,9 @@ Observe the results of implementing your action plan. At this point, your action

 If these troubleshooting steps don't resolve the problem, repeat the troubleshooting steps as necessary.

-[!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
+[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-disclaimer.md)]

-[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
+[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)]

-[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
+[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]
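The new Part 3 text explains how to read `ID`, `RCODE`, and `LATENCY_NS` by eye. For a long trace, the same pairing can be scripted. Below is a minimal Python sketch that is not part of Inspektor Gadget. It assumes the output was saved as plain text with the column layout of the sample in the diff: whitespace-separated values, query rows with no `RCODE` value, and no spaces inside a value. It pairs each response with its query by ID, name, and swapped source and destination. `parse_trace_dns` and `pair_queries` are hypothetical helper names, and truncated values such as `NameErr…` are kept exactly as printed.

```python
from dataclasses import dataclass


@dataclass
class DnsEvent:
    src: str
    dst: str
    name: str
    qr: str       # "Q" for a query, "R" for a response
    qtype: str
    id: str
    rcode: str    # empty string for queries, which have no response code
    latency: str


def parse_trace_dns(text: str) -> list[DnsEvent]:
    """Parse trace_dns rows shaped like the sample output (header skipped)."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "SRC":
            continue  # blank line or column header
        if len(parts) == 7:
            parts.insert(6, "")  # query row: add an empty RCODE value
        if len(parts) != 8:
            continue  # a layout this sketch does not handle
        events.append(DnsEvent(*parts))
    return events


def pair_queries(events: list[DnsEvent]) -> list[tuple]:
    """Match each query to its response by ID, name, and swapped endpoints."""
    # A response travels back from the query's destination to its source.
    responses = {(e.id, e.name, e.dst, e.src): e for e in events if e.qr == "R"}
    rows = []
    for q in events:
        if q.qr != "Q":
            continue
        r = responses.get((q.id, q.name, q.src, q.dst))
        # None in the rcode/latency slots means no response was captured.
        rows.append((q.src, q.dst, q.name, q.id,
                     r.rcode if r else None, r.latency if r else None))
    return rows


SAMPLE = """\
SRC DST NAME QR QTYPE ID RCODE LATENCY_NS
p/default/aks-test:33141 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A c215 0ns
p/kube-system/coredns-57d886c994-r2… 168.63.129.16:53 db.contoso.com. Q A 323c 0ns
168.63.129.16:53 p/kube-system/coredns-57d886c994-r2… db.contoso.com. R A 323c NameErr… 13.64ms
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:33141 db.contoso.com. R A c215 NameErr… 0ns
p/default/aks-test:56921 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A 6574 0ns
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:56921 db.contoso.com. R A 6574 NameErr… 0ns
"""

if __name__ == "__main__":
    for src, dst, name, qid, rcode, latency in pair_queries(parse_trace_dns(SAMPLE)):
        print(f"{qid} {src} -> {dst} {name} rcode={rcode} latency={latency}")
```

In the sample, all three queries receive a response and every response carries a `NameErr…` code. The upstream hop from CoreDNS to `168.63.129.16:53` is the only one with a nonzero latency. Matching on the endpoints as well as the ID avoids pairing unrelated queries that happen to reuse an ID.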

support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md renamed to support/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md

Lines changed: 2 additions & 2 deletions
@@ -261,6 +261,6 @@ We recommend that you don't combine Azure DNS with custom DNS servers in the vir

 For more information, see [Name resolution that uses your own DNS server](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances#name-resolution-that-uses-your-own-dns-server).

-[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
+[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)]

-[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
+[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]
