Commit 7015ff0

Merge pull request #9768 from MicrosoftDocs/main
Auto Publish – main to live - 2025-09-23 22:00 UTC
2 parents 9cae5cf + 129c730 commit 7015ff0

2 files changed

+328
-1
lines changed

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
---
title: Troubleshooting LocalDNS on AKS
description: Learn how to create a troubleshooting workflow to fix issues seen with LocalDNS in Azure Kubernetes Service (AKS).
author: vaibhavarora
ms.author: vaibhavarora
ms.date: 09/17/2025
ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu, qasimsarfraz
editor: vaibhavarora
ms.service: azure-kubernetes-service
ms.custom: sap:Connectivity
ms.topic: troubleshooting-general
#Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix LocalDNS problems in Azure Kubernetes Service (AKS).
---
# Troubleshoot issues with LocalDNS on Azure Kubernetes Service (AKS)

This article discusses how to create a troubleshooting workflow to fix Domain Name System (DNS) resolution problems in Azure Kubernetes Service (AKS) when you use LocalDNS. To learn more about LocalDNS, see the overview in [DNS Resolution in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/dns-concepts#localdns-in-azure-kubernetes-service-preview).

## Prerequisites

- The Kubernetes [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) command-line tool.

**Note:** To install kubectl by using [Azure CLI](/cli/azure/install-azure-cli), run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.

- The [systemctl](https://man7.org/linux/man-pages/man1/systemctl.1.html) command-line tool.

- The [journalctl](https://www.man7.org/linux/man-pages/man1/journalctl.1.html) command-line tool.

## Identifying patterns in DNS failures

Before you begin diagnosing LocalDNS issues, identify potential patterns in your DNS failures. Consider the following questions:

1. Do DNS resolution failures happen all the time or only intermittently?
2. Do the DNS issues occur on all nodes, a specific node pool, a subset of nodes, or a single node?
3. Do the DNS issues occur on nodes in a specific Azure zone, or in all zones?
4. Which protocols are failing? Both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), or just one of them?
5. Which DNS zones are failing? All zones, or only traffic for a specific zone?

**Note:** In question 5, "zone" refers to DNS zones like *cluster.local* and root (.) and not to physical zones in Azure.
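
To help answer questions 2 and 3, it can be useful to see which node pool and Azure zone each node belongs to. The following is a minimal sketch, not an official procedure. The function name is illustrative, and it assumes that your nodes carry the `kubernetes.azure.com/agentpool` and `topology.kubernetes.io/zone` labels.

```bash
# List nodes with their node pool and Azure zone shown as extra columns.
# Assumption: both labels are present on the nodes of your cluster.
list_nodes_by_pool_and_zone() {
  kubectl get nodes \
    -L kubernetes.azure.com/agentpool \
    -L topology.kubernetes.io/zone
}
```

Run `list_nodes_by_pool_and_zone` and compare the output with the nodes where you observe failures to see whether they share a node pool or a zone.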

## Diagnose LocalDNS with a test dnsutils pod

### Deploy a test dnsutils pod

Option 1 - Deploy a test pod to your cluster by using the following command:

```bash
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
```

Option 2 - If you're seeing DNS issues on specific nodes, you can control where the test pod is deployed by using nodeSelector. Replace `<NODE>` with the name of the affected node. This pod is named `dnsutils2`, so use that name in later `kubectl exec` commands:

```bash
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils2
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <NODE>
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
    command:
      - sleep
      - "infinity"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
```
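
Before you run any queries, you may want to confirm that the test pod is ready and scheduled on the intended node. The following sketch wraps two standard kubectl commands. The function name and the 120-second timeout are illustrative choices, not values from this article.

```bash
# Wait until the test pod is Ready, then show which node it runs on.
# The pod name defaults to "dnsutils"; pass "dnsutils2" for the Option 2 pod.
wait_for_test_pod() {
  pod="${1:-dnsutils}"
  kubectl wait --for=condition=Ready "pod/${pod}" --timeout=120s &&
    kubectl get pod "${pod}" -o wide
}
```

For example, `wait_for_test_pod dnsutils2` returns a nonzero status if the pod doesn't become ready within the timeout.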

Option 3 - If you run both Linux and Windows nodes in your cluster, you can deploy the test pod to all Linux nodes by using a DaemonSet:

```bash
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dnsutils
  namespace: default
spec:
  selector:
    matchLabels:
      app: dnsutils
  template:
    metadata:
      labels:
        app: dnsutils
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: dnsutils
        image: registry.k8s.io/e2e-test-images/agnhost:2.39
        command:
          - sleep
          - "infinity"
        imagePullPolicy: IfNotPresent
EOF
```
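
With the DaemonSet from Option 3, each pod gets a generated name, so a per-node check has to look the pods up first. The following sketch shows one possible way to do that. The function names are hypothetical, and treating "exit status 0 plus a non-empty `dig +short` answer" as success is an assumption you might want to refine for your own failure pattern.

```bash
# Print "<pod> <node>" for every pod created by the dnsutils DaemonSet.
list_dnsutils_pods() {
  kubectl get pods -n default -l app=dnsutils \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}'
}

# Resolve one name from each pod and report the result per node.
check_all_nodes() {
  name="${1:-bing.com}"
  list_dnsutils_pods | while read -r pod node; do
    # </dev/null keeps kubectl exec from consuming the pod list on stdin.
    if answer="$(kubectl exec -n default "$pod" -- dig +short "$name" </dev/null 2>/dev/null)" \
        && [ -n "$answer" ]; then
      echo "$node OK"
    else
      echo "$node FAIL"
    fi
  done
}
```

Running `check_all_nodes bing.com` prints one line per node, which makes it easier to see whether failures are limited to a subset of nodes.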

### Enable query logging for LocalDNS

Most use cases require query logging to be turned off in production because of its high memory usage and performance implications. However, for troubleshooting purposes, you should enable query logging in your LocalDNS configuration to identify the root cause of your errors. After the analysis is complete, you can turn it off again.

Option 1 - Enable query logging on all nodes

You can modify your LocalDNS configuration to set *queryLogging: Log* for one or more DNS zones.

```json
{
  "mode": "Required",
  "vnetDNSOverrides": {
    ".": {
      "queryLogging": "Log",
      "protocol": "PreferUDP",
      "forwardDestination": "VnetDNS",
      "forwardPolicy": "Sequential",
      "maxConcurrent": 1000,
      "cacheDurationInSeconds": 3600,
      "serveStaleDurationInSeconds": 3600,
      "serveStale": "Immediate"
    },
    "cluster.local": {
      "queryLogging": "Log",
      "protocol": "ForceTCP",
      "forwardDestination": "ClusterCoreDNS",
      "forwardPolicy": "Sequential",
      "maxConcurrent": 1000,
      "cacheDurationInSeconds": 3600,
      "serveStaleDurationInSeconds": 3600,
      "serveStale": "Immediate"
    }
  },
  "kubeDNSOverrides": {
    ".": {
      "queryLogging": "Log",
      "protocol": "PreferUDP",
      "forwardDestination": "ClusterCoreDNS",
      "forwardPolicy": "Sequential",
      "maxConcurrent": 1000,
      "cacheDurationInSeconds": 3600,
      "serveStaleDurationInSeconds": 3600,
      "serveStale": "Immediate"
    },
    "cluster.local": {
      "queryLogging": "Log",
      "protocol": "ForceTCP",
      "forwardDestination": "ClusterCoreDNS",
      "forwardPolicy": "Sequential",
      "maxConcurrent": 1000,
      "cacheDurationInSeconds": 3600,
      "serveStaleDurationInSeconds": 3600,
      "serveStale": "Immediate"
    }
  }
}
```

You can apply this change to the node pool by using the Azure CLI:

```bash
az aks nodepool update --name mynodepool1 --cluster-name myAKSCluster --resource-group myResourceGroup --localdns-config ./localdnsconfig.json
```

**Note:** Making changes to the LocalDNS configuration triggers a reimage operation in the chosen node pool.
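
Because applying the configuration triggers a reimage, it may be worth checking which zones have query logging enabled in the file before you run the update. The following is a rough sketch that uses only `awk`. The function name is hypothetical, and it assumes the file is formatted with one key per line, as in the example above. A JSON-aware tool would be more robust.

```bash
# Print "<section> <zone> <queryLogging value>" for each zone in the file.
# Assumption: one key per line, and section names end in "Overrides".
list_query_logging() {
  awk -F'"' '
    /[{][ \t]*$/ && NF >= 3 {
      if ($2 ~ /Overrides$/) section = $2; else zone = $2
    }
    $2 == "queryLogging" { print section, zone, $4 }
  ' "$1"
}
```

For example, `list_query_logging ./localdnsconfig.json` prints one line per zone, so a zone that was missed stands out before the node pool is updated.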

Option 2 - Enable query logging on a specific node

You can diagnose LocalDNS issues on a specific node by temporarily editing the LocalDNS configuration on that node. [Connect to the node](https://learn.microsoft.com/azure/aks/node-access#connect-using-kubectl-debug) manually, update the Corefile that LocalDNS uses, and restart only the LocalDNS service on that node.

**Note:** Changes made this way are ephemeral and don't persist after the troubleshooting is complete.

```bash
# You need to connect to the node before running the following commands.

# Open the configuration file for LocalDNS.
vi /opt/azure/containers/localdns/localdns.corefile

# In the file, manually change `errors` to `log` for one zone or for all zones.
# The file contents look similar to the following:

# ***********************************************************************************
# WARNING: Changes to this file will be overwritten and not persisted.
# ***********************************************************************************
# whoami (used for health check of DNS)
health-check.localdns.local:53 {
    bind 169.254.10.10 169.254.10.11
    whoami
}
# VnetDNS overrides apply to DNS traffic from pods with dnsPolicy:default or kubelet (referred to as VnetDNS traffic).
.:53 {
    errors
    bind 169.254.10.10
    forward . 168.63.129.16 {
        policy sequential
        max_concurrent 1000
    }
    ready 169.254.10.10:8181
    cache 3600s {
        success 9984
        denial 9984
        serve_stale 3600s verify
        servfail 0
    }
    loop
    nsid localdns
    prometheus :9253
    template ANY ANY internal.cloudapp.net {
        match "^(?:[^.]+\.){4,}internal\.cloudapp\.net\.$"
        rcode NXDOMAIN
        fallthrough
    }
    template ANY ANY reddog.microsoft.com {
        rcode NXDOMAIN
    }
}
cluster.local:53 {
    errors
    bind 169.254.10.10
    forward . 10.0.0.10 {
        force_tcp
        policy sequential
        max_concurrent 1000
    }
    ready 169.254.10.10:8181
    cache 3600s {
        success 9984
        denial 9984
        serve_stale 3600s verify
        servfail 0
    }
    loop
    nsid localdns
    prometheus :9253
}
...

...
# Save the changes, and then restart the LocalDNS service.
systemctl restart localdns
```

After the restart, LocalDNS should begin logging all queries for the chosen zones.
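
If you prefer not to edit the file interactively, the same change can be sketched with `sed`. This is an illustration under stated assumptions: the function name is hypothetical, and it replaces every standalone `errors` directive, so it enables logging for all zones and not for a single zone.

```bash
# Swap each standalone `errors` directive for `log`, keeping a .bak copy.
# The default path is the LocalDNS Corefile location used on the node.
enable_query_log() {
  corefile="${1:-/opt/azure/containers/localdns/localdns.corefile}"
  sed -i.bak 's/^\([[:space:]]*\)errors[[:space:]]*$/\1log/' "$corefile"
}
```

On the node, you might run `enable_query_log` followed by `systemctl restart localdns`. The unmodified file stays next to the Corefile with a `.bak` suffix, so you can compare or restore it.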

### Generate traffic from the dnsutils pod

Next, generate some DNS traffic for LocalDNS to log. LocalDNS listens on two IP addresses: KubeDNS traffic goes to the ClusterListenerIP (169.254.10.11), and VnetDNS traffic goes to the NodeListenerIP (169.254.10.10).

#### Test KubeDNS zone traffic

```bash
kubectl exec dnsutils -- dig bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6

; <<>> DiG 9.16.27 <<>> bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7452
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;bing.com. IN A

;; ANSWER SECTION:
bing.com. 30 IN A 150.171.27.10
bing.com. 30 IN A 150.171.28.10

;; Query time: 3 msec
;; SERVER: 169.254.10.11#53(169.254.10.11)
;; WHEN: Thu Jul 03 16:57:42 UTC 2025
;; MSG SIZE rcvd: 74
```

#### Test VnetDNS zone traffic

```bash
kubectl exec dnsutils -- dig bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 @169.254.10.10

; <<>> DiG 9.16.27 <<>> bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 @169.254.10.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3580
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;bing.com. IN A

;; ANSWER SECTION:
bing.com. 1315 IN A 150.171.28.10
bing.com. 1315 IN A 150.171.27.10

;; Query time: 7 msec
;; SERVER: 169.254.10.10#53(169.254.10.10)
;; WHEN: Thu Jul 03 16:59:07 UTC 2025
;; MSG SIZE rcvd: 74
```
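
To check whether a failure depends on the listener or on the protocol, the two tests above can be combined into one loop. The following is a minimal sketch. The function names and the `DIG_CMD` override are hypothetical helpers introduced here, and the `+time=5 +tries=2` values are illustrative.

```bash
# Extract the response code (NOERROR, SERVFAIL, ...) from dig output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query one name against both LocalDNS listeners over UDP and TCP.
# DIG_CMD is a hypothetical hook; by default dig runs inside the dnsutils pod.
probe_localdns() {
  name="$1"
  for server in 169.254.10.11 169.254.10.10; do
    for proto in notcp tcp; do
      status="$(${DIG_CMD:-kubectl exec dnsutils -- dig} "$name" "@$server" \
        "+$proto" +time=5 +tries=2 </dev/null | dig_status)"
      # No status line means dig got no answer at all (for example, a timeout).
      echo "$server $proto ${status:-NO_RESPONSE}"
    done
  done
}
```

For example, `probe_localdns bing.com` prints four lines, one per listener and protocol combination, where `notcp` stands for UDP.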

### View the collected LocalDNS logs

To view the logs from your LocalDNS instances, connect to the node and run the following commands:

```bash
# View the logs for the localdns service
journalctl -u localdns

# View logs in reverse chronological order (latest logs first)
journalctl -u localdns --reverse

# Continuously follow the logs
journalctl -u localdns -f

# Sample output using journalctl for the bing.com responses
journalctl -u localdns | grep bing.com
Jul 03 16:57:42 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:41796 - 7452 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.004490668s
Jul 03 16:59:07 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:58454 - 3580 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.001570158s
```

If you see log entries for your traffic, the pod can reach the LocalDNS service.
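
When the log is long, a per-name summary of response codes can make patterns easier to spot. The following sketch assumes the query log line format shown in the sample output above. The function name is hypothetical.

```bash
# Count query log lines by name, protocol, and response code.
# Reads journalctl output on stdin and prints "<count> <name> <proto> <rcode>".
summarize_localdns_log() {
  awk -F'"' '/\[INFO\]/ && NF >= 3 {
    split($2, q, " ")   # q[3] is the query name, q[4] is the protocol
    split($3, r, " ")   # r[1] is the response code
    count[q[3] " " q[4] " " r[1]]++
  }
  END { for (k in count) print count[k], k }' | sort -rn
}
```

On the node, you might run `journalctl -u localdns --no-pager | summarize_localdns_log`. A large count of a response code other than NOERROR for one name or protocol points to where to look next.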

## Next steps

If these logs don't help you identify the root cause of the issue, you can enable [query logging for CoreDNS](https://learn.microsoft.com/azure/aks/coredns-custom#enable-dns-query-logging) to validate whether CoreDNS is working as intended.

[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]

[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-disclaimer.md)]

[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)]

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 1 deletion
@@ -261,7 +261,8 @@ items:
   href: connectivity/dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md
 - name: Troubleshoot DNS from inside a pod
   href: connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md
-
+- name: Troubleshoot LocalDNS
+  href: connectivity/dns/troubleshoot-localdns.md
 - name: Data collection guide
   items:
   - name: Capture real-time system insights from cluster
