|
| 1 | +--- |
| 2 | +title: Troubleshooting LocalDNS on AKS |
| 3 | +description: Learn how to create a troubleshooting workflow to fix issues seen with LocalDNS in Azure Kubernetes Service (AKS). |
| 4 | +author: vaibhavarora |
| 5 | +ms.author: vaibhavarora |
| 6 | +ms.date: 09/17/2025 |
| 7 | +ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu, qasimsarfraz |
| 8 | +editor: vaibhavarora |
| 9 | +ms.service: azure-kubernetes-service |
| 10 | +ms.custom: sap:Connectivity |
| 11 | +ms.topic: troubleshooting-general |
| 12 | +#Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix LocalDNS problems in Azure Kubernetes Service (AKS). |
| 13 | +--- |
| 14 | +# Troubleshoot issues with LocalDNS on Azure Kubernetes Service (AKS) |
| 15 | + |
| 16 | +This article describes a troubleshooting workflow for Domain Name System (DNS) resolution problems in Azure Kubernetes Service (AKS) clusters that use LocalDNS. To learn more about LocalDNS, see the overview in [DNS Resolution in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/dns-concepts#localdns-in-azure-kubernetes-service-preview). |
| 17 | + |
| 18 | +## Prerequisites |
| 19 | + |
| 20 | +- The Kubernetes [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) command-line tool |
| 21 | + |
| 22 | + **Note:** To install kubectl by using [Azure CLI](/cli/azure/install-azure-cli), run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command. |
| 23 | + |
| 24 | +- The [systemctl](https://man7.org/linux/man-pages/man1/systemctl.1.html) command-line tool. |
| 25 | + |
| 26 | +- The [journalctl](https://www.man7.org/linux/man-pages/man1/journalctl.1.html) command-line tool. |
| 27 | + |
| 28 | +## Identifying patterns in DNS failures |
| 29 | +Before you diagnose LocalDNS issues, look for patterns in your DNS failures. Questions to ask include: |
| 30 | +1. Do DNS resolution failures happen all the time or intermittently? |
| 31 | +2. Do the DNS issues occur on all nodes, a specific node pool, a subset of nodes, or a single node? |
| 32 | +3. Do the DNS issues occur on nodes in a specific Azure zone, or in all zones? |
| 33 | +4. Which protocols are failing: both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), or just one of them? |
| 34 | +5. Which zones are failing: all zones, or traffic for a specific zone? |
| 35 | + |
| 36 | + **Note:** In this case, "zone" refers to DNS zones like *cluster.local* and root (.) and not to physical zones in Azure. |
| 37 | + |
| 38 | +## Diagnose LocalDNS with a test dnsutils pod |
| 39 | + |
| 40 | +### Step 1: Deploy a test dnsutils pod |
| 41 | +Option 1 - Deploy a test pod to your cluster using the following command: |
| 42 | + ```bash |
| 43 | + kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml |
| 44 | + ``` |
| 45 | + |
| 46 | +Option 2 - If you're seeing DNS issues on specific nodes, you can schedule the test pod on one of them by using `nodeSelector`. Replace `<NODE>` with the node name: |
| 47 | + |
| 48 | + ```bash |
| 49 | + cat <<EOF | kubectl create -f - |
| 50 | + apiVersion: v1 |
| 51 | + kind: Pod |
| 52 | + metadata: |
| 53 | + name: dnsutils2 |
| 54 | + namespace: default |
| 55 | + spec: |
| 56 | + nodeSelector: |
| 57 | + kubernetes.io/hostname: <NODE> |
| 58 | + containers: |
| 59 | + - name: dnsutils |
| 60 | + image: registry.k8s.io/e2e-test-images/agnhost:2.39 |
| 61 | + command: |
| 62 | + - sleep |
| 63 | + - "infinity" |
| 64 | + imagePullPolicy: IfNotPresent |
| 65 | + restartPolicy: Always |
| 66 | + EOF |
| 67 | + ``` |
| 68 | +
|
| 69 | +Option 3 - If you run both Linux and Windows nodes in your cluster, you can deploy the test pod to all Linux nodes by using a DaemonSet: |
| 70 | +
|
| 71 | + ```bash |
| 72 | + cat <<EOF | kubectl create -f - |
| 73 | + apiVersion: apps/v1 |
| 74 | + kind: DaemonSet |
| 75 | + metadata: |
| 76 | + name: dnsutils |
| 77 | + namespace: default |
| 78 | + spec: |
| 79 | + selector: |
| 80 | + matchLabels: |
| 81 | + app: dnsutils |
| 82 | + template: |
| 83 | + metadata: |
| 84 | + labels: |
| 85 | + app: dnsutils |
| 86 | + spec: |
| 87 | + nodeSelector: |
| 88 | + kubernetes.io/os: linux |
| 89 | + containers: |
| 90 | + - name: dnsutils |
| 91 | + image: registry.k8s.io/e2e-test-images/agnhost:2.39 |
| 92 | + command: |
| 93 | + - sleep |
| 94 | + - "infinity" |
| 95 | + imagePullPolicy: IfNotPresent |
| 96 | + EOF |
| 97 | + ``` |
| 98 | +
|
| 99 | +### Enable Query logging for LocalDNS |
| 100 | +
|
| 101 | +In most production environments, query logging should stay turned off because of its high memory usage and performance impact. For troubleshooting, however, enable query logging in your LocalDNS configuration so that you can identify the root cause of your errors. After the analysis is complete, turn it off again. |
| 102 | +
|
| 103 | +Option 1 - Enable Query logging on all nodes |
| 104 | +
|
| 105 | +Modify your LocalDNS configuration to set *queryLogging: Log* for one or more DNS zones. |
| 106 | +
|
| 107 | +```json |
| 108 | +{ |
| 109 | + "mode": "Required", |
| 110 | + "vnetDNSOverrides": { |
| 111 | + ".": { |
| 112 | + "queryLogging": "Log", |
| 113 | + "protocol": "PreferUDP", |
| 114 | + "forwardDestination": "VnetDNS", |
| 115 | + "forwardPolicy": "Sequential", |
| 116 | + "maxConcurrent": 1000, |
| 117 | + "cacheDurationInSeconds": 3600, |
| 118 | + "serveStaleDurationInSeconds": 3600, |
| 119 | + "serveStale": "Immediate" |
| 120 | + }, |
| 121 | + "cluster.local": { |
| 122 | + "queryLogging": "Log", |
| 123 | + "protocol": "ForceTCP", |
| 124 | + "forwardDestination": "ClusterCoreDNS", |
| 125 | + "forwardPolicy": "Sequential", |
| 126 | + "maxConcurrent": 1000, |
| 127 | + "cacheDurationInSeconds": 3600, |
| 128 | + "serveStaleDurationInSeconds": 3600, |
| 129 | + "serveStale": "Immediate" |
| 130 | + } |
| 131 | + }, |
| 132 | + "kubeDNSOverrides": { |
| 133 | + ".": { |
| 134 | + "queryLogging": "Log", |
| 135 | + "protocol": "PreferUDP", |
| 136 | + "forwardDestination": "ClusterCoreDNS", |
| 137 | + "forwardPolicy": "Sequential", |
| 138 | + "maxConcurrent": 1000, |
| 139 | + "cacheDurationInSeconds": 3600, |
| 140 | + "serveStaleDurationInSeconds": 3600, |
| 141 | + "serveStale": "Immediate" |
| 142 | + }, |
| 143 | + "cluster.local": { |
| 144 | + "queryLogging": "Log", |
| 145 | + "protocol": "ForceTCP", |
| 146 | + "forwardDestination": "ClusterCoreDNS", |
| 147 | + "forwardPolicy": "Sequential", |
| 148 | + "maxConcurrent": 1000, |
| 149 | + "cacheDurationInSeconds": 3600, |
| 150 | + "serveStaleDurationInSeconds": 3600, |
| 151 | + "serveStale": "Immediate" |
| 152 | + } |
| 153 | + } |
| 154 | +} |
| 155 | +``` |
| 156 | +
|
| 157 | +You can apply this change to the node pool by using the Azure CLI: |
| 158 | +
|
| 159 | +```bash |
| 160 | +az aks nodepool update --name mynodepool1 --cluster-name myAKSCluster --resource-group myResourceGroup --localdns-config ./localdnsconfig.json |
| 161 | +``` |
| 162 | +
|
| 163 | +**Note:** Making changes to the LocalDNS configuration triggers a reimage operation in the chosen node pool. |
| 164 | +
|
| 165 | +Option 2 - Enable Query logging on a specific node |
| 166 | +
|
| 167 | +You can diagnose LocalDNS issues on a specific node by temporarily editing its LocalDNS configuration. [Connect to the node](https://learn.microsoft.com/azure/aks/node-access#connect-using-kubectl-debug), update the Corefile that LocalDNS uses, and then restart only the LocalDNS service on that node. |
| 168 | +
|
| 169 | +**Note:** Changes made this way are ephemeral. They're overwritten and don't persist after troubleshooting is complete. |
| 170 | +
|
| 171 | +```bash |
| 172 | +# You need to connect to the node before running the following commands |
| 173 | +
|
| 174 | +# Open the configuration file for LocalDNS |
| 175 | +vi /opt/azure/containers/localdns/localdns.corefile |
| 176 | +
|
| 177 | +# Manually change "errors" to "log" for one zone or for all zones. The file looks similar to the following: |
| 178 | +
|
| 179 | +# *********************************************************************************** |
| 180 | +# WARNING: Changes to this file will be overwritten and not persisted. |
| 181 | +# *********************************************************************************** |
| 182 | +# whoami (used for health check of DNS) |
| 183 | +health-check.localdns.local:53 { |
| 184 | + bind 169.254.10.10 169.254.10.11 |
| 185 | + whoami |
| 186 | +} |
| 187 | +# VnetDNS overrides apply to DNS traffic from pods with dnsPolicy:default or kubelet (referred to as VnetDNS traffic). |
| 188 | +.:53 { |
| 189 | + errors |
| 190 | + bind 169.254.10.10 |
| 191 | + forward . 168.63.129.16 { |
| 192 | + policy sequential |
| 193 | + max_concurrent 1000 |
| 194 | + } |
| 195 | + ready 169.254.10.10:8181 |
| 196 | + cache 3600s { |
| 197 | + success 9984 |
| 198 | + denial 9984 |
| 199 | + serve_stale 3600s verify |
| 200 | + servfail 0 |
| 201 | + } |
| 202 | + loop |
| 203 | + nsid localdns |
| 204 | + prometheus :9253 |
| 205 | + template ANY ANY internal.cloudapp.net { |
| 206 | + match "^(?:[^.]+\.){4,}internal\.cloudapp\.net\.$" |
| 207 | + rcode NXDOMAIN |
| 208 | + fallthrough |
| 209 | + } |
| 210 | + template ANY ANY reddog.microsoft.com { |
| 211 | + rcode NXDOMAIN |
| 212 | + } |
| 213 | +} |
| 214 | +cluster.local:53 { |
| 215 | + errors |
| 216 | + bind 169.254.10.10 |
| 217 | + forward . 10.0.0.10 { |
| 218 | + force_tcp |
| 219 | + policy sequential |
| 220 | + max_concurrent 1000 |
| 221 | + } |
| 222 | + ready 169.254.10.10:8181 |
| 223 | + cache 3600s { |
| 224 | + success 9984 |
| 225 | + denial 9984 |
| 226 | + serve_stale 3600s verify |
| 227 | + servfail 0 |
| 228 | + } |
| 229 | + loop |
| 230 | + nsid localdns |
| 231 | + prometheus :9253 |
| 232 | +} |
| 233 | +... |
| 234 | +
|
| 235 | +... |
| 236 | +# Save the changes |
| 237 | +
|
| 238 | +# Restart the LocalDNS service |
| 239 | +systemctl restart localdns |
| 240 | +``` |
| 241 | +
|
| 242 | +After the service restarts, LocalDNS should begin logging queries for the chosen zones. |
| 243 | +
|
| 244 | +### Generate traffic from dnsutils pod |
| 245 | +
|
| 246 | +Next, generate DNS traffic through LocalDNS. LocalDNS listens on two IP addresses: KubeDNS traffic goes to the ClusterListenerIP (169.254.10.11), and VnetDNS traffic goes to the NodeListenerIP (169.254.10.10). |
| 247 | +
|
| 248 | +#### Test KubeDNS zone traffic |
| 249 | +
|
| 250 | +```bash |
| 251 | +kubectl exec dnsutils -- dig bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 |
| 252 | +
|
| 253 | +; <<>> DiG 9.16.27 <<>> bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 |
| 254 | +;; global options: +cmd |
| 255 | +;; Got answer: |
| 256 | +;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7452 |
| 257 | +;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 |
| 258 | +
|
| 259 | +;; QUESTION SECTION: |
| 260 | +;bing.com. IN A |
| 261 | +
|
| 262 | +;; ANSWER SECTION: |
| 263 | +bing.com. 30 IN A 150.171.27.10 |
| 264 | +bing.com. 30 IN A 150.171.28.10 |
| 265 | +
|
| 266 | +;; Query time: 3 msec |
| 267 | +;; SERVER: 169.254.10.11#53(169.254.10.11) |
| 268 | +;; WHEN: Thu Jul 03 16:57:42 UTC 2025 |
| 269 | +;; MSG SIZE rcvd: 74 |
| 270 | +``` |
| 271 | +
|
| 272 | +#### Test VnetDNS zone traffic |
| 273 | +
|
| 274 | +```bash |
| 275 | +kubectl exec dnsutils -- dig bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 @169.254.10.10 |
| 276 | +
|
| 277 | +; <<>> DiG 9.16.27 <<>> bing.com +ignore +noedns +search +noshowsearch +time=10 +tries=6 @169.254.10.10 |
| 278 | +;; global options: +cmd |
| 279 | +;; Got answer: |
| 280 | +;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3580 |
| 281 | +;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 |
| 282 | +
|
| 283 | +;; QUESTION SECTION: |
| 284 | +;bing.com. IN A |
| 285 | +
|
| 286 | +;; ANSWER SECTION: |
| 287 | +bing.com. 1315 IN A 150.171.28.10 |
| 288 | +bing.com. 1315 IN A 150.171.27.10 |
| 289 | +
|
| 290 | +;; Query time: 7 msec |
| 291 | +;; SERVER: 169.254.10.10#53(169.254.10.10) |
| 292 | +;; WHEN: Thu Jul 03 16:59:07 UTC 2025 |
| 293 | +;; MSG SIZE rcvd: 74 |
| 294 | +``` |
| 295 | +
|
| 296 | +### View LocalDNS logs collected |
| 297 | +
|
| 298 | +Finally, view the logs from your LocalDNS instances. Connect to the node, and then run the following commands. |
| 299 | +
|
| 300 | +```bash |
| 301 | +# View the logs for the localdns service |
| 302 | +journalctl -u localdns |
| 303 | +
|
| 304 | +# To view logs in reverse chronological order (latest logs first) |
| 305 | +journalctl -u localdns --reverse |
| 306 | +
|
| 307 | +# To continuously follow the logs. |
| 308 | +journalctl -u localdns -f |
| 309 | +
|
| 310 | +# sample output using journalctl for the bing.com responses |
| 311 | +journalctl -u localdns | grep bing.com |
| 312 | +Jul 03 16:57:42 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:41796 - 7452 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.004490668s |
| 313 | +Jul 03 16:59:07 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:58454 - 3580 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.001570158s |
| 314 | +``` |
| 315 | +
|
| 316 | +If you see logs for your traffic, the pod is able to reach the LocalDNS service. |
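
On a busy node, the query log can be long. The following sketch counts log entries by protocol and response code, which can help answer the pattern questions from earlier in this article. The `summarize_rcodes` helper is hypothetical, and it assumes that the log lines follow the format shown in the sample output above. The demo input reuses those two sample lines.

```bash
#!/usr/bin/env bash

# Count LocalDNS query log lines by protocol and response code.
# Reads journalctl output on stdin.
summarize_rcodes() {
  awk -F'"' '
    /\[INFO\]/ && NF >= 3 {
      split($2, q, " ")   # q[1]=type, q[2]=class, q[3]=name, q[4]=protocol
      split($3, r, " ")   # r[1]=response code
      count[q[4] " " r[1]]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}

# Demo with the two sample lines from this article.
summary="$(summarize_rcodes <<'EOF'
Jul 03 16:57:42 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:41796 - 7452 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.004490668s
Jul 03 16:59:07 aks-userpool-24995383-vmss000000 localdns-coredns[2491520]: [INFO] 10.244.0.95:58454 - 3580 "A IN bing.com. udp 26 false 512" NOERROR qr,rd,ra 74 0.001570158s
EOF
)"
echo "$summary"   # prints: udp NOERROR 2
```

On a node, you could pipe the live log into the helper, for example `journalctl -u localdns --no-pager | summarize_rcodes`.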
| 317 | +
|
| 318 | +## Next steps |
| 319 | +If these logs don't help you identify the root cause, you can enable [query logging for CoreDNS](https://learn.microsoft.com/azure/aks/coredns-custom#enable-dns-query-logging) to check whether CoreDNS is working as intended. |
| 320 | +
|
| 321 | +[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)] |
| 322 | +
|
| 323 | +[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-disclaimer.md)] |
| 324 | +
|
| 325 | +[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)] |
| 326 | +
|