Skip to content

Commit 53ad682

Browse files
authored
Merge pull request #291727 from papadeltasierra/pauldsmith/537853847
Troubleshooting issues with DNS in Nexus Network Fabric
2 parents 72e865b + a88cf08 commit 53ad682

File tree

2 files changed

+58
-0
lines changed

2 files changed

+58
-0
lines changed

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -330,6 +330,8 @@
330330
href: troubleshoot-memory-limits.md
331331
- name: Troubleshoot LACP Bonding
332332
href: troubleshoot-lacp-bonding.md
333+
- name: Troubleshoot DNS Issues
334+
href: troubleshoot-dns-issues.md
333335
- name: Troubleshoot NAKS Cluster Node Packet Loss
334336
href: troubleshoot-packet-loss.md
335337
- name: Troubleshoot TWAMP (UDP) not working
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
---
2+
title: "Azure Operator Nexus: DNS Issues"
3+
description: Learn how to troubleshoot cluster DNS issues.
4+
author: papadeltasierra
5+
ms.author: pauldsmith
6+
ms.service: azure-operator-nexus
7+
ms.custom: azure-operator-nexus
8+
ms.topic: troubleshooting
9+
ms.date: 12/10/2024
10+
# ms.custom: template-include
11+
---
12+
13+
# Troubleshoot Nexus DNS Issues
14+
15+
NNF (Nexus Network Fabric) provides a bridge between Nexus resources hosted by a Kubernetes
16+
cluster running on Azure VMs (Virtual Machines) and Azure, accessing Azure resources via their
17+
domain names. However a DNS (Domain Name System) error in NNF can mean that Azure resources
18+
can't be contacted which impacts deployment or management of Nexus resources.
19+
20+
The DNS proxy that causes this error is an [Envoy DNS Proxy](https://www.envoyproxy.io/docs/envoy/latest/)
21+
running via a Kubernetes deployment in either an infrastructure or tenant Kubernetes cluster.
22+
The precise location of the DNS proxy is determined when the customer
23+
deploys their NAKS (Nexus Azure Kubernetes Service) cluster or during some other
24+
deployment.
25+
26+
## Diagnosis
27+
28+
* Deployment or management of remote Nexus resources fails with "DeploymentFailed."
29+
* Azure portal shows no errors being generated for the Azure resources that are unreachable; there are no errors because the failing operations aren't reaching the Azure resources at all.
30+
31+
## Mitigation steps
32+
33+
### Trigger a DNS cache refresh for the NNF Workload Proxy
34+
35+
- Identify the Infrastructure or Tenant Kubernetes Cluster on which the DNS proxy is running from the initial configuration and deployment process
36+
- Log in to the Kubernetes cluster
37+
- Using the Azure portal, find your cluster
38+
- From the _Overview_ blade, click the _Connect_ command (between _Refresh_ and _Delete_)
39+
- Follow the instructions from the resulting pop-up window that explain how to connect to the Kubernetes cluster
40+
- Identify the DNS proxy deployment using this command
41+
```bash
42+
$ kubectl get deployments --all-namespaces=true | grep envoy
43+
```
44+
- Restart the deployment, which causes the DNS caching to be reset, using this command:
45+
```bash
46+
kubectl rollout restart deployment <your-envoy-deployment-name> --namespace <namespace-where-envoy-pod-exists>
47+
```
48+
49+
## Verification
50+
51+
After the DNS cache is refreshed, create or manage operations are successful.
52+
53+
## Related content
54+
55+
- If you still have questions, contact [Azure support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
56+
- For more information about support plans, see [Azure support plans](https://azure.microsoft.com/support/plans/response/).

0 commit comments

Comments
 (0)