# Troubleshooting control plane connectivity issues - Azure Resource Health

-This article provides troubleshooting advice and escalation methods for Operator Nexus clusters which are
+This article provides troubleshooting advice and escalation methods for Operator Nexus clusters that are
reporting issues with control plane connectivity in Azure Resource Health.

## Symptoms

This alert indicates that there are issues connecting to the storage control plane from the cluster. The two
categories of alert have different symptoms:

-- If the cluster is marked as degraded, this means there has been a loss of redundancy to the storage control
-plane. This means that one of the controllers is experiencing connectivity issues. The cluster will continue
-to function, but this issue should be quickly fixed to restore redundancy to the system.
-- If the cluster is marked as unhealthy, this means the storage control plane is completely unreachable from
-the cluster. New workloads which depend on `nexus-volume` volumes will not come up, and existing workloads
-which rely on `nexus-volume` volumes will not be able to be migrated to a new node. Additonally, new cloud
-services networks cannot be created.
+- A degraded cluster has lost redundancy to the storage control plane. This means that one of the controllers
+is experiencing connectivity issues. The cluster continues to function, but this issue should be quickly
+fixed to restore redundancy to the system.
+- An unhealthy cluster is unable to reach the storage control plane. New workloads that depend on `nexus-volume`
+volumes cannot come up, and existing workloads that rely on `nexus-volume` volumes cannot be migrated to a
+new node. Additionally, new cloud services networks cannot be created.
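
As a rough illustration of the two alert categories, the following Python sketch maps the number of storage controllers the cluster can reach to a health state. The `HealthState` enum, the `classify_control_plane` helper, and the idea of counting reachable controllers are assumptions made for illustration only; they are not part of any Operator Nexus or Azure Resource Health API.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


def classify_control_plane(reachable: int, total: int) -> HealthState:
    """Hypothetical helper: classify storage control plane connectivity.

    reachable: controllers the cluster can currently reach (assumed input).
    total: controllers in the storage appliance (assumed input).
    """
    if total <= 0 or not 0 <= reachable <= total:
        raise ValueError("expected total > 0 and 0 <= reachable <= total")
    if reachable == 0:
        # Control plane unreachable: new nexus-volume workloads cannot start.
        return HealthState.UNHEALTHY
    if reachable < total:
        # Redundancy lost: the cluster keeps working but should be repaired.
        return HealthState.DEGRADED
    return HealthState.HEALTHY
```

For example, under these assumptions `classify_control_plane(1, 2)` returns `HealthState.DEGRADED`, which matches the loss-of-redundancy case described above.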

## Troubleshooting

The cluster may be marked as degraded during a storage appliance upgrade, since these upgrades take controllers
offline one by one. The cluster should return to healthy status after the upgrade is complete.

If an upgrade is not the root cause, you should check if there are any issues with the management switches in
-the aggregator rack. Follow these steps to check for issues:
+the aggregator rack, by following these steps:

1. Start on the cluster (Operator Nexus) resource overview page. Click the link to the network fabric resource.
:::image type="content" source="media/navigate-network-fabric-portal.png" alt-text="Screenshot of a cluster resource, with the network fabric link highlighted." lightbox="media/navigate-network-fabric-portal.png":::
-2. Go to `Infrastructue->Devices`, and search for the aggregator rack management switches. Ensure they are succesfully
+2. Go to `Infrastructure->Devices`, and search for the aggregator rack management switches. Ensure they are successfully
provisioned and enabled.
-:::image type="content" source="media/navigate-mgmt-switch-portal.png" alt-text="Screenshot of the Infrastructure tab of a network fabric resource." lightbox="media/snavigate-mgmt-switch-portal.png":::
+:::image type="content" source="media/navigate-mgmt-switch-portal.png" alt-text="Screenshot of the Infrastructure tab of a network fabric resource." lightbox="media/navigate-mgmt-switch-portal.png":::
3. Click on a management switch, and go to the `Monitoring->Metrics` tab. Select `Interface Out Pkts`, then apply splitting
on the `Interface Name` dimension.
-:::image type="content" source="media/interface-out-pkts.png" alt-text="Screenshot of a metric showing the outward packets of a management switch." lightbox="media/interface-out-pkts.png":::
-4. Check for any interfaces where the packets has suddenly dropped to zero. If you find any, you should reseat any affected
-cables.
+:::image type="content" source="media/interface-out-packets.png" alt-text="Screenshot of a metric showing the outward packets of a management switch." lightbox="media/interface-out-packets.png":::
+4. Check for any interfaces where the number of packets suddenly dropped to zero. If you find any, you should reseat
+any affected cables.
5. Repeat the check for the second management switch.
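
The check in step 4 can also be sketched in code, for cases where the per-interface packet counts have been exported from the metrics view. The following Python example is a minimal sketch, not an official tool: the `interfaces_dropped_to_zero` function, the `zero_run` threshold, the input shape (interface name mapped to a list of samples, oldest first), and the interface names are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def interfaces_dropped_to_zero(
    samples: Dict[str, Sequence[float]], zero_run: int = 3
) -> List[str]:
    """Return interfaces that carried traffic and then stayed at zero.

    samples: hypothetical export of outbound packet counts per interface,
             ordered oldest to newest.
    zero_run: how many trailing zero samples count as a drop (assumed value).
    """
    suspects = []
    for name, counts in samples.items():
        if len(counts) <= zero_run:
            continue  # Not enough history to tell a drop from an idle port.
        tail = counts[-zero_run:]
        earlier = counts[:-zero_run]
        # Flag only interfaces that were active before the run of zeros;
        # ports that were always idle are ignored.
        if all(c == 0 for c in tail) and any(c > 0 for c in earlier):
            suspects.append(name)
    return sorted(suspects)


if __name__ == "__main__":
    # Made-up sample data for two active ports and one idle port.
    example = {
        "Ethernet1": [120, 118, 131, 0, 0, 0],
        "Ethernet2": [90, 95, 88, 92, 91, 89],
        "Ethernet3": [0, 0, 0, 0, 0, 0],
    }
    print(interfaces_dropped_to_zero(example))
```

Any interface this sketch flags corresponds to a cable that is a candidate for reseating. An interface that was always at zero is deliberately not flagged, since it may simply be unused.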

If upgrade or management switch problems are not the root cause, you should raise a ticket with Microsoft, quoting