Commit a5e2c22

committed
Linting
1 parent bfb1501 commit a5e2c22

6 files changed: 29 additions and 31 deletions

articles/operator-nexus/troubleshoot-failed-volume-attachments.md

Lines changed: 6 additions & 7 deletions
@@ -10,17 +10,16 @@ ms.service: azure-operator-nexus
 
 # Troubleshooting failed volume attachments - Azure Resource Health
 
-This article provides troubleshooting advice and escalation methods for Operator Nexus clusters which are
+This article provides troubleshooting advice and escalation methods for Operator Nexus clusters that are
 reporting failed volume attachments in Azure Resource Health.
 
 ## Symptoms
 
-This alert indicates that volumes are failing to attach in the undercloud. This can lead to delays in
-bringing up workloads in the tenant layer, or migrating existing workloads to a new node. If the cluster
-has been marked as degraded, this implies at least 1 volume is failing to attach - in this case the problem
-may be limited to this specific volume, and the impact radius is small. If the cluster has been marked as
-unhealthy, a high percentage of volumes on at least 1 node are failing to attach, indicating a more serious
-incident.
+This alert indicates that volumes are failing to attach in the undercloud. Failed volume attachments can lead
+to delays in bringing up workloads in the tenant layer, or migrating existing workloads to a new node. A
+cluster in degraded state has at least one failed volume attachment - in this case the problem may be limited
+to this specific volume, and the impact radius is small. A cluster in unhealthy state has at least one node
+where a high percentage of volume attachments are failed, indicating a more serious incident.
 
 ## Troubleshooting
 

articles/operator-nexus/troubleshoot-nfs-unhealthy.md renamed to articles/operator-nexus/troubleshoot-network-file-system-unhealthy.md

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ ms.service: azure-operator-nexus
 
 # Troubleshooting unhealthy NFS pods - Azure Resource Health
 
-This article provides troubleshooting advice and escalation methods for Operator Nexus clusters which are
-reporting unhealthy NFS pods in Azure Resource Health.
+This article provides troubleshooting advice and escalation methods for Operator Nexus clusters that are
+reporting unhealthy Network File System (NFS) pods in Azure Resource Health.
 
 ## Symptoms
 

articles/operator-nexus/troubleshoot-resource-health-alerts.md

Lines changed: 6 additions & 6 deletions
@@ -25,15 +25,15 @@ These alerts are generated based on the status of the resource and its dependenc
 | `1PExtensionsFailedInstall` | [Requires to contact support](#please-contact-support) |
 | `ClusterHeartbeatConnectionStatusDisconnectedClusterManagerOperationsAreAffectedPossibleNetworkIssues` | [Troubleshoot Cluster heartbeat connection status shows disconnected] |
 | `ClusterHeartbeatConnectionStatusTimedoutPossiblePerformanceIssues` | [Troubleshoot Cluster heartbeat connection status shows disconnected] |
-| `AttachmentFailuresDegraded` and `AttachmentFailuresUnhealthy` | [Troubleshoot failed volume attachments] |
-| `NFSPodDegraded` and `NFSPodUnhealthy` | [Troubleshoot NFS unhealthy] |
-| `CSIControllerUnhealthy`, `CSINodeDegraded` and `CSINodeUnhealthy` | [Troubleshoot unhealthy CSI (storage)] |
-| `ControlPlaneStorageConnectivityDegraded` and `ControlPlaneStorageConnectivityUnhealthyVIP` | [Troubleshoot storage control plane disconnected] |
+| `AttachmentFailuresDegraded`, and `AttachmentFailuresUnhealthy` | [Troubleshoot failed volume attachments] |
+| `NFSPodDegraded`, and `NFSPodUnhealthy` | [Troubleshoot NFS unhealthy] |
+| `CSIControllerUnhealthy`, `CSINodeDegraded`, and `CSINodeUnhealthy` | [Troubleshoot unhealthy CSI (storage)] |
+| `ControlPlaneStorageConnectivityDegraded`, and `ControlPlaneStorageConnectivityUnhealthyVIP` | [Troubleshoot storage control plane disconnected] |
 
 [Troubleshoot Cluster heartbeat connection status shows disconnected]: ./troubleshoot-cluster-heartbeat-connection-status-disconnected.md
 [Troubleshoot failed volume attachments]: ./troubleshoot-failed-volume-attachments.md
-[Troubleshoot NFS unhealthy]: ./troubleshoot-nfs-unhealthy.md
-[Troubleshoot unhealthy CSI (storage)]: ./troubleshoot-unhealthy-csi.md
+[Troubleshoot NFS unhealthy]: ./troubleshoot-network-file-system-unhealthy.md
+[Troubleshoot unhealthy CSI (storage)]: ./troubleshoot-unhealthy-container-storage-interface.md
 [Troubleshoot storage control plane disconnected]: ./troubleshoot-storage-control-plane-disconnected.md
 
 ## Please contact support

articles/operator-nexus/troubleshoot-storage-control-plane-disconnected.md

Lines changed: 14 additions & 15 deletions
@@ -1,5 +1,5 @@
 ---
-title: Troubleshooting storage control plane connectitivy issues.
+title: Troubleshooting storage control plane connectivity issues.
 description: Troubleshooting Azure Resource Health alerts about control plane connectivity issues.
 author: jensheasby
 ms.author: jensheasby
@@ -10,40 +10,39 @@ ms.service: azure-operator-nexus
 
 # Troubleshooting control plane connectivity issues - Azure Resource Health
 
-This article provides troubleshooting advice and escalation methods for Operator Nexus clusters which are
+This article provides troubleshooting advice and escalation methods for Operator Nexus clusters that are
 reporting issues with control plane connectivity in Azure Resource Health.
 
 ## Symptoms
 
 This alert indicates that there are issues connecting to the storage control plane from the cluster. The two
 categories of alert have different symptoms:
 
-- If the cluster is marked as degraded, this means there has been a loss of redundancy to the storage control
-plane. This means that one of the controllers is experiencing connectivity issues. The cluster will continue
-to function, but this issue should be quickly fixed to restore redundancy to the system.
-- If the cluster is marked as unhealthy, this means the storage control plane is completely unreachable from
-the cluster. New workloads which depend on `nexus-volume` volumes will not come up, and existing workloads
-which rely on `nexus-volume` volumes will not be able to be migrated to a new node. Additonally, new cloud
-services networks cannot be created.
+- A degraded cluster has lost redundancy to the storage control plane. This means that one of the controllers
+is experiencing connectivity issues. The cluster continues to function, but this issue should be quickly
+fixed to restore redundancy to the system.
+- An unhealthy cluster is unable to reach the storage control plane. New workloads that depend on `nexus-volume`
+volumes cannot come up, and existing workloads that rely on `nexus-volume` volumes cannot be migrated to a
+new node. Additionally, new cloud services networks cannot be created.
 
 ## Troubleshooting
 
 The cluster may be marked as degraded during a storage appliance upgrade, since these upgrades take controllers
 offline one by one. The cluster should return to healthy status after the upgrade is complete.
 
 If an upgrade is not the root cause, you should check if there are any issues with the management switches in
-the aggregator rack. Follow these steps to check for issues:
+the aggregator rack, by following these steps:
 
 1. Start on the cluster (Operator Nexus) resource overview page. Click the link to the network fabric resource.
 :::image type="content" source="media/navigate-network-fabric-portal.png" alt-text="Screenshot of a cluster resource, with the network fabric link highlighted." lightbox="media/navigate-network-fabric-portal.png":::
-2. Go to `Infrastructue->Devices`, and search for the aggregator rack management switches. Ensure they are succesfully
+2. Go to `Infrastructue->Devices`, and search for the aggregator rack management switches. Ensure they are successfully
 provisioned and enabled.
-:::image type="content" source="media/navigate-mgmt-switch-portal.png" alt-text="Screenshot of the Infrastructure tab of a network fabric resource." lightbox="media/snavigate-mgmt-switch-portal.png":::
+:::image type="content" source="media/navigate-mgmt-switch-portal.png" alt-text="Screenshot of the Infrastructure tab of a network fabric resource." lightbox="media/navigate-mgmt-switch-portal.png":::
 3. Click on a management switch, and go to the `Monitoring->Metrics` tab. Select `Interface Out Pkts`, then apply splitting
 on the `Interface Name` dimension.
-:::image type="content" source="media/interface-out-pkts.png" alt-text="Screenshot of a metric showing the outward packets of a management switch." lightbox="media/interface-out-pkts.png":::
-4. Check for any interfaces where the packets has suddenly dropped to zero. If you find any, you should reseat any affected
-cables.
+:::image type="content" source="media/interface-out-packets.png" alt-text="Screenshot of a metric showing the outward packets of a management switch." lightbox="media/interface-out-packets.png":::
+4. Check for any interfaces where the number of packets suddenly dropped to zero. If you find any, you should reseat
+any affected cables.
 5. Repeat the check for the second management switch.
 
 If upgrade or management switch problems are not the root cause, you should raise a ticket with Microsoft, quoting

articles/operator-nexus/troubleshoot-unhealthy-csi.md renamed to articles/operator-nexus/troubleshoot-unhealthy-container-storage-interface.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ ms.service: azure-operator-nexus
 
 # Troubleshooting unhealthy CSI pods (storage) - Azure Resource Health
 
-This article provides troubleshooting advice and escalation methods for Operator Nexus clusters which are
+This article provides troubleshooting advice and escalation methods for Operator Nexus clusters that are
 reporting unhealthy Container Storage Interface (CSI) pods in Azure Resource Health.
 
 ## Symptoms
